AI Evaluation for AI SDR

1. Installing FutureAGI

pip install ai-evaluation pandas tabulate

2. Loading Dataset

The dataset used here contains, for each row, a value proposition, the combined LinkedIn posts of one prospect, two prompts, and the opener each prompt generated from those posts.

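import pandas as pd

# Each row: value proposition, combined LinkedIn posts, both prompts and their openers
dataset = pd.read_csv("data.csv")
# Show full cell contents instead of truncating long text
pd.set_option('display.max_colwidth', None)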
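The evaluation code later in this cookbook reads the columns value_proposition, combined_posts, opener_1 and opener_2 from every row, so it can help to fail fast if data.csv lacks any of them. A minimal sketch, assuming those column names; missing_columns is a hypothetical helper, not part of the SDK:

```python
import pandas as pd

# Columns the evaluation loop reads from each row
REQUIRED_COLUMNS = ("value_proposition", "combined_posts", "opener_1", "opener_2")


def missing_columns(df: pd.DataFrame, required=REQUIRED_COLUMNS) -> list:
    """Return the required columns that are absent from df, in order."""
    return [col for col in required if col not in df.columns]


# Usage with a tiny in-memory frame instead of data.csv
sample = pd.DataFrame({
    "value_proposition": ["..."],
    "combined_posts": ["..."],
    "opener_1": ["..."],
})
print(missing_columns(sample))  # ['opener_2']
```

Calling missing_columns(dataset) right after loading should return an empty list before you spend any eval calls.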

Below is a sample row from the dataset used in this cookbook:

value_proposition:
Get location information of your social media following to place better ads and sponsorships

combined_posts:	
**Post 1:**\n\nIn the past 12 months, my LinkedIn following went from 36k to 58k. But followers won't buy you a snickers bar. Here's the actual value of that brand growth for Apollo.io:\n\n- Generated nearly 30MM impressions\n- 37 inbound demo requests (direct DMs asking to learn more about Apollo 28 of which were qualified T1-T3 opportunities)\n- Spoke on 14 podcasts\n- Contributed to 3 sales blogs\n- Drove a bunch of free user signups\n\n\n^^^ This ALL happened passively just doing my job as a marketer. \n\nImagine if I had a quota and tried to strategically turn this into a funnel?\n\nWell, I used to as a BDR!\n\nOn Wednesday, James A. O'Sullivan and I are breaking down how I leveraged my LinkedIn presence to intentionally build a 7-figure pipeline in under a year. \n\n\n\n\nNo gatekeeping. All my tips and tricks to help you get started- for FREE. 😎\n\nBe there or be square. (l*nk in comments)\n\n\nPs. A few folks who show up will win a profile audit from me so you should def register:)))\n\nPps. ♻️ Repost to let a sales pal know this is happening!\n\n**Post 2:**\n\nPro-tip that booked me 4-5 meetings from my top accounts per quarter. Steal it: (Or don't... I do not care) 💁🏻‍♀️\n\nI would hit up an executive peer and run a sequence thru them. \n\nBACKGROUND\n\nExecs like to talk to other execs. They don't always want to reply to an SDR. \n\nI'd run a little sequence partnering with my VP of Sales or CRO to connect with, Email + follow up DM my prospects. \n\nTHREE things you need! \n\n1. Copy for them to send from their LinkedIn + instructions on who/when to send those connections.\n\n2. An email alias as them within your own SEP (you can do this in Apollo.io if you need one)\n\n3. Your exec on board :) Do not impersonate them\n\n\nTHE PLAYBOOK\n\nHere is how I would run this sequence today if I was an SDR or AE at Apollo trying to book a meeting with 15Five:\n\n1. Draft a connection note to my top 5 contacts at 15Five from Leandra Fishman to connect. 
(Ask her to send out those connections)\n\n2. Create an email Alias as Leandra in my Apollo instance and write a 3 step sequence (2 emails+ 1 LinkedIn DM post connection acceptance)\n\n3. Run the sequence as Leandra - with her Bcc'd on sends + replies (this is LOW volume but high-value accounts so it should not inundate your execs)\n\n4. When we get a reply collaborate with Leandra directly to schedule a call and have her facilitate the handoff to me, the rep. \n\n5. Keep Leandra CC'd on the email thread as the deal progresses.\n\n\n\n\n\nNOTE: This should ONLY be used to top accounts. This is NOT a method that works with high volume/spray and pray strategies. To keep it authentic keep your exec looped in. \n\nBonus- work gifting into your strategy:) Exec to exec gifting is neat:)\n\n\n\n\n\n\n\nTry it!\n\nYou won't try it.....\n\n\n\n\n♻️ Repost for a sales pal in need of some MASSIVE meetings this Q:)\n\n**Post 3:**\n\nMental health will always be a core pillar of my content. If that's not your thing, all good- feel free to scroll past those posts or unfollow. It's all love.\n\nBut if you think I need to "stop writing about it" because you're worried it will hurt my career?\n\nCheck your bias. \n\nMessages like this don't communicate to me that companies will judge me. They communicate that YOU are judging me and others like me for what is actually a widely experienced and woefully stigmatized struggle.\n\nWe are all just human beings, being human. There is room for that in the workplace. \n\n\n\n\nAnd to any brands, companies, leaders, and future employers who take pause knowing that I am someone who speaks about, advocates for, and struggles with- mental health... I will save you some time. \n\nWe are NOT a good fit. 💁🏻‍♀️\n\nAnd that is okay. :)\n\n\n\n\nPs. Please be kind in the comments. This person isn't evil, just misguided. We don't change perception by piling on hate. We change it with compassion and vulnerability. 
\n\nSo imma' keep doing what I am doing. \n\nBack to your regularly scheduled SDR tips tomorrow <3

prompt_1:	
You have been given 3 LinkedIn posts written by the same person. You work for a company which offers the following value to their prospects:\n\n**Value Proposition: Get location information of your social media following to place better ads and sponsorships**\n\nTake a deep breath, clear your mind and from the given posts first select the post most relevant to your value proposition. The entire post could be related to the value proposition or there could be a small portion in the post that might be relevant. \n\nAfter having found the most relevant post, write a **single sentence** opener for an outreach message referencing the post. Summarize the content of the post briefly to make a catchy opener. The email should start with "I recently saw your post about" and summarize the content briefly.\n\n**Posts:**\n**Post 1:**\n\nIn the past 12 months, my LinkedIn following went from 36k to 58k. But followers won't buy you a snickers bar. Here's the actual value of that brand growth for Apollo.io:\n\n- Generated nearly 30MM impressions\n- 37 inbound demo requests (direct DMs asking to learn more about Apollo 28 of which were qualified T1-T3 opportunities)\n- Spoke on 14 podcasts\n- Contributed to 3 sales blogs\n- Drove a bunch of free user signups\n\n\n^^^ This ALL happened passively just doing my job as a marketer. \n\nImagine if I had a quota and tried to strategically turn this into a funnel?\n\nWell, I used to as a BDR!\n\nOn Wednesday, James A. O'Sullivan and I are breaking down how I leveraged my LinkedIn presence to intentionally build a 7-figure pipeline in under a year. \n\n\n\n\nNo gatekeeping. All my tips and tricks to help you get started- for FREE. 😎\n\nBe there or be square. (l*nk in comments)\n\n\nPs. A few folks who show up will win a profile audit from me so you should def register:)))\n\nPps. ♻️ Repost to let a sales pal know this is happening!\n\n**Post 2:**\n\nPro-tip that booked me 4-5 meetings from my top accounts per quarter. Steal it: (Or don't... 
I do not care) 💁🏻‍♀️\n\nI would hit up an executive peer and run a sequence thru them. \n\nBACKGROUND\n\nExecs like to talk to other execs. They don't always want to reply to an SDR. \n\nI'd run a little sequence partnering with my VP of Sales or CRO to connect with, Email + follow up DM my prospects. \n\nTHREE things you need! \n\n1. Copy for them to send from their LinkedIn + instructions on who/when to send those connections.\n\n2. An email alias as them within your own SEP (you can do this in Apollo.io if you need one)\n\n3. Your exec on board :) Do not impersonate them\n\n\nTHE PLAYBOOK\n\nHere is how I would run this sequence today if I was an SDR or AE at Apollo trying to book a meeting with 15Five:\n\n1. Draft a connection note to my top 5 contacts at 15Five from Leandra Fishman to connect. (Ask her to send out those connections)\n\n2. Create an email Alias as Leandra in my Apollo instance and write a 3 step sequence (2 emails+ 1 LinkedIn DM post connection acceptance)\n\n3. Run the sequence as Leandra - with her Bcc'd on sends + replies (this is LOW volume but high-value accounts so it should not inundate your execs)\n\n4. When we get a reply collaborate with Leandra directly to schedule a call and have her facilitate the handoff to me, the rep. \n\n5. Keep Leandra CC'd on the email thread as the deal progresses.\n\n\n\n\n\nNOTE: This should ONLY be used to top accounts. This is NOT a method that works with high volume/spray and pray strategies. To keep it authentic keep your exec looped in. \n\nBonus- work gifting into your strategy:) Exec to exec gifting is neat:)\n\n\n\n\n\n\n\nTry it!\n\nYou won't try it.....\n\n\n\n\n♻️ Repost for a sales pal in need of some MASSIVE meetings this Q:)\n\n**Post 3:**\n\nMental health will always be a core pillar of my content. If that's not your thing, all good- feel free to scroll past those posts or unfollow. 
It's all love.\n\nBut if you think I need to "stop writing about it" because you're worried it will hurt my career?\n\nCheck your bias. \n\nMessages like this don't communicate to me that companies will judge me. They communicate that YOU are judging me and others like me for what is actually a widely experienced and woefully stigmatized struggle.\n\nWe are all just human beings, being human. There is room for that in the workplace. \n\n\n\n\nAnd to any brands, companies, leaders, and future employers who take pause knowing that I am someone who speaks about, advocates for, and struggles with- mental health... I will save you some time. \n\nWe are NOT a good fit. 💁🏻‍♀️\n\nAnd that is okay. :)\n\n\n\n\nPs. Please be kind in the comments. This person isn't evil, just misguided. We don't change perception by piling on hate. We change it with compassion and vulnerability. \n\nSo imma' keep doing what I am doing. \n\nBack to your regularly scheduled SDR tips tomorrow <3\n \n	

opener_1:
I recently saw your post about leveraging LinkedIn for building a pipeline; location insights could enhance your ad strategies even further!	

prompt_2:	
You are a skilled sales development representative tasked with crafting personalized email openers based on LinkedIn posts. Your goal is to create a compelling, one-sentence opener that resonates with the prospect and relates to your company's value proposition.\n\nCompany Value Proposition: Get location information of your social media following to place better ads and sponsorships\n\nGiven: Three recent LinkedIn posts by the same person.\n\nInstructions:\n1. Carefully read and analyze all three posts.\n2. Identify the post most relevant to your company's value proposition. This relevance may be found in the entire post or a specific section.\n3. Craft a single-sentence opener that:\na) Begins with "I recently saw your post about"\nb) Briefly summarizes the key point or insight from the chosen post\nc) Subtly connects to your company's value proposition without explicitly mentioning it\nd) Uses a tone that matches the prospect's writing style\ne) Demonstrates genuine interest and insight\n\n4. Ensure your opener is engaging, concise, and natural-sounding.\n\nPosts:\n\n\nPost 1.\n```\nIn the past 12 months, my LinkedIn following went from 36k to 58k. But followers won't buy you a snickers bar. Here's the actual value of that brand growth for Apollo.io:\n\n- Generated nearly 30MM impressions\n- 37 inbound demo requests (direct DMs asking to learn more about Apollo 28 of which were qualified T1-T3 opportunities)\n- Spoke on 14 podcasts\n- Contributed to 3 sales blogs\n- Drove a bunch of free user signups\n\n\n^^^ This ALL happened passively just doing my job as a marketer. \n\nImagine if I had a quota and tried to strategically turn this into a funnel?\n\nWell, I used to as a BDR!\n\nOn Wednesday, James A. O'Sullivan and I are breaking down how I leveraged my LinkedIn presence to intentionally build a 7-figure pipeline in under a year. \n\n\n\n\nNo gatekeeping. All my tips and tricks to help you get started- for FREE. 😎\n\nBe there or be square. 
(l*nk in comments)\n\n\nPs. A few folks who show up will win a profile audit from me so you should def register:)))\n\nPps. ♻️ Repost to let a sales pal know this is happening!\n```\n\nPost 2.\n```\nPro-tip that booked me 4-5 meetings from my top accounts per quarter. Steal it: (Or don't... I do not care) 💁🏻‍♀️\n\nI would hit up an executive peer and run a sequence thru them. \n\nBACKGROUND\n\nExecs like to talk to other execs. They don't always want to reply to an SDR. \n\nI'd run a little sequence partnering with my VP of Sales or CRO to connect with, Email + follow up DM my prospects. \n\nTHREE things you need! \n\n1. Copy for them to send from their LinkedIn + instructions on who/when to send those connections.\n\n2. An email alias as them within your own SEP (you can do this in Apollo.io if you need one)\n\n3. Your exec on board :) Do not impersonate them\n\n\nTHE PLAYBOOK\n\nHere is how I would run this sequence today if I was an SDR or AE at Apollo trying to book a meeting with 15Five:\n\n1. Draft a connection note to my top 5 contacts at 15Five from Leandra Fishman to connect. (Ask her to send out those connections)\n\n2. Create an email Alias as Leandra in my Apollo instance and write a 3 step sequence (2 emails+ 1 LinkedIn DM post connection acceptance)\n\n3. Run the sequence as Leandra - with her Bcc'd on sends + replies (this is LOW volume but high-value accounts so it should not inundate your execs)\n\n4. When we get a reply collaborate with Leandra directly to schedule a call and have her facilitate the handoff to me, the rep. \n\n5. Keep Leandra CC'd on the email thread as the deal progresses.\n\n\n\n\n\nNOTE: This should ONLY be used to top accounts. This is NOT a method that works with high volume/spray and pray strategies. To keep it authentic keep your exec looped in. 
\n\nBonus- work gifting into your strategy:) Exec to exec gifting is neat:)\n\n\n\n\n\n\n\nTry it!\n\nYou won't try it.....\n\n\n\n\n♻️ Repost for a sales pal in need of some MASSIVE meetings this Q:)\n```\n\nPost 3.\n```\nMental health will always be a core pillar of my content. If that's not your thing, all good- feel free to scroll past those posts or unfollow. It's all love.\n\nBut if you think I need to "stop writing about it" because you're worried it will hurt my career?\n\nCheck your bias. \n\nMessages like this don't communicate to me that companies will judge me. They communicate that YOU are judging me and others like me for what is actually a widely experienced and woefully stigmatized struggle.\n\nWe are all just human beings, being human. There is room for that in the workplace. \n\n\n\n\nAnd to any brands, companies, leaders, and future employers who take pause knowing that I am someone who speaks about, advocates for, and struggles with- mental health... I will save you some time. \n\nWe are NOT a good fit. 💁🏻‍♀️\n\nAnd that is okay. :)\n\n\n\n\nPs. Please be kind in the comments. This person isn't evil, just misguided. We don't change perception by piling on hate. We change it with compassion and vulnerability. \n\nSo imma' keep doing what I am doing. \n\nBack to your regularly scheduled SDR tips tomorrow <3\n```\n\n\nOutput: Provide only the single-sentence opener, without any additional explanation or commentary.\n	

opener_2:
I recently saw your post about leveraging your LinkedIn presence to build a pipeline, which aligns perfectly with optimizing audience targeting.
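Both prompts impose the same hard constraints on the output: a single sentence that begins with "I recently saw your post about". Those constraints can be checked deterministically before any LLM-based eval runs. A rough sketch; follows_opener_rules is a hypothetical helper, and its sentence count is a heuristic (it counts ".", "!" or "?" followed by whitespace or the end of the text), so abbreviations such as "Inc." will be miscounted:

```python
import re

PREFIX = "I recently saw your post about"
# A sentence end is ., ! or ? followed by whitespace or the end of the text
SENTENCE_END = re.compile(r"[.!?]+(?=\s|$)")


def follows_opener_rules(opener: str) -> bool:
    """Return True if the opener starts with PREFIX and looks like one sentence."""
    text = opener.strip()
    return text.startswith(PREFIX) and len(SENTENCE_END.findall(text)) <= 1


opener_1 = ("I recently saw your post about leveraging LinkedIn for building a pipeline; "
            "location insights could enhance your ad strategies even further!")
opener_2 = ("I recently saw your post about leveraging your LinkedIn presence to build a pipeline, "
            "which aligns perfectly with optimizing audience targeting.")
print(follows_opener_rules(opener_1), follows_opener_rules(opener_2))  # True True
```

An opener that fails this check can be tagged Poor outright, leaving the LLM eval for the qualitative criteria.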

3. Initialising Future AGI’s Evaluator Client

from fi.evals import Evaluator

evaluator = Evaluator(fi_api_key="<your_api_key>",
                      fi_secret_key="<your_api_secret>")
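Hard-coding keys in a notebook makes them easy to leak. A minimal sketch that reads them from environment variables instead; the variable names FI_API_KEY and FI_SECRET_KEY and the helper load_fi_credentials are assumptions for illustration, so match them to your own setup and pass the two values to Evaluator(fi_api_key=..., fi_secret_key=...) as above:

```python
import os


def load_fi_credentials(env=os.environ) -> tuple:
    """Read the API key and secret from the environment, failing loudly if absent."""
    # Assumed variable names; adjust to your own configuration
    names = ("FI_API_KEY", "FI_SECRET_KEY")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return env["FI_API_KEY"], env["FI_SECRET_KEY"]


# Usage:
# api_key, secret_key = load_fi_credentials()
# evaluator = Evaluator(fi_api_key=api_key, fi_secret_key=secret_key)
```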

4. Defining Custom Deterministic Eval

Define a custom deterministic eval tailored to this use case, with the following configuration:

Property	Description
Eval Name	custom_deterministic_eval
Language Model	Turing Flash
Rule Prompt	Given opener : {{opener}} , combined_posts : {{combined_posts}}, value_proposition: {{value_proposition}}. Given the combined_posts and value_proposition, <criterion description from JUDGING_CRITERIA>
Deterministic Choices	Good, Poor
Multi-choice	False

See the Future AGI documentation on custom evals to learn how to create your own.
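Because this eval is single-choice with the choices Good and Poor, every value it returns should be exactly one of those two strings. A small guard like the sketch below (check_tags is a hypothetical helper, not part of the SDK) catches an unexpected value, such as an error message, before it skews the tallies used to pick a winner:

```python
CHOICES = {"Good", "Poor"}  # Deterministic Choices from the configuration above


def check_tags(values, choices=CHOICES):
    """Return the values as a list, or raise if any value is not an allowed choice."""
    values = list(values)
    unexpected = [v for v in values if v not in choices]
    if unexpected:
        raise ValueError(f"Unexpected eval outputs: {unexpected}")
    return values
```

Wrapping each list of collected ratings in check_tags(...) keeps a malformed response from being silently counted as a vote.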

5. Defining Judging Criteria for Evaluating AI Generated Openers

Here, each AI-generated opener is judged on the following criteria:
- Engagement
- Tone
- Relevance
- Appropriateness
- Impact

You can add more criteria to suit your use case, as long as you explicitly define how each tag should be chosen.
Tags are simply the outputs the deterministic eval returns (here, Good or Poor). Depending on the use case, you can configure the eval as single-choice or multi-choice.
You can define any number of tags, as long as the rule for choosing each one is spelled out.

JUDGING_CRITERIA = {
    "Engagement": "Evaluate whether the opener captures attention and encourages interaction or further thought. Choose Good if the opener is engaging, sparks curiosity, or creates a sense of interest, making the reader want to engage further. Choose Poor if the opener feels generic, uninspiring, or fails to prompt any interaction or interest.",
    "Tone": "Evaluate whether the tone of the opener is respectful, professional, and avoids being patronizing or condescending. Choose Good if the tone matches the context, feels approachable, and conveys professionalism without being overly casual or rigid. Choose Poor if the tone is overly formal, dismissive, condescending, or inappropriate for the intended audience.",
    "Relevance": "Evaluate whether the opener is relevant to the combined posts. Choose Good if the opener aligns closely with the topic, addresses the subject matter accurately, and stays on-point. Choose Poor if the opener feels disconnected, includes irrelevant information, or strays from the primary focus of the combined posts.",
    "Appropriateness": "Evaluate whether the correct post from the combined posts was selected to create the opener. Choose Good if the selected post clearly supports the value proposition and fits well with the purpose of the opener. Choose Poor if the selection feels irrelevant, random, or poorly suited to the context or value proposition.",
    "Impact": "Evaluate how compelling and effective the opener is in delivering its message. Choose Good if the opener leaves a strong impression, effectively conveys its value proposition, and makes the reader want to engage further. Choose Poor if the opener feels weak, ineffective, or fails to make a memorable or persuasive impact."
}
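Since a tag is only meaningful if the criterion says when to choose it, a quick check that every description mentions each tag can catch an incomplete entry when you add your own criteria. A minimal sketch, assuming descriptions follow the "Choose Good ... Choose Poor ..." phrasing used above; undefined_tags is a hypothetical helper:

```python
def undefined_tags(criteria: dict, tags=("Good", "Poor")) -> list:
    """List (criterion, tag) pairs whose description never says 'Choose <tag>'."""
    return [
        (name, tag)
        for name, text in criteria.items()
        for tag in tags
        if f"Choose {tag}" not in text
    ]


example = {
    "Tone": "Choose Good if the tone is professional. Choose Poor if it is condescending.",
    "Clarity": "Choose Good if the opener is easy to read.",
}
print(undefined_tags(example))  # [('Clarity', 'Poor')]
```

Run on JUDGING_CRITERIA above, it returns an empty list, since every description defines both Good and Poor.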

6. Evaluating AI Generated Openers Using Custom Deterministic Eval

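The code below runs the custom deterministic eval on every opener produced by each prompt, once per judging criterion. If you build the rule prompt as a Python f-string, keep in mind that the eval expects its input keys inside double curly braces ({{opener}}, {{combined_posts}}, {{value_proposition}}). An f-string turns each {{ into a single {, so these placeholders must be written with four curly braces, e.g. {{{{opener}}}}. Otherwise the eval does not receive these inputs and cannot evaluate correctly.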
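To see what the four braces do, here is a sketch that builds one rule prompt per criterion from the step 4 template, assuming its first three slots are the input keys and the last one is the criterion description. build_rule_prompt is a hypothetical name, and how the finished prompt is attached to the eval depends on your setup:

```python
def build_rule_prompt(description: str) -> str:
    """Fill the criterion text into the rule prompt, keeping input placeholders intact."""
    # In an f-string, {{{{opener}}}} renders as the literal {{opener}} the eval expects,
    # while {description} is substituted immediately.
    return (
        f"Given opener : {{{{opener}}}} , combined_posts : {{{{combined_posts}}}}, "
        f"value_proposition: {{{{value_proposition}}}}. "
        f"Given the combined_posts and value_proposition, {description}"
    )


print(build_rule_prompt("Choose Good if ..."))
# Given opener : {{opener}} , combined_posts : {{combined_posts}}, value_proposition: {{value_proposition}}. Given the combined_posts and value_proposition, Choose Good if ...
```

With only two braces per side, the f-string would emit {opener}, which the eval would not treat as an input key.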

complete_result = {}
for criterion, description in JUDGING_CRITERIA.items():
    # Note: description is not passed in this call; for per-criterion results the
    # criterion text has to reach the eval's rule prompt (see step 4).
    for prompt_no in (1, 2):
        ratings = []
        for _, row in dataset.iterrows():
            # One eval call per opener
            result = evaluator.evaluate(
                eval_templates="custom_deterministic_eval",
                inputs={
                    "opener": row[f"opener_{prompt_no}"],
                    "combined_posts": row["combined_posts"],
                    "value_proposition": row["value_proposition"],
                },
                model_name="turing_flash",
            )
            # The single metric's value is the chosen tag, "Good" or "Poor"
            ratings.append(result.eval_results[0].metrics[0].value)
        # Columns alternate prompt 1 / prompt 2 for each criterion
        complete_result[f"{criterion} Eval Rating {prompt_no}"] = ratings

complete_result_df = pd.DataFrame(complete_result)

from tabulate import tabulate

# Columns alternate "<criterion> Eval Rating 1", "<criterion> Eval Rating 2", ...
# so even positions hold prompt 1 ratings and odd positions hold prompt 2 ratings
complete_result_prompt1 = complete_result_df.iloc[:, ::2]
complete_result_prompt2 = complete_result_df.iloc[:, 1::2]

print("\nEvaluation on Prompt 1")
print(tabulate(complete_result_prompt1, headers='keys', tablefmt='fancy_grid', showindex=False))

print("\nEvaluation on Prompt 2")
print(tabulate(complete_result_prompt2, headers='keys', tablefmt='fancy_grid', showindex=False))
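The ::2 and 1::2 slices rely on the loop inserting columns in strict alternation (prompt 1, then prompt 2, for each criterion). Selecting by the column-name suffix does not depend on that order; a small sketch on a stand-in frame that uses the same column naming as the loop:

```python
import pandas as pd

# Stand-in for complete_result_df, using the column names the loop produces
df = pd.DataFrame({
    "Engagement Eval Rating 1": ["Good"],
    "Engagement Eval Rating 2": ["Poor"],
    "Tone Eval Rating 1": ["Good"],
    "Tone Eval Rating 2": ["Good"],
})

# A regex anchored at the end matches only the columns for one prompt
prompt1 = df.filter(regex=r"Eval Rating 1$")
prompt2 = df.filter(regex=r"Eval Rating 2$")
print(list(prompt1.columns))  # ['Engagement Eval Rating 1', 'Tone Eval Rating 1']
```

DataFrame.filter with regex= filters columns by default, so reordering or adding criteria cannot mix the two prompts' ratings.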

Output:

Evaluation on Prompt 1

Engagement Eval Rating 1	Tone Eval Rating 1	Relevance Eval Rating 1	Appropriateness Eval Rating 1	Impact Eval Rating 1
Good	Good	Good	Poor	Good
Good	Good	Good	Good	Good
Good	Good	Good	Poor	Good
Good	Good	Good	Good	Good

Evaluation on Prompt 2

Engagement Eval Rating 2	Tone Eval Rating 2	Relevance Eval Rating 2	Appropriateness Eval Rating 2	Impact Eval Rating 2
Poor	Good	Good	Good	Good
Good	Good	Good	Good	Good
Good	Good	Poor	Poor	Poor
Good	Good	Good	Good	Good
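Beyond a per-row verdict, the share of Good tags per criterion shows where a prompt is weak. A sketch using the four Prompt 1 rows shown above, with the column names shortened:

```python
import pandas as pd

# The four Prompt 1 rows from the output above
prompt1 = pd.DataFrame(
    [
        ["Good", "Good", "Good", "Poor", "Good"],
        ["Good", "Good", "Good", "Good", "Good"],
        ["Good", "Good", "Good", "Poor", "Good"],
        ["Good", "Good", "Good", "Good", "Good"],
    ],
    columns=["Engagement", "Tone", "Relevance", "Appropriateness", "Impact"],
)

# Fraction of openers tagged Good for each criterion (True counts as 1)
good_rate = (prompt1 == "Good").mean()
print(good_rate)
```

On these rows every criterion scores 1.0 except Appropriateness, which scores 0.5, pointing at post selection as Prompt 1's weak spot.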

7. Selecting Winner Prompt

For this use case, the winning prompt is the one whose openers perform better on the judging criteria. First, each row (one opener) gets the majority tag across its five criterion columns; with five criteria and two tags there is always a clear majority. Then the number of rows rated “Good” is compared between the two prompts, and the prompt with more “Good” rows wins. Equal counts are reported as a tie.

def get_majority(row):
    # Most frequent tag across the five criterion columns of one row
    return row.iloc[:5].value_counts().idxmax()

df1_majority = complete_result_prompt1.apply(get_majority, axis=1)
df2_majority = complete_result_prompt2.apply(get_majority, axis=1)

df1 = pd.DataFrame({'Eval Rating Prompt 1': df1_majority})
df2 = pd.DataFrame({'Eval Rating Prompt 2': df2_majority})

df_combined = pd.concat([df1, df2], axis=1)

print("\nEval Rating")
print(tabulate(df_combined, headers='keys', tablefmt='fancy_grid', showindex=False))

good_count_prompt1 = (df1_majority == "Good").sum()
good_count_prompt2 = (df2_majority == "Good").sum()

if good_count_prompt1 > good_count_prompt2:
    winner = "Prompt 1"
elif good_count_prompt2 > good_count_prompt1:
    winner = "Prompt 2"
else:
    winner = "TIE"

print(f"\nWinner Prompt: {winner}")
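With five criteria and two tags a row cannot tie, but if you add a criterion (as step 5 allows) an even count can split evenly, and idxmax would then silently pick one of the tied tags. A vectorized, tie-aware alternative to get_majority, sketched here as a hypothetical helper and tried on the four Prompt 2 rows shown in step 6:

```python
import pandas as pd


def row_verdicts(ratings: pd.DataFrame) -> pd.Series:
    """Majority tag per row across criteria; 'TIE' when Good and Poor counts are equal."""
    good = (ratings == "Good").sum(axis=1)
    poor = (ratings == "Poor").sum(axis=1)
    verdict = pd.Series("TIE", index=ratings.index)
    verdict.loc[good > poor] = "Good"
    verdict.loc[poor > good] = "Poor"
    return verdict


# The four Prompt 2 rows from the step 6 output
prompt2 = pd.DataFrame([
    ["Poor", "Good", "Good", "Good", "Good"],
    ["Good", "Good", "Good", "Good", "Good"],
    ["Good", "Good", "Poor", "Poor", "Poor"],
    ["Good", "Good", "Good", "Good", "Good"],
])
print(row_verdicts(prompt2).tolist())  # ['Good', 'Good', 'Poor', 'Good']
```

Counting (row_verdicts(...) == "Good").sum() per prompt then plays the same role as good_count_prompt1 and good_count_prompt2 above.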

Output:

Eval Rating Prompt 1	Eval Rating Prompt 2
Good	Good
Good	Good
Good	Good
Good	Poor
Good	Good
Good	Good
Good	Good
Good	Good
Good	Good
Good	Good

Winner Prompt: Prompt 1
