Evaluate Function Calling
Evaluates the accuracy and effectiveness of function calls made by an LLM. It checks whether the output correctly identifies the need for a tool call and whether it includes the correct tool with the appropriate parameters extracted from the input.
```python
from fi.evals import Evaluator  # Future AGI Python SDK

evaluator = Evaluator()
result = evaluator.evaluate(
    eval_templates="evaluate_function_calling",
    inputs={
        "input": "Get the weather for London",
        "output": '{"function": "get_weather", "parameters": {"city": "London", "country": "UK"}}'
    },
    model_name="turing_flash"
)
print(result.eval_results[0].output)
print(result.eval_results[0].reason)
```

```typescript
import { Evaluator, Templates } from "@future-agi/ai-evaluation";

const evaluator = new Evaluator();
const result = await evaluator.evaluate(
    "evaluate_function_calling",
    {
        input: "Get the weather for London",
        output: '{"function": "get_weather", "parameters": {"city": "London", "country": "UK"}}'
    },
    {
        modelName: "turing_flash",
    }
);
console.log(result);
```

Input

| Required Input | Type | Description |
|---|---|---|
| input | string | Input provided to the LLM that triggers the function call. |
| output | string | LLM’s output containing the resulting function call or response. |
Output

| Field | Description |
|---|---|
| Result | Returns Passed if the LLM correctly identified that a function/tool call was necessary, or Failed if the LLM did not correctly handle the function call requirement. |
| Reason | Provides a detailed explanation of the function calling evaluation. |
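To build intuition for what this evaluation checks, the sketch below performs a similar pass/fail check locally: does the output parse as a function call, name the expected function, and supply the required parameters? The helper name and its arguments are illustrative assumptions, not part of the SDK:

```python
import json

def check_function_call(output, expected_fn, required_params):
    """Naive pass/fail check: does the raw output parse as a call to
    expected_fn with every parameter in required_params present?"""
    try:
        call = json.loads(output)
    except json.JSONDecodeError:
        return False, "output is not valid JSON, so no function call was identified"
    if not isinstance(call, dict) or call.get("function") != expected_fn:
        return False, f"expected a call to {expected_fn!r}"
    missing = [p for p in required_params if p not in call.get("parameters", {})]
    if missing:
        return False, "missing parameters: " + ", ".join(missing)
    return True, "function and parameters identified correctly"

passed, reason = check_function_call(
    '{"function": "get_weather", "parameters": {"city": "London", "country": "UK"}}',
    "get_weather",
    ["city"],
)
print(passed, reason)
```

The hosted evaluation goes further than this structural check — it uses a model to judge whether a call was *needed* and whether the parameter values were correctly extracted from the input — but the pass/fail contract is the same.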
What to Do When Function Calling Evaluation Fails
Examine the output to identify whether the failure was due to missing function call identification or incorrect parameter extraction. If the output did not recognise the need for a function call, review the input to ensure that the function’s necessity was clearly communicated. If the parameters were incorrect or incomplete, check that the input actually contains the details needed to populate them. Refining the model’s output format or adjusting the function call handling process can help improve accuracy in future evaluations.
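One common cause of failure is a model wrapping an otherwise correct call in prose or markdown fences, so the output is not clean JSON. A pre-processing step along these lines (an illustrative sketch, not part of the SDK) can isolate the JSON object before evaluation:

```python
import json
import re

def extract_json_call(raw):
    """Pull the first JSON object out of a raw model response,
    tolerating surrounding prose or markdown code fences.
    Returns the JSON string, or None if no valid object is found."""
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        return None
    candidate = match.group(0)
    try:
        json.loads(candidate)  # validate before returning
    except json.JSONDecodeError:
        return None
    return candidate

raw = 'Sure! Here is the call:\n```json\n{"function": "get_weather", "parameters": {"city": "London"}}\n```'
print(extract_json_call(raw))
```

Normalising the output this way ensures the evaluation judges the function call itself rather than incidental formatting.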
Comparing Evaluate Function Calling with Similar Evals
- Task Completion: Evaluate Function Calling assesses whether the LLM correctly identifies and formats a function/tool call, while Task Completion measures whether the model fulfilled the user’s overall request accurately.
- Instruction Adherence: Evaluate Function Calling focuses on whether the correct function and parameters were identified, while Instruction Adherence evaluates whether the output follows the prompt instructions more broadly.