# `evaluator` is assumed to be an already-initialized Evaluator client from the SDK
result = evaluator.evaluate(
    eval_templates="llm_function_calling",
    inputs={
        "input": "Get the weather for London",
        "output": '{"function": "get_weather", "parameters": {"city": "London", "country": "UK"}}'
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)
print(result.eval_results[0].reason)
Input

| Required Input | Type | Description |
| --- | --- | --- |
| input | string | Input provided to the LLM that triggers the function call. |
| output | string | LLM's output that contains the resulting function call or response. |
Output

| Field | Description |
| --- | --- |
| Result | Returns Passed if the LLM correctly identified that a function/tool call was necessary, or Failed if the LLM did not correctly handle the function call requirement. |

What to Do When Function Calling Evaluation Fails

Examine the output to identify whether the failure was due to missing function-call identification or incorrect parameter extraction. If the output did not recognise the need for a function call, review the input to ensure that the function's necessity was clearly communicated. If the parameters were incorrect or incomplete, check that the expected parameter names and values are stated unambiguously in the input. Refining the model's output or adjusting the function-call handling process can help improve accuracy in future evaluations.
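The triage steps above can be sketched as a small helper. This is an illustrative example, not part of the SDK: the function name `diagnose_function_call` and the JSON shape (`"function"` / `"parameters"`, as in the example output earlier on this page) are assumptions for the sketch.

```python
import json

def diagnose_function_call(output: str, expected_function: str, required_params: set) -> str:
    """Classify a failed function-calling eval: missing call, wrong function,
    or missing/incomplete parameters."""
    try:
        call = json.loads(output)
    except json.JSONDecodeError:
        # Output is plain text, so the model never emitted a function call
        return "no function call emitted"
    if call.get("function") != expected_function:
        return "wrong or missing function identification"
    missing = required_params - set(call.get("parameters", {}))
    if missing:
        return f"missing parameters: {sorted(missing)}"
    return "call looks correct"

# Example: the model identified the function but dropped the 'city' parameter
print(diagnose_function_call(
    '{"function": "get_weather", "parameters": {"country": "UK"}}',
    "get_weather",
    {"city", "country"},
))  # → missing parameters: ['city']
```

Separating "did the model call at all" from "did it fill the parameters" tells you whether to rework the prompt or the parameter descriptions.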

Differentiating Function Calling Eval from API Call Eval

The API Call evaluation focuses on making network requests to external services and validating the responses, while Evaluate LLM Function Calling examines whether LLMs correctly identify and construct function calls. API calls are used for external interactions such as retrieving data or triggering actions, whereas function-call evaluation ensures that LLMs correctly interpret input prompts and produce the right call. The two also differ in validation criteria: API calls are assessed on response content, status codes, and data integrity, while function-call evaluation focuses on the accuracy of function identification and parameter extraction.
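The distinction can be made concrete with two hypothetical checks over the same workflow. Both function names and the response fields are illustrative assumptions, not part of either evaluation template:

```python
import json

def validate_api_response(status_code: int, body: dict) -> bool:
    """API Call eval style: judge the *response* of a real network request
    (status code and payload integrity)."""
    return status_code == 200 and "temperature" in body

def validate_function_call(llm_output: str, expected_function: str) -> bool:
    """Function Calling eval style: judge whether the LLM *chose* the right
    function and extracted the key parameter, before any request is made."""
    try:
        call = json.loads(llm_output)
    except json.JSONDecodeError:
        return False
    return (call.get("function") == expected_function
            and "city" in call.get("parameters", {}))

# The two checks inspect different artifacts of the same weather lookup:
print(validate_api_response(200, {"temperature": 11.5}))  # → True
print(validate_function_call(
    '{"function": "get_weather", "parameters": {"city": "London"}}',
    "get_weather",
))  # → True
```

In short, one validates what came back over the network; the other validates what the model decided to send.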