Definition

Evaluates the accuracy and effectiveness of function calls made by an LLM. It checks whether the output correctly identifies the need for a tool call and, if so, whether it selects the appropriate tool and supplies the correct parameters extracted from the input.


Calculation

The evaluation begins by configuring the input and output to be assessed and specifying the evaluation criteria. During function call analysis, the system determines whether the output correctly identifies, based on the input, that a function call is needed. It also verifies that the correct tool is selected and that its parameters are accurately extracted from the input.

The final result is a Pass/Fail outcome. The output passes if it correctly detects whether a function call is required, selects the correct tool, and includes the appropriate parameters; otherwise, it fails.
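The three checks above can be sketched in Python as a minimal illustration, not the evaluator's actual implementation. It assumes the output is an OpenAI-style assistant message with a `tool_calls` list whose `arguments` field is a JSON string; the names `ToolCall`, `parse_tool_call`, and `evaluate_function_call` are hypothetical. Parameters are compared by exact match, which is likely stricter than a judge-based evaluator.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToolCall:
    """Hypothetical representation of a single tool call."""
    name: str
    arguments: dict[str, Any]


def parse_tool_call(message: dict) -> Optional[ToolCall]:
    """Return the first tool call in an OpenAI-style assistant message, or None."""
    calls = message.get("tool_calls") or []
    if not calls:
        return None
    fn = calls[0]["function"]
    # Assumption: arguments arrive as a JSON-encoded string.
    return ToolCall(name=fn["name"], arguments=json.loads(fn["arguments"]))


def evaluate_function_call(message: dict, expected: Optional[ToolCall]) -> str:
    """Return "Pass" or "Fail"; `expected` is None when no call should be made."""
    actual = parse_tool_call(message)

    # Check 1: did the output correctly detect whether a call is needed?
    if expected is None or actual is None:
        return "Pass" if expected is None and actual is None else "Fail"

    # Check 2: was the correct tool selected?
    if actual.name != expected.name:
        return "Fail"

    # Check 3: were the parameters extracted correctly (exact match)?
    return "Pass" if actual.arguments == expected.arguments else "Fail"
```

For example, a message that calls `get_weather` with `{"city": "Paris"}` passes against `ToolCall("get_weather", {"city": "Paris"})`, while a plain-text reply to the same input fails because the required call was never made.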


What to Do When Function Calling Evaluation Fails

Examine the output to determine whether the failure came from missing function call identification or from incorrect parameter extraction. If the output did not recognise the need for a function call, review the input to ensure that the function's necessity was clearly communicated. If the parameters were incorrect or incomplete, compare them against the values stated in the input to see which ones were missed or mis-extracted.
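Part of this diagnosis can be automated. The sketch below assumes the expected and actual calls have already been parsed into a tool name (or `None` when no call was made) and an argument dictionary; `diagnose_failure` and its messages are hypothetical and not part of the evaluator's output.

```python
from typing import Any, Optional


def diagnose_failure(
    expected_tool: Optional[str],
    expected_args: dict[str, Any],
    actual_tool: Optional[str],
    actual_args: dict[str, Any],
) -> list[str]:
    """List likely reasons a function-calling check failed (empty if none found)."""
    # The output never recognised that a function call was needed.
    if expected_tool is not None and actual_tool is None:
        return ["missed call: review whether the input made the need for the function clear"]
    # The output called a function when none was required.
    if expected_tool is None and actual_tool is not None:
        return [f"unexpected call to {actual_tool!r}"]
    # A call was made, but to the wrong tool.
    if actual_tool != expected_tool:
        return [f"wrong tool: expected {expected_tool!r}, got {actual_tool!r}"]

    issues = []
    # Parameters present in the expected call but absent from the output.
    missing = sorted(expected_args.keys() - actual_args.keys())
    if missing:
        issues.append(f"missing parameters: {missing}")
    # Parameters present in both calls but with different values.
    wrong = sorted(
        key for key in expected_args.keys() & actual_args.keys()
        if expected_args[key] != actual_args[key]
    )
    if wrong:
        issues.append(f"incorrect values: {wrong}")
    return issues
```

For instance, `diagnose_failure("get_weather", {"city": "Paris", "unit": "C"}, "get_weather", {"city": "Lyon"})` reports both a missing `unit` parameter and an incorrect `city` value, pointing at parameter extraction rather than call identification.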

Refining the model’s output or adjusting the function call handling process can help improve accuracy in future evaluations.


Differentiating Function Calling Eval from API Call Eval

The API Call evaluation focuses on making network requests to external services and validating the responses, while Evaluate LLM Function Calling examines whether LLMs correctly identify and execute function calls.

API calls are used for external interactions like retrieving data or triggering actions, while function call evaluation ensures that LLMs correctly interpret and execute function calls based on input prompts.

They also differ in validation criteria: API calls are assessed on response content, status codes, and data integrity, whereas function call evaluation focuses on the accuracy of function call identification and parameter extraction.
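The difference in validation criteria can be illustrated with two contrasting checks. Both functions are hypothetical sketches, not the platform's implementations: the API-style check inspects the HTTP status code and response body a real request would return, while the function-calling check sends nothing and only inspects the call the model proposed (assumed to be a dict with `name` and a JSON-string `arguments`, as in OpenAI-style tool calls).

```python
import json


def check_api_response(status_code: int, body: str, required_fields: set[str]) -> bool:
    """API Call style: validate status code, content, and basic data integrity."""
    # Status code: only 2xx responses count as successful.
    if not 200 <= status_code < 300:
        return False
    # Content: the body must be valid JSON.
    try:
        data = json.loads(body)
    except json.JSONDecodeError:
        return False
    # Data integrity: every required field must be present.
    return isinstance(data, dict) and required_fields.issubset(data)


def check_function_call(call: dict, expected_name: str, expected_args: dict) -> bool:
    """Function Calling style: no request is sent; only the proposed call is checked."""
    # Identification: the right function was chosen.
    if call.get("name") != expected_name:
        return False
    # Parameter extraction: the arguments match what the input specified.
    return json.loads(call.get("arguments", "{}")) == expected_args
```

Note that `check_function_call` can pass even if the real weather service is down, and `check_api_response` can pass even if the model chose the wrong tool; each covers a different failure surface.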