LLM Function Calling

Evaluation Using Interface

Input:

input: The input column provided to the LLM that triggers the function call.
output: The column containing the resulting function call or response generated by the LLM.

Output:

Result: Passed / Failed

Interpretation:

Passed: The LLM correctly identified that a function/tool call was necessary based on the input and accurately extracted the required parameters into the expected format.
Failed: The LLM did not correctly handle the function call requirement. This could mean it either failed to recognize the need for a function call altogether, or it recognized the need but extracted incorrect or incomplete parameters from the input.
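The pass/fail rule above can be approximated locally when you already know which call the input should produce. The sketch below is not how the evaluator works internally (the SDK runs it on a model, see `model_name`). It assumes the output is JSON shaped like the SDK example (`{"function": ..., "parameters": {...}}`). `check_function_call` is a hypothetical helper introduced only for illustration.

```python
import json
from typing import Any


def check_function_call(output: str, expected_name: str,
                        expected_params: dict[str, Any]) -> bool:
    """Hypothetical helper: True if `output` is a JSON function call with the
    expected function name and every expected parameter value."""
    try:
        call = json.loads(output)
    except json.JSONDecodeError:
        # Plain-text reply: the model did not emit a function call at all.
        return False
    if not isinstance(call, dict) or call.get("function") != expected_name:
        return False
    params = call.get("parameters")
    if not isinstance(params, dict):
        return False
    # Each expected parameter must match; extra parameters (e.g. "country") are allowed.
    return all(params.get(k) == v for k, v in expected_params.items())


good = '{"function": "get_weather", "parameters": {"city": "London", "country": "UK"}}'
no_call = "It is usually rainy in London."
wrong = '{"function": "get_weather", "parameters": {"city": "Paris"}}'

print(check_function_call(good, "get_weather", {"city": "London"}))     # True
print(check_function_call(no_call, "get_weather", {"city": "London"}))  # False
print(check_function_call(wrong, "get_weather", {"city": "London"}))    # False
```

The second case fails because no call was recognized, and the third fails because the parameter was extracted incorrectly. These are the two failure modes described above.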

Evaluation Using SDK

Click here to learn how to set up evaluation using the SDK.

Input	Parameter	Type	Description
Required	`input`	`string`	Input text provided to the LLM that triggers the function call.
Required	`output`	`string`	Output text containing the resulting function call or response generated by the LLM.

Output	Type	Description
Result	`bool`	Returns `0` or `1`. `0`: The LLM did not correctly handle the function call requirement. `1`: The LLM correctly identified that a function/tool call was necessary and extracted the required parameters.

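# `evaluator` is assumed to be initialized already, as described in the SDK setup guide.
result = evaluator.evaluate(
    eval_templates="llm_function_calling",
    inputs={
        # The user request that should trigger a function call
        "input": "Get the weather for London",
        # The function call produced by the LLM, serialized as JSON
        "output": '{"function": "get_weather", "parameters": {"city": "London", "country": "UK"}}'
    },
    model_name="turing_flash"
)

# The evaluation result and the explanation behind it
print(result.eval_results[0].output)
print(result.eval_results[0].reason)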

What to Do When Function Calling Evaluation Fails

Examine the output to determine whether the failure came from not identifying the need for a function call or from incorrect parameter extraction. If the output did not recognize the need for a function call, review the input to ensure that the function's necessity was clearly communicated. If the parameters were incorrect or incomplete, compare them against the values stated in the input to find which ones were missed or misread. Refining the model's output or adjusting the function call handling process can help improve accuracy in future evaluations.
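The triage step above can be sketched as a small diagnostic that reports which kind of failure occurred. This is an illustrative sketch, not part of the SDK. It assumes the same JSON output shape as the SDK example and that you know the expected function name and parameter values. `diagnose` is a hypothetical helper.

```python
import json
from typing import Any


def diagnose(output: str, expected_name: str,
             expected_params: dict[str, Any]) -> str:
    """Hypothetical helper: classify why a function-calling output failed."""
    try:
        call = json.loads(output)
    except json.JSONDecodeError:
        call = None
    # Case 1: no structured call at all, so the need for a call was missed.
    if not isinstance(call, dict) or "function" not in call:
        return "no function call: the need for a call was not recognized"
    # Case 2: a call was made, but to the wrong function.
    if call["function"] != expected_name:
        return f"wrong function: got {call['function']!r}"
    # Case 3: right function, but parameters are missing or have wrong values.
    params = call.get("parameters")
    if not isinstance(params, dict):
        params = {}
    missing = [k for k in expected_params if k not in params]
    incorrect = [k for k, v in expected_params.items()
                 if k in params and params[k] != v]
    if missing or incorrect:
        return f"parameter problem: missing={missing}, incorrect={incorrect}"
    return "ok"


print(diagnose("Sure, let me check.", "get_weather", {"city": "London"}))
print(diagnose('{"function": "get_weather", "parameters": {"city": "Londn"}}',
               "get_weather", {"city": "London", "country": "UK"}))
# parameter problem: missing=['country'], incorrect=['city']
```

A "no function call" result points you at the input prompt, while a "parameter problem" result points you at parameter extraction.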

Differentiating Function Calling Eval from API Call Eval

The API Call evaluation focuses on making network requests to external services and validating the responses, while the LLM Function Calling evaluation examines whether an LLM correctly identifies and executes function calls. API calls are used for external interactions such as retrieving data or triggering actions, whereas function call evaluation checks that the LLM interprets the input prompt correctly and produces the right function call. The two also differ in validation criteria: API calls are assessed on response content, status codes, and data integrity, while function call evaluation focuses on the accuracy of function call identification and parameter extraction.
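To make the difference in validation criteria concrete, the sketch below applies each kind of check to an already-available result. No network request is made. The response dictionary, its `status_code` and `body` keys, and both validator functions are hypothetical, chosen only to mirror the criteria named above.

```python
from typing import Any


def validate_api_response(response: dict[str, Any],
                          required_fields: list[str]) -> bool:
    """Hypothetical API Call style check: status code plus response content."""
    if response.get("status_code") != 200:
        return False
    body = response.get("body")
    return isinstance(body, dict) and all(f in body for f in required_fields)


def validate_function_call(call: dict[str, Any], expected_name: str,
                           required_params: list[str]) -> bool:
    """Hypothetical Function Calling style check: function name plus extracted parameters."""
    params = call.get("parameters")
    return (call.get("function") == expected_name
            and isinstance(params, dict)
            and all(p in params for p in required_params))


api_response = {"status_code": 200, "body": {"temp_c": 14, "condition": "cloudy"}}
fn_call = {"function": "get_weather", "parameters": {"city": "London", "country": "UK"}}

print(validate_api_response(api_response, ["temp_c", "condition"]))  # True
print(validate_function_call(fn_call, "get_weather", ["city"]))       # True
```

The first check never looks at how a request was chosen. The second never looks at what the external service returned. This is why the two evaluations complement rather than replace each other.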
