Function calling in LLMs refers to the model's ability to generate structured requests that trigger external functions, APIs, or predefined actions in a system. This feature lets an LLM interact with external services or perform specific tasks by emitting a function call in a format that downstream systems can parse and execute.


In a typical workflow, the model interprets the user's query and emits a structured function call, which the host system then executes.

A function call generated by an LLM typically includes:

  • Function Name: Specifies the action to be performed (e.g., get_weather, book_flight).
  • Parameters: Contains the structured input needed for the function (e.g., city name, date, or user details).

Example: For the input query:

“What’s the weather in San Francisco tomorrow?”

The LLM might generate this function call:

{
  "function": "get_weather",
  "parameters": {
    "location": "San Francisco",
    "date": "2025-01-24"
  }
}
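Once the model emits a call like the one above, the host system parses it and routes it to the matching handler. The sketch below is a minimal, hypothetical dispatcher; the `get_weather` stub and the `FUNCTIONS` registry are illustrative assumptions, not part of any specific SDK.

```python
import json

def get_weather(location, date):
    # Stub handler: a real implementation would query a weather API.
    return f"Forecast for {location} on {date}: sunny"

# Hypothetical registry mapping function names to Python handlers.
FUNCTIONS = {"get_weather": get_weather}

def dispatch(call_json):
    """Parse a model-generated function call and invoke the matching handler."""
    call = json.loads(call_json)
    handler = FUNCTIONS[call["function"]]
    # Unpack the structured parameters as keyword arguments.
    return handler(**call["parameters"])

result = dispatch(
    '{"function": "get_weather", '
    '"parameters": {"location": "San Francisco", "date": "2025-01-24"}}'
)
# result is the handler's return value for the requested location and date
```

In practice the registry would also validate parameter types before invoking the handler, since model output is untrusted.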

Future AGI provides a framework to assess the accuracy and effectiveness of function calls generated by the language model. It checks that the model correctly interprets the input, generates valid function calls, and produces the expected outputs.

This evaluation is crucial for scenarios where LLMs interact with external systems or APIs through function calls. It ensures:

  • The generated function call adheres to the required format or syntax.
  • The function call accurately captures the intent of the input.
  • The output produced by the function aligns with the expected result.
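The first two checks above can be sketched as a simple validator: parse the generated call as JSON, then confirm the function name and parameter names match a known signature. This is an illustrative standalone sketch, not the Future AGI evaluator itself; the `available` schema format is an assumption.

```python
import json

def is_valid_call(raw, available):
    """Return True if `raw` is syntactically valid JSON and matches a
    known function name with exactly the expected parameter names."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False  # fails the format/syntax check
    name = call.get("function")
    params = call.get("parameters")
    if name not in available or not isinstance(params, dict):
        return False  # unknown function or malformed parameters
    # Parameter names must match the declared signature exactly.
    return set(params) == set(available[name])

# Hypothetical signature registry: function name -> expected parameters.
available = {"get_weather": ["location", "date"]}

ok = is_valid_call(
    '{"function": "get_weather", '
    '"parameters": {"location": "San Francisco", "date": "2025-01-24"}}',
    available,
)
bad = is_valid_call('{"function": "book_flight", "parameters": {}}', available)
```

Checking that the function's *output* aligns with the expected result (the third criterion) requires actually executing the call or comparing against a reference answer, which is what the evaluation framework automates.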

a. Using Interface

Required Parameters

  • Input: The column containing the input provided to the LLM that triggers the function call.
  • Output: The column containing the resulting function call or response generated by the LLM.

b. Using SDK

from fi.evals import EvalClient
from fi.testcases import TestCase
from fi.evals.templates import LLMFunctionCalling

# A test case pairing the triggering input with the model's generated
# function call; context lists the functions available to the model.
test_case = TestCase(
    input="Get the weather for London",
    output='{"function": "get_weather", "parameters": {"city": "London", "country": "UK"}}',
    context="Available functions: get_weather(city, country)"
)

template = LLMFunctionCalling()

# Authenticate against the Future AGI API with your credentials.
evaluator = EvalClient(
    fi_api_key="your_api_key",
    fi_secret_key="your_secret_key",
    fi_base_url="https://api.futureagi.com"
)

# Run the evaluation and read the score from the first result's first metric.
response = evaluator.evaluate(eval_templates=[template], inputs=[test_case])

score = response.eval_results[0].metrics[0].value