Tool-Calling Agent Simulation with Tracing

Run a tool-calling agent through simulated scenarios, trace every tool invocation as child spans, and inspect results in the Tracing dashboard.

📝 TL;DR

Run a tool-calling agent through simulated conversations and trace every tool invocation as child spans in the Tracing dashboard.

Time: 15 min
Difficulty: Intermediate
Packages: agent-simulate, fi-instrumentation-otel
Prerequisites

Install

pip install agent-simulate fi-instrumentation-otel traceai-openai openai
export FI_API_KEY="your-api-key"
export FI_SECRET_KEY="your-secret-key"
export OPENAI_API_KEY="your-openai-api-key"

Tutorial

Define tools and mock execution

Define two OpenAI function schemas and a mock execution layer. In production, swap the mocks for real API calls.

import json

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "check_order_status",
            "description": "Look up the current status of a customer order.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The unique order identifier, e.g. 'ORD-12345'.",
                    }
                },
                "required": ["order_id"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "initiate_refund",
            "description": "Start a refund for a customer order.",
            "parameters": {
                "type": "object",
                "properties": {
                    "order_id": {
                        "type": "string",
                        "description": "The unique order identifier to refund.",
                    },
                    "reason": {
                        "type": "string",
                        "description": "The reason for the refund.",
                    },
                },
                "required": ["order_id", "reason"],
            },
        },
    },
]


def execute_tool(tool_name: str, arguments: dict) -> str:
    if tool_name == "check_order_status":
        return json.dumps({
            "order_id": arguments.get("order_id", "UNKNOWN"),
            "status": "shipped",
            "carrier": "FedEx",
            "tracking_number": "FX-9988776655",
            "estimated_delivery": "2026-03-06",
        })
    elif tool_name == "initiate_refund":
        return json.dumps({
            "order_id": arguments.get("order_id", "UNKNOWN"),
            "refund_id": "REF-554433",
            "status": "approved",
            "amount": "$149.99",
            "timeline": "3-5 business days",
        })
    else:
        return json.dumps({"error": f"Unknown tool: {tool_name}"})
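When you swap the mocks for real API calls, a table-driven dispatcher keeps `execute_tool` small and makes each tool independently replaceable. A minimal sketch — the handler bodies and their canned return values below are placeholders, not a real order service:

```python
import json


def check_order_status(order_id: str) -> dict:
    # Placeholder: replace with a call to your order service.
    return {"order_id": order_id, "status": "shipped"}


def initiate_refund(order_id: str, reason: str) -> dict:
    # Placeholder: replace with a call to your payments/refund service.
    return {"order_id": order_id, "status": "approved", "reason": reason}


HANDLERS = {
    "check_order_status": check_order_status,
    "initiate_refund": initiate_refund,
}


def execute_tool(tool_name: str, arguments: dict) -> str:
    handler = HANDLERS.get(tool_name)
    if handler is None:
        return json.dumps({"error": f"Unknown tool: {tool_name}"})
    try:
        return json.dumps(handler(**arguments))
    except TypeError as exc:
        # Model-supplied arguments did not match the handler signature.
        return json.dumps({"error": str(exc)})
```

Because the return type stays a JSON string, this version is a drop-in replacement: the agent callback below needs no changes.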

Write the agent callback

The callback wraps each turn in a parent agent-turn span. Inside it, auto-instrumented OpenAI calls and manual tool-execution spans form a tree:

agent-turn
├── OpenAI chat (tool-call request)    ← auto-instrumented
├── execute: check_order_status        ← manual span
├── execute: initiate_refund           ← manual span (if parallel tools)
└── OpenAI chat (synthesis)            ← auto-instrumented
import asyncio
import json
import os

import openai
from fi.simulate import AgentInput, TestRunner
from fi_instrumentation import register, FITracer
from fi_instrumentation.fi_types import ProjectType
from traceai_openai import OpenAIInstrumentor

trace_provider = register(
    project_type=ProjectType.OBSERVE,
    project_name="tool-calling-simulation",
    set_global_tracer_provider=True,
)
OpenAIInstrumentor().instrument(tracer_provider=trace_provider)

tracer = FITracer(trace_provider.get_tracer(__name__))
openai_client = openai.AsyncOpenAI()

SYSTEM_PROMPT = """You are a helpful customer support agent for ShopFast.
You assist customers with order status inquiries and refund requests.
Always use the available tools to look up order information before responding.
Be concise, accurate, and empathetic."""


async def agent_callback(agent_input: AgentInput) -> str:
    with tracer.start_as_current_span("agent-turn") as span:
        span.set_attribute("thread_id", agent_input.thread_id or "")

        # Build message history, skipping assistant messages with tool_calls
        # (the SDK strips tool-role responses from history, so these would be orphaned)
        messages = [{"role": "system", "content": SYSTEM_PROMPT}]
        for msg in agent_input.messages:
            if msg.get("role") == "assistant" and msg.get("tool_calls"):
                continue
            messages.append(msg)

        response = await openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            tools=TOOLS,
            tool_choice="auto",
            temperature=0.2,
        )

        choice = response.choices[0]

        if choice.finish_reason == "tool_calls":
            messages.append(choice.message)

            for tool_call in choice.message.tool_calls:
                with tracer.start_as_current_span(
                    f"execute: {tool_call.function.name}"
                ) as tool_span:
                    args = json.loads(tool_call.function.arguments)
                    tool_span.set_attribute("tool.name", tool_call.function.name)
                    tool_span.set_attribute("tool.parameters", json.dumps(args))

                    tool_result = execute_tool(tool_call.function.name, args)
                    tool_span.set_attribute("tool.result", tool_result)

                messages.append({
                    "role": "tool",
                    "tool_call_id": tool_call.id,
                    "content": tool_result,
                })

            follow_up = await openai_client.chat.completions.create(
                model="gpt-4o-mini",
                messages=messages,
                temperature=0.2,
            )
            return follow_up.choices[0].message.content or ""

        return choice.message.content or ""
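The history-filtering step at the top of the callback is pure logic, so it can be pulled out and verified without any network calls. A sketch, assuming messages arrive as plain dicts (`filter_history` is a name introduced here for illustration, not part of the SDK):

```python
def filter_history(messages: list[dict]) -> list[dict]:
    # Drop assistant messages that carry tool_calls: the simulator strips the
    # matching tool-role responses from history, so keeping these would leave
    # orphaned tool-call requests that the chat API rejects.
    return [
        m for m in messages
        if not (m.get("role") == "assistant" and m.get("tool_calls"))
    ]
```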

Run the simulation

async def main():
    runner = TestRunner(
        api_key=os.environ["FI_API_KEY"],
        secret_key=os.environ["FI_SECRET_KEY"],
    )

    await runner.run_test(
        run_test_name="tool-calling-test",
        agent_callback=agent_callback,
    )

    print("Simulation complete.")


asyncio.run(main())

Expected output:

🔍 Fetching Run Test ID for name: tool-calling-test
✓ Found Run Test ID: <uuid>
Starting Simulation for Run ID: <uuid>
✓ Test Execution Started: <uuid>
🔄 Fetching batch of scenarios...
📥 Received batch: 3 calls
▶️ Processing Call: <uuid>
✓ Call Finished: <uuid> (6 turns)
✅ Cloud Simulation Completed.
Simulation complete.
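One caveat when running this outside a plain script: `asyncio.run(main())` raises `RuntimeError` inside Jupyter, which keeps an event loop running. A small helper can cover both cases (`run_coro` is a name introduced here, not part of any SDK):

```python
import asyncio
import concurrent.futures


def run_coro(coro):
    """Run a coroutine whether or not an event loop is already running.

    In a plain script, asyncio.run() works directly. Inside Jupyter, a loop
    is already running, so we execute the coroutine on a fresh loop in a
    worker thread instead.
    """
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return asyncio.run(coro)
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        return pool.submit(asyncio.run, coro).result()
```

With this helper, the final line of the script becomes `run_coro(main())` and works in both environments.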

Warning

The run_test_name must exactly match the simulation name in the dashboard. A mismatch returns a 404.

Inspect tool call spans in the Tracing dashboard

Go to app.futureagi.com → Tracing and find traces from tool-calling-simulation. Click any trace to expand the span tree. Each turn that triggered a tool call shows this hierarchy:

  • agent-turn (parent): has thread_id attribute
    • OpenAI chat: the initial request with finish_reason: tool_calls
    • execute: check_order_status: tool name, parameters, and result as span attributes
    • OpenAI chat: the synthesis call that produces the final response

Turns where the model responds directly (no tool call) show a single OpenAI child span under agent-turn.

What you built

You can now run tool-calling agents through simulated scenarios and inspect every tool invocation as traced child spans in the dashboard.

  • Defined two OpenAI function schemas and a mock execution layer
  • Wrote an agent callback that handles the full tool-call loop, including parallel tool calls (detect → execute all → synthesize)
  • Traced every OpenAI call as child spans under a manual agent-turn parent, with dedicated tool-execution spans showing name, parameters, and result
  • Ran the simulation via TestRunner