1. Installing Dependencies
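The package set below is a plausible install for the stack used in this cookbook; the Future AGI package names in particular are assumptions, so check their documentation:

`pip install langchain langchain-community langchain-openai sqlalchemy pandas futureagi traceai-langchain`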
2. Importing Modules
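The imports below are inferred from the steps that follow; your notebook may pull in more, such as the Future AGI tracing modules shown in section 10:

```python
import os
import time

import pandas as pd
from sqlalchemy import create_engine, text
from sqlalchemy.pool import StaticPool
from langchain_community.agent_toolkits import create_sql_agent
from langchain_community.utilities import SQLDatabase
from langchain_openai import ChatOpenAI
```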
3. Configuring Environment Variables
Click here to access your Future AGI API Key and Secret Key.
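A minimal sketch; the exact variable names (e.g. FI_API_KEY / FI_SECRET_KEY) are assumptions, so use the names shown on your Future AGI dashboard:

```python
import os

# Credentials for Future AGI tracing and evals (assumed variable names)
os.environ["FI_API_KEY"] = "your-futureagi-api-key"
os.environ["FI_SECRET_KEY"] = "your-futureagi-secret-key"

# Credentials for the LLM powering the agent
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
```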
4. Defining Database Schema
To test your Text-to-SQL agent, you will need a realistic data model. The schema below is complex enough to exercise real-world SQL features such as joins, aggregations, and filters. Once the tables are defined, you’ll fill them with a handful of rows. This curated dataset ensures that your agent’s SQL queries will encounter both common and edge-case scenarios, such as products with no orders, users spanning multiple categories, and so on.
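The cookbook’s full DDL defines all seven tables; below is a representative sketch (the column choices are illustrative assumptions, while the table names follow the result walkthrough later in this cookbook):

```python
# Abridged, illustrative DDL for the e-commerce data model
SCHEMA_DDL = [
    """CREATE TABLE users (
        user_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        email   TEXT UNIQUE
    )""",
    """CREATE TABLE products (
        product_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        price      REAL NOT NULL
    )""",
    """CREATE TABLE orders (
        order_id   INTEGER PRIMARY KEY,
        user_id    INTEGER REFERENCES users(user_id),
        created_at TEXT
    )""",
    # order_items, product_categories, product_category_mappings,
    # and reviews are defined in the same style.
]

# A handful of rows per table, deliberately including edge cases
# such as products with no orders
SAMPLE_ROWS = [
    "INSERT INTO users (user_id, name, email) VALUES (1, 'Ada', 'ada@example.com')",
    # ...
]
```

5. Defining Text2SQL Prompt Template

The prompt pins the model to the actual schema so it cannot invent tables or columns. A minimal sketch using LangChain prompt templates (the wording is an assumption, not the cookbook’s exact prompt):

```python
from langchain_core.prompts import ChatPromptTemplate

TEXT2SQL_PROMPT = ChatPromptTemplate.from_messages([
    ("system",
     "You are a SQL expert. Given the schema below, write a syntactically "
     "correct {dialect} query that answers the user's question. Only use "
     "tables and columns that exist in the schema.\n\nSchema:\n{schema}"),
    ("human", "{question}"),
])
```

A custom prompt like this can be handed to `create_sql_agent` (used in the next section) via its `prompt` parameter; the exact placeholders depend on the agent type.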
6. Building SQL Agent
To demonstrate best practices for a text-to-SQL agent, we will define a robust setup with multiple retries, smart early stopping, and handling for both parsing and execution errors. This agent represents a production-ready configuration.
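A sketch of such a setup using LangChain’s SQL agent constructor (the model name and the specific limits are illustrative assumptions):

```python
from langchain_community.agent_toolkits import create_sql_agent
from langchain_openai import ChatOpenAI

def build_sql_agent(db):
    """Robust SQL agent: bounded retries, early stopping, error recovery."""
    llm = ChatOpenAI(model="gpt-4o", temperature=0)  # illustrative model choice
    return create_sql_agent(
        llm=llm,
        db=db,
        agent_type="openai-tools",
        max_iterations=15,         # retry budget before the agent gives up
        max_execution_time=60,     # smart early stopping after 60 seconds
        agent_executor_kwargs={
            "handle_parsing_errors": True,      # recover from malformed tool calls
            "return_intermediate_steps": True,  # needed to extract SQL in section 8
        },
        verbose=True,
    )
```

7. Experiment Runner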
This experiment orchestration ensures that every test case runs in a consistent environment and that results are easy to inspect. The runner executes the following steps (a code sketch follows the summary metrics below):
- Spins up the in-memory database with schema and sample data.
- Instantiates the chosen SQL agent.
- Iterates over every ground-truth question:
  - Invokes the agent.
  - Captures the generated SQL, execution results, any errors, and elapsed time.
- Collects all outcomes into organized tables for analysis.
After the run completes, it reports summary metrics:
- Success Rate: Percentage of queries that returned correct SQL.
- Average Latency: Mean time per query.
- Failure Counts: How many queries failed or timed out.
- Latency Extremes: Minimum and maximum execution times.
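A sketch of the runner described above (`ground_truth` is assumed to be a list of `{"question": ..., "sql": ...}` dicts, and "success" here simply means the agent completed without raising; section 8 adds the SQL-level comparison):

```python
import time

import pandas as pd

def run_experiment(agent, ground_truth):
    """Invoke the agent on every ground-truth question and collect outcomes."""
    records = []
    for case in ground_truth:
        start = time.time()
        row = {"question": case["question"], "expected_sql": case["sql"],
               "output": None, "error": None}
        try:
            result = agent.invoke({"input": case["question"]})
            row["output"] = result.get("output")
        except Exception as exc:  # parsing or execution failure
            row["error"] = str(exc)
        row["latency_ms"] = (time.time() - start) * 1000
        records.append(row)

    df = pd.DataFrame(records)
    summary = {
        "success_rate": df["error"].isna().mean(),
        "avg_latency_ms": df["latency_ms"].mean(),
        "min_latency_ms": df["latency_ms"].min(),
        "max_latency_ms": df["latency_ms"].max(),
        "failures": int(df["error"].notna().sum()),
    }
    return df, summary
```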
8. Extracting and Evaluating SQL Queries
When you invoke the agent, you’ll ask for its intermediate reasoning steps. This lets you pull out the exact SQL that was executed, not just the final printed output. By comparing that SQL against your ground-truth queries, you can automatically mark successes and failures.
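A sketch of that extraction, assuming the executor was built with `return_intermediate_steps=True` and uses the standard LangChain SQL toolkit tool names:

```python
def extract_sql(result):
    """Return the last SQL statement the agent actually executed."""
    sql = None
    for action, _observation in result.get("intermediate_steps", []):
        if action.tool == "sql_db_query":  # the toolkit's query-execution tool
            tool_input = action.tool_input
            sql = tool_input.get("query") if isinstance(tool_input, dict) else tool_input
    return sql

def normalize(sql):
    """Whitespace- and case-insensitive key for marking pass/fail."""
    return " ".join(sql.lower().split()) if sql else ""

# A generated query counts as a success when it matches the ground truth:
# normalize(extract_sql(result)) == normalize(case["sql"])
```

9. Setting up Database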
This creates each table from the schema, bulk-inserts your sample rows, and wraps the database engine in a LangChain helper so the agent can query it as if it were any other tool.
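A sketch using SQLAlchemy and LangChain’s `SQLDatabase` wrapper (`SCHEMA_DDL` and `SAMPLE_ROWS` are the hypothetical lists from the schema sketch above; `StaticPool` keeps a single connection alive so every query sees the same in-memory SQLite database):

```python
from sqlalchemy import create_engine, text
from sqlalchemy.pool import StaticPool
from langchain_community.utilities import SQLDatabase

def setup_database(ddl_statements, insert_statements):
    """Create the in-memory database, load sample data, and wrap it for the agent."""
    engine = create_engine(
        "sqlite://",                                 # in-memory SQLite
        connect_args={"check_same_thread": False},
        poolclass=StaticPool,                        # one shared connection
    )
    with engine.begin() as conn:
        for stmt in ddl_statements + insert_statements:
            conn.execute(text(stmt))
    return SQLDatabase(engine)
```

10. Registering Tracing with Future AGI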
- Tracing is the process of adding instrumentation to your LLM applications. It helps you monitor critical metrics like cost, latency, and evaluation results.
- Where a span represents a single operation within an execution flow, recording input-output data, execution time, and errors, a trace connects multiple spans to represent the full execution flow of a request. Click here to learn more about traces and spans.
- Tracing using Future AGI requires the following steps:
Step 1: Setting up Eval Tags
- To quantify performance, a set of evals suited to the use case is chosen. Since this cookbook deals with a Text-to-SQL agent, the following built-in evals are used:
  - `COMPLETENESS`: Evaluates whether the agent’s response fully addresses the user’s query, ensuring all aspects of the SQL request are properly implemented.
  - `GROUNDEDNESS`: Assesses how well the agent’s responses are grounded in the actual database schema and tables, ensuring SQL queries reference valid tables, columns, and relationships.
  - `TEXT_TO_SQL`: Specifically evaluates the quality of natural-language-to-SQL translation, measuring how accurately the agent converts user questions into syntactically correct and semantically appropriate SQL queries.
  - `DETECT_HALLUCINATION`: Identifies instances where the agent generates SQL that references non-existent tables, columns, or relationships that aren’t present in the database schema.
  - `table_checker`: A custom evaluation that verifies whether the SQL queries reference the appropriate tables needed to satisfy the user’s request, ensuring optimal join patterns and table selection.
- Click here to learn more about the evals provided by Future AGI.
- The `eval_tags` list contains multiple instances of `EvalTag`. Each `EvalTag` represents a specific evaluation configuration to be applied during runtime, encapsulating all necessary parameters for the evaluation process.
- Parameters of `EvalTag`:
  - `type`: Specifies the category of the evaluation tag. In this cookbook, `EvalTagType.OBSERVATION_SPAN` is used.
  - `value`: Defines the kind of operation the evaluation tag is concerned with. `EvalSpanKind.LLM` indicates that the evaluation targets operations involving Large Language Models; `EvalSpanKind.TOOL` targets operations involving tools.
  - `eval_name`: The name of the evaluation to be performed. Click here to get the complete list of evals provided by Future AGI.
  - `config`: Dictionary providing specific configurations for the evaluation. An empty dictionary means that default configuration parameters will be used. Click here to learn more about what config is required for corresponding evals.
  - `mapping`: Dictionary that maps the required inputs for the evaluation to specific attributes of the operation. Click here to learn more about what inputs are required for corresponding evals.
  - `custom_eval_name`: A user-defined name for the specific evaluation instance.
  - `model`: LLM model name required to perform the evaluation, such as `TURING_LARGE`, a proprietary model provided by Future AGI. Click here to learn more about all the proprietary models provided by Future AGI.
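Putting those parameters together, a single tag might look like this (import paths, enum members, and the mapping keys follow Future AGI’s traceAI examples and may differ across versions):

```python
from fi_instrumentation.fi_types import (
    EvalName, EvalSpanKind, EvalTag, EvalTagType, ModelChoices
)

# One illustrative tag: score how grounded each LLM span is in the schema
groundedness_tag = EvalTag(
    type=EvalTagType.OBSERVATION_SPAN,   # evaluate at the span level
    value=EvalSpanKind.LLM,              # target LLM operations
    eval_name=EvalName.GROUNDEDNESS,     # built-in eval to run
    config={},                           # default configuration
    mapping={                            # wire span attributes to eval inputs
        "input": "raw.input",
        "output": "raw.output",
    },
    custom_eval_name="sql_groundedness",
    model=ModelChoices.TURING_LARGE,     # Future AGI proprietary judge model
)
```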
Step 2: Setting Up Trace Provider
- The trace provider is part of the traceAI ecosystem, an OSS package that enables tracing of AI applications and frameworks. It works in conjunction with OpenTelemetry to monitor code execution across different models, frameworks, and vendors. Click here to learn more about the list of supported frameworks.
- To configure a `trace_provider`, we need to pass the following parameters to the `register` function:
  - `project_type`: Specifies the type of project. In this cookbook, `ProjectType.EXPERIMENT` is used since we are experimenting to test the agent before deploying it in production. `ProjectType.OBSERVE` is used to observe your AI application in production and measure its performance in real time.
  - `project_name`: The name of the project. This is dynamically set from a configuration dictionary, `config['future_agi']['project_name']`.
  - `project_version_name`: The version name of the project. Similar to `project_name`, this is also dynamically set from the configuration dictionary, `config['future_agi']['project_version']`.
  - `eval_tags`: A list of evaluation tags that define specific evaluations to be applied.
Step 3: Setting Up LangChain Instrumentor
- This is done to integrate with the LangChain framework for the collection of telemetry data. Click here to know about all the supported frameworks by Future AGI.
- The `instrument` method is called on the `LangChainInstrumentor` instance. This method is responsible for setting up the instrumentation of the LangChain framework using the provided `tracer_provider`.
- Putting it all together, below is the code that configures `eval_tags` and sets up `trace_provider`, which is then passed on to `LangChainInstrumentor`.
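A sketch of that combined setup, under the same caveat that exact import paths, enum members, and the `register` signature may vary between traceAI versions (`config` is the configuration dictionary mentioned above):

```python
from fi_instrumentation import register
from fi_instrumentation.fi_types import (
    EvalName, EvalSpanKind, EvalTag, EvalTagType, ModelChoices, ProjectType
)
from traceai_langchain import LangChainInstrumentor

eval_tags = [
    EvalTag(
        type=EvalTagType.OBSERVATION_SPAN,
        value=EvalSpanKind.LLM,
        eval_name=EvalName.COMPLETENESS,
        config={},
        mapping={"input": "raw.input", "output": "raw.output"},
        custom_eval_name="sql_completeness",
        model=ModelChoices.TURING_LARGE,
    ),
    # GROUNDEDNESS, TEXT_TO_SQL, DETECT_HALLUCINATION, and the custom
    # table_checker tags are declared the same way, varying eval_name,
    # mapping, and custom_eval_name.
]

# Register the trace provider for this experiment
trace_provider = register(
    project_type=ProjectType.EXPERIMENT,
    project_name=config["future_agi"]["project_name"],
    project_version_name=config["future_agi"]["project_version"],
    eval_tags=eval_tags,
)

# Instrument LangChain so every chain, agent, and tool call is traced
LangChainInstrumentor().instrument(tracer_provider=trace_provider)
```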
11. Main Function
This lets you run the experiment with your chosen agent, save detailed results and raw input/output pairs to CSV for audit and offline analysis, and print a concise summary of metrics so you can see at a glance how the agent performed.
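A sketch that ties the earlier (hypothetical) helpers together: `setup_database`, `build_sql_agent`, and `run_experiment`, plus the `SCHEMA_DDL`, `SAMPLE_ROWS`, and `GROUND_TRUTH` lists assumed above:

```python
def main():
    db = setup_database(SCHEMA_DDL, SAMPLE_ROWS)
    agent = build_sql_agent(db)
    results, summary = run_experiment(agent, GROUND_TRUTH)

    # Persist detailed results for audit and offline analysis
    results.to_csv("text2sql_results.csv", index=False)

    # Concise summary of metrics
    print(f"Success rate: {summary['success_rate']:.1%}")
    print(f"Avg latency : {summary['avg_latency_ms']:.1f} ms "
          f"(min {summary['min_latency_ms']:.1f}, max {summary['max_latency_ms']:.1f})")
    print(f"Failures    : {summary['failures']}")

if __name__ == "__main__":
    main()
```

Result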
- The agent starts by asking the database for a list of all available tables. The database then responds with seven table names (order_items, orders, product_categories, product_category_mappings, products, reviews, users).
- It then requests the detailed schema (column definitions and constraints) plus a few example rows for exactly the tables needed to answer the question.
- Using that schema and sample data, the agent formulates a SQL statement that matches the natural-language requirement.
- Before execution, the agent submits the generated SQL to a validation tool that checks syntax and logical consistency. Once the query passes, the agent runs it against the database and retrieves the results.
- A ‘finished’ message confirms that the entire cycle of introspecting tables, fetching schema, generating SQL, validating, and executing completed without errors.
- For the next question, the same four-step workflow repeats (listing tables, fetching schema & samples, generating SQL, validating and executing).
- This process is repeated for all test queries, resulting in a perfect success rate and low round-trip times (an average of 43.6 ms, ranging from 29.5 ms to 77.1 ms).


- The agent demonstrates consistent table identification capabilities with scores ranging from 75-80% across the query set, indicating robust schema comprehension.
- Minimal hallucination metrics confirm the agent’s precision in referencing only existing database structures.
- Text-to-SQL Translation accuracy exhibits variance (25-75%) correlating with query complexity, indicating scope for enhancement.
- The agent maintains grounding in the database schema (53-81%).
- Completeness metrics indicate potential areas for improvement.
Conclusion
This cookbook demonstrated how to build, evaluate, and trace a complete Text-to-SQL agent using Future AGI. From defining a realistic e-commerce schema to generating SQL queries and setting up robust agents, each component was designed to reflect real-world complexity. With the help of Future AGI’s built-in evals and tracing, you now have a framework not just for building agents but for auditing, debugging, and iterating toward a production-ready Text-to-SQL agent.

Ready To Evaluate Your Text-to-SQL Agent?
Start evaluating your AI agents with confidence using Future AGI’s tracing. Future AGI provides the tools you need to systematically improve your text-to-SQL agent. Click here to schedule a demo with us now!