Eval Definition
Text to SQL
Evaluates the accuracy and quality of SQL queries generated from natural language instructions.
Evaluation Using Interface
Input:
- Required Inputs:
- input: The natural language query or instruction.
- output: The generated SQL query to evaluate.
Output:
- Result: Returns ‘Passed’ if the SQL query correctly represents the natural language request, ‘Failed’ if it doesn’t.
- Reason: A detailed explanation of why the SQL query was classified as correct or incorrect.
Evaluation Using Python SDK
Click here to learn how to set up evaluation using the Python SDK.
Input:
- Required Inputs:
- input: string - The natural language query or instruction.
- output: string - The generated SQL query to evaluate.
Output:
- Result: Returns a list containing ‘Passed’ if the SQL query correctly represents the natural language request, or ‘Failed’ if it doesn’t.
- Reason: Provides a detailed explanation of the evaluation.
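Before submitting a pair for evaluation, it can help to confirm that both required inputs are non-empty strings. The snippet below is a minimal sketch of that check only. The helper name `build_inputs` and the sample query are hypothetical and are not part of the SDK, and the snippet makes no SDK calls.

```python
def build_inputs(input_text: str, output_sql: str) -> dict[str, str]:
    """Validate and package the two required inputs (hypothetical helper, not an SDK function)."""
    for name, value in (("input", input_text), ("output", output_sql)):
        # Both fields are required and must be strings with visible content.
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"'{name}' must be a non-empty string")
    return {"input": input_text, "output": output_sql}


# Illustrative pair: a natural language request and a candidate SQL query.
payload = build_inputs(
    "List the names of customers based in Germany.",
    "SELECT name FROM customers WHERE country = 'Germany';",
)
```

The resulting dictionary uses the same keys as the required inputs, so it can be adapted to whatever call the SDK setup guide describes.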
Example Output:
What to Do If You Get Undesired Results
If the SQL query is evaluated as incorrect (Failed) and you want to improve it:
- Ensure the SQL syntax is correct and follows standard conventions
- Verify that all tables and columns referenced match the database schema implied by the natural language query
- Check that the query filters for exactly the data requested (no more, no less)
- Make sure appropriate joins are used when multiple tables are involved
- Confirm that the query handles potential edge cases like NULL values appropriately
- Use the correct data types for values in comparisons (e.g., quotation marks for strings)
- For complex queries, consider breaking them down into simpler parts for troubleshooting
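Several of the checks above (syntax, table and column names, NULL handling) can be tried locally before re-running the evaluation. The following is a rough sketch using Python's built-in `sqlite3` module. The schema, sample rows, and helper names `check_sql` and `count_rows` are assumptions made for illustration, and SQLite's dialect may differ from your target database.

```python
import sqlite3

# Hypothetical schema and rows, used only for illustration.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
INSERT INTO customers (name, country) VALUES ('Ana', 'US'), ('Bo', 'DE'), ('Cy', NULL);
"""


def check_sql(query: str, schema: str = SCHEMA) -> tuple[bool, str]:
    """Return (ok, message) after asking SQLite to compile the query against the schema."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        # EXPLAIN compiles the statement without running it, so syntax errors
        # and unknown tables or columns are raised as sqlite3.Error.
        conn.execute(f"EXPLAIN {query}")
        return True, "ok"
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        conn.close()


def count_rows(query: str, schema: str = SCHEMA) -> int:
    """Run a query against the sample data and return how many rows it yields."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        return len(conn.execute(query).fetchall())
    finally:
        conn.close()


# A misspelled column is reported before the query ever runs.
print(check_sql("SELECT nmae FROM customers"))

# NULL edge case: "country != 'US'" silently drops the row whose country is NULL.
print(count_rows("SELECT name FROM customers WHERE country != 'US'"))
print(count_rows("SELECT name FROM customers WHERE country != 'US' OR country IS NULL"))
```

Comparing the two row counts shows whether a filter returns exactly the data requested when NULL values are present.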
Comparing Text to SQL with Similar Evals
- Task Completion: While Text to SQL focuses specifically on converting natural language to SQL queries, Task Completion evaluates whether a response completes the requested task more generally.
- Evaluate Function Calling: Text to SQL evaluates SQL generation specifically, whereas Evaluate Function Calling assesses the correctness of function calls and parameters more broadly.
- Is Code: Text to SQL evaluates the correctness of SQL generation, while Is Code detects whether content contains code of any type.