Evaluation Using Interface

Input:

  • Required Inputs:
    • input: The natural language query or instruction.
    • output: The generated SQL query to evaluate.

Output:

  • Result: Returns ‘Passed’ if the SQL query correctly represents the natural language request, or ‘Failed’ if it doesn’t.
  • Reason: A detailed explanation of why the SQL query was classified as correct or incorrect.

Evaluation Using Python SDK

Click here to learn how to set up evaluation using the Python SDK.

Input:

  • Required Inputs:
    • input: string - The natural language query or instruction.
    • output: string - The generated SQL query to evaluate.

Output:

  • Result: Returns a list containing ‘Passed’ if the SQL query correctly represents the natural language request, or ‘Failed’ if it doesn’t.
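  • Reason: Provides a detailed explanation of the evaluation.

# Assumes `evaluator` is an already-initialized SDK evaluator client
# (see the SDK setup guide referenced above).
result = evaluator.evaluate(
    eval_templates="text_to_sql",
    inputs={
        # The natural language request
        "input": "List the names of all employees who work in the sales department.",
        # The generated SQL query to evaluate
        "output": "SELECT name FROM employees WHERE department = 'sales';"
    },
    model_name="turing_flash"
)

# The value is a list such as ['Passed']; the reason is a free-text explanation.
print(result.eval_results[0].metrics[0].value)
print(result.eval_results[0].reason)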

Example Output:

['Passed']
The evaluation is 'Passed' because the SQL query correctly and efficiently implements the natural language request.

*   The query uses the **appropriate SELECT statement** to retrieve only the requested 'name' field from the employees table.
*   The **WHERE clause** correctly filters for employees in the 'sales' department using case-sensitive string comparison.
*   The query is **syntactically correct** with proper semicolon termination and correct SQL syntax.
*   The solution is **optimally efficient**, retrieving only the necessary data without unnecessary joins or sub-queries.

A different evaluation was not possible because the SQL query fully satisfies all requirements of the natural language prompt.
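
Because the SDK returns the result as a list (for example ['Passed']) rather than a bare string, it can help to normalize it before branching on it. The following is a minimal sketch, not part of the SDK: `is_passed` and `summarize` are hypothetical helper names, and the only things taken from this page are the attribute path `eval_results[0].metrics[0].value` and the `reason` field. Treating a multi-entry list as passed only when every entry is 'Passed' is an assumption.

```python
from types import SimpleNamespace


def is_passed(value):
    """Return True when an eval value indicates 'Passed'.

    Accepts a bare string or a list of strings. An empty list counts
    as not passed (assumption made for this sketch).
    """
    labels = [value] if isinstance(value, str) else list(value)
    return bool(labels) and all(label == "Passed" for label in labels)


def summarize(result):
    """Return (passed, reason) from the first eval result."""
    first = result.eval_results[0]
    return is_passed(first.metrics[0].value), first.reason


# Stand-in object with the same attribute layout as the result shown above;
# in real use, pass the object returned by evaluator.evaluate(...).
fake = SimpleNamespace(
    eval_results=[
        SimpleNamespace(
            metrics=[SimpleNamespace(value=["Passed"])],
            reason="Query matches the request.",
        )
    ]
)

print(summarize(fake))  # (True, 'Query matches the request.')
```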

What to Do If You Get Undesired Results

If the SQL query is evaluated as incorrect (Failed) and you want to improve it:

  • Ensure the SQL syntax is correct and follows standard conventions
  • Verify that all tables and columns referenced match the database schema implied by the natural language query
  • Check that the query filters for exactly the data requested (no more, no less)
  • Make sure appropriate joins are used when multiple tables are involved
  • Confirm that the query handles potential edge cases like NULL values appropriately
  • Use the correct data types for values in comparisons (e.g., single-quoted string literals such as 'sales')
  • For complex queries, consider breaking them down into simpler parts for troubleshooting
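
The syntax and schema checks above can be partly automated before a query is sent for evaluation. The sketch below uses Python's built-in sqlite3 module to compile a query against an in-memory copy of the schema. It rests on several assumptions: the `employees` table definition is hypothetical, `precheck_sql` is an illustrative helper and not part of the SDK, and SQLite's dialect may differ from your target database, so a passing check here does not guarantee the query is valid elsewhere.

```python
import sqlite3

# Hypothetical schema for illustration; replace with your own DDL.
SCHEMA = """
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    department TEXT
);
"""


def precheck_sql(query, schema=SCHEMA):
    """Return (ok, message) after asking SQLite to compile the query.

    EXPLAIN compiles the statement without running the query itself, so
    syntax errors and unknown tables or columns surface as sqlite3 errors.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema)
        conn.execute("EXPLAIN " + query)
        return True, "ok"
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        conn.close()


# A valid query, a misspelled column, and a missing table.
print(precheck_sql("SELECT name FROM employees WHERE department = 'sales';"))
print(precheck_sql("SELECT nme FROM employees"))
print(precheck_sql("SELECT name FROM staff"))
```

A check like this only catches structural problems. Whether the query returns exactly the requested data, including how it treats NULL values, still needs to be judged against the natural language request.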

Comparing Text to SQL with Similar Evals

  • Task Completion: While Text to SQL focuses specifically on converting natural language to SQL queries, Task Completion evaluates whether a response completes the requested task more generally.
  • Evaluate Function Calling: Text to SQL evaluates SQL generation specifically, whereas Evaluate Function Calling assesses the correctness of function calls and parameters more broadly.
  • Is Code: Text to SQL evaluates the correctness of SQL generation, while Is Code detects whether content contains code of any type.