Comparison Summary: Compare evaluations and prompt summaries across two datasets, now with detailed graphs and scores.
Function Evals: Add and edit function-type custom evals from the list of evals supported by Future AGI.
Edit Synthetic Dataset: Edit existing synthetic datasets directly or create a new version from changes.
Document Column Support in Dataset: New document column type to upload/store files in cells (TXT, DOC, DOCX, PDF).
User Tab in Dashboard and Observe: Searchable, filterable user list and detailed user view with metrics, interactive charts, synced time filters, and traces/sessions tabs.
Displaying the Timestamp Column in Trace/Spans: Added Start Time and End Time columns in Observe → LLM Tracing and Prototype → All Runs → Run Details.
Configure Labels: Configure system and custom labels per prompt version in Prompt Management.
Async Evals via SDK: Run evaluations asynchronously for long-running evals or larger datasets.
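The concurrency pattern behind async evals can be sketched with Python's asyncio. This is an illustrative sketch only: `run_eval`, `evaluate_dataset`, and the eval name are hypothetical stand-ins, not the actual Future AGI SDK API.

```python
import asyncio

# Hypothetical stand-in for a single SDK eval call; the real SDK's
# function name and signature may differ.
async def run_eval(eval_name: str, row: dict) -> dict:
    await asyncio.sleep(0.01)  # simulate a long-running evaluation
    return {"eval": eval_name, "row_id": row["id"], "passed": True}

async def evaluate_dataset(rows: list[dict]) -> list[dict]:
    # Fan out one task per row so long evaluations overlap
    # instead of running one after another.
    tasks = [run_eval("toxicity", row) for row in rows]
    return await asyncio.gather(*tasks)

results = asyncio.run(evaluate_dataset([{"id": i} for i in range(5)]))
```

The win is wall-clock time: five 10 ms evaluations complete in roughly 10 ms total rather than 50 ms, because they await concurrently.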
Bugs/Improvements
SDK Code Snippets: Updated the SDK code snippets for columns and rows on the create-dataset, add-rows, and dataset landing pages.
Fixed the editing issue in the custom evals form: an incorrect config was displayed on the evals page for function evals.
Fixed the disappearing bottom section in the trace detail drawer: dragging the bottom section caused the entire bottom area to vanish.
Optimized UI screens for different screen sizes.
Bug fixes for the updates summary screen: color, text, and font alignment.
Fixed cell loading-state issues while creating synthetic data.
UI enhancements for the simulation agent flow.
Fixed a CSV upload bug in datasets and UI issues in the add-feedback pop-up.
Summary Screen Revamp (Evaluation and Prompt): Unified visual overview of model performance with pass rates and comparative spider/bar/pie charts; includes compare views, drill-downs, and consistent filters.
Alerts Revamp: Create alert rules in Observe (+New Alert) from the Alerts tab or a project; notifications via Slack or email with guided Alert Type and Configuration steps.
Upgrades in Prompt SDK: Prompts are now cached after the first run for faster availability. Seamlessly deploy prompts to production, staging, or dev environments and run A/B tests via the prompt SDK.
Bugs/Improvements
Fixed run-prompt issues for longer prompts (over 5K words).
Bug fixes for voice simulation: transcript naming convention, deleting runs, and agent simulator selection.
Voice Simulation: New testing infrastructure that deploys AI agents to conduct real conversations with your voice systems, analyzing actual audio, not just transcripts.
Edit Evals Config: Edit the config (prompt/criteria) of your custom evals via the evals playground; adding new variables is not supported.
Bugs/Improvements
Bug fix for dynamic column creation via Weaviate.
Reduced dependencies for TraceAI packages (HTTP & gRPC).
Automated eval refinement: Retune your evals in the evals playground by providing feedback.
Markdown now available as a default option for improved readability.
Support for video in traces and spans in Observe projects.