Week of 2025-10-02

What’s New

Bugs/Improvements
  • Evaluation Group Management: Users can now configure and create evaluation groups directly from the Datasets and Simulate sections, streamlining evaluation setup and saving time.
  • Default Evaluation Groups: Access preconfigured evaluation groups for common use cases such as RAG and computer vision, saving time in evaluation setup.
  • Advanced Simulation Management: Test executions now auto-refresh with real-time data, giving users instant visibility into ongoing runs. Users can stop simulations at any point to prevent unnecessary calls and costs. Enhanced features include Visual Workflow Tracing to pinpoint agent deviations, Real-Time Test Control to efficiently manage test execution, and Comprehensive Performance Metrics (latency, interruption response time, etc.) for precise agent evaluation and optimization.
Week of 2025-09-27

What’s New

Features
  • Agent Definition Versioning Upgrades: Managing agent definitions is now faster, simpler, and more organized. Instead of manually copy-pasting and creating new definitions each time, you can instantly create new versions with meaningful commit messages. All test reports are consolidated in one place, making it easy to access and compare logs across versions. With one-click versioning and unified test history, iteration cycles are now much faster—allowing you to update and test new agent configurations in seconds, not minutes.
  • Automated Scenario & Workflow Builder: Creating scenarios with synthetic data or uploaded datasets was useful, but it often lacked clarity in visualizing agent interactions. With the new Future AGI Scenario & Workflow Builder, you can simply upload SOPs or conversation transcripts and let the AI automatically generate comprehensive test scenarios—including edge cases that humans might miss. Each run now provides a clear, visual map of the exact conversation paths traversed by your agent, while the interactive workflow builder makes it easy to design, edit, and optimize flows. This enhanced experience delivers deeper insights, targeted edge case discovery, and a more intuitive way to implement and evaluate agent behavior.
  • Simplified User Session Tracking: Session management is now effortless. Instead of shutting down the trace provider and re-registering everything, you can simply add a session.id attribute to your spans. This makes it easy to group data into multiple sessions, enabling granular, user-level insights into your application’s performance and behavior.
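For illustration, here is a minimal sketch of the new session pattern using the standard OpenTelemetry Python API. Only the session.id attribute key comes from this release; the tracer name, span name, and session value are placeholders.

```python
from opentelemetry import trace

tracer = trace.get_tracer("my-app")  # placeholder instrumentation name

with tracer.start_as_current_span("handle-user-message") as span:
    # Tag the span with a session identifier; no trace provider shutdown
    # or re-registration required.
    span.set_attribute("session.id", "session-1234")
    # ... application logic ...
```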
Bugs/Improvements
  • Direct Trace-to-Prompt Linking: Introduced seamless linking of traces to prompts by leveraging the code snippet on the Prompt Workbench Metrics screen.
  • Enhanced Transcript Clarity: Updated transcript terminology so users can easily distinguish between messages from the Agent and responses from the FAGI Simulator, improving readability and context during review.
  • Workspace Switching Loader Fix: Fixed the loader behavior during workspace switching, ensuring a smoother transition.
  • Large Dataset Upload Stability: Improved dataset upload experience by resolving loading issues for large CSV/JSON files, enhancing stability and user visibility.
  • Custom Evaluation Editing Fixes: Resolved bugs in the Evals Playground to ensure smoother and more reliable editing of custom evaluations.
  • Group Evaluation UI/UX Improvements: Refined the user interface and experience when editing group evaluations, making the process more intuitive and consistent.
Week of 2025-09-22

What’s New

Features
  • Advanced Evaluation Group Management: Streamline your evaluation workflows with comprehensive CRUD operations for evaluation groups. Create, view, edit, and delete evaluation groups seamlessly, then apply them directly to tasks and prompts for consistent scoring across your AI applications. Enhanced with intelligent popovers that display eval input details, LLM/Knowledge Base dependencies, and linked evaluations during the grouping process.
  • Enhanced Call Management & Audio Controls: Manage your voice AI testing with the completely revamped Call Details Drawer that displays associated scenarios for each test run. Features a sophisticated multi-channel audio player for separate visualization and playback of assistant and customer audio streams.
  • Flexible Call Recording Downloads: Export call recordings in multiple formats (Caller Audio, Agent Audio, Mono Audio, Stereo Audio) to match your analysis workflow requirements. Coupled with granular audio field selection in evaluations for precise control over which conversation segments to score and analyze.
Bugs/Improvements
  • Enhanced Collaboration Features: Boost team productivity with collaborator support in prompts, allowing you to add and view team members working on specific prompts. Track prompt ownership with visible Created By fields and organize your work more efficiently with sorting capabilities for sample folders, prompts, and prompt templates.
  • Annotation & Prompt Import Fixes in Dataset: Enhanced annotation workflows by preventing empty label view selections and resolving prompt overflow issues in Run Experiment interfaces.
  • Evals Selection Filter Fixes: Fixed eval-type filters in the evaluations drawer across the platform.
Week of 2025-09-08

What’s New

Features
  • Intelligent Prompt Organization System: Transform your prompt management with our new folder-based architecture. Organize prompts and templates in a hierarchical structure, create reusable templates from existing prompts, and maintain consistency across your AI workflows. Templates function as fully-featured prompts while eliminating repetitive configuration tasks.
  • Enhanced Voice Agent Testing & Analytics: View comprehensive performance metrics of your voice agent test runs in an intuitive dashboard, including Top Performing Scenarios and conversation quality insights. The expanded simulate feature now includes additional scenario columns with grouping capabilities, customizable column visibility, and advanced filtering options—enabling you to optimize your voice AI implementations and focus on the most relevant data for your testing workflows.
  • Enhanced Plans & Pricing Experience: Navigate pricing options effortlessly with our completely redesigned pricing page featuring interactive plan comparison cards, a dynamic price calculator, and detailed plan breakdowns. The new design provides clear visibility into feature tiers and helps you make informed decisions about your subscription.
Bugs/Improvements
  • Enhanced Observability & Dashboard Accuracy: Resolved filtering issues for User ID across User Details Dashboard and Observe sections. Improved project selector clarity in Observe Eval Task Drawer and fixed workspace-level OTEL trace creation issues for more reliable monitoring.
  • UI/UX Enhancements: Streamlined simulation flow interfaces for better user experience and standardized decimal precision across the platform (displaying 2 decimal places for all numeric values).
  • Enhanced Data Visibility in Dataset Summary: See exactly how many data points contributed to your summary results and evaluation metrics, for complete transparency.
  • Code Snippet for Running Evals via SDK: Copy-paste-ready terminal commands to run any evaluation without manual configuration, via the code snippet in the Evals Playground.
  • Unified Design System: Experience consistent interactions across the platform with our custom DatePicker component, ensuring a polished and cohesive user experience throughout your workflow.
Week of 2025-09-05

What’s New

Features
  • Comprehensive Annotation Quality Dashboard: Monitor annotation quality at scale with our centralized analytics dashboard. Track key metrics including annotator agreement rates, completion times, and advanced quality scores (cosine similarity, Pearson correlation, Fleiss’ kappa) to ensure your training data meets the highest standards. A computation sketch for Fleiss’ kappa follows this list.
  • Enterprise-Grade Multi-Workspace Security: Deploy with confidence using our complete RBAC framework. Create isolated workspaces, manage team members with full CRUD capabilities (edit, deactivate, resend invitations), and implement role-based access controls that scale with your organization’s security requirements.
  • Advanced Observability with Feed Insights: Gain unprecedented visibility into agent performance with the new Feed Insights tab in the Observe section. Identify failed stages, affected spans, view error cluster events, track user counts, and analyze trend data over time for rapid issue diagnosis and agent optimization.
  • Intelligent Onboarding Navigation: Experience streamlined onboarding with our redesigned sidebar that prominently highlights the ‘Get Started’ section until all 7 onboarding steps are completed. This ensures new users follow a structured path to success before transitioning to the regular navigation experience.
  • No Config Evals – Agent Compass for AI Teams: AI agent developers often struggle to identify performance bottlenecks and system failures across complex execution flows. Traditional evaluation methods and system metrics offer only fragmented, span-level visibility, leaving teams blind to the bigger picture. As a result, diagnosing latency spikes, inefficient prompts, or tool-call failures becomes a time-consuming, manual process, and performance optimization turns reactive, error-prone, and expensive. Agent Compass closes this gap with no-config, trace-level evaluation insights.
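As a quick illustration of the Fleiss’ kappa agreement score mentioned above, here is a minimal sketch using statsmodels; the toy annotator labels are illustrative, not platform data.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Rows = annotated items, columns = annotators; values are category labels.
labels = np.array([
    [0, 0, 0],
    [0, 0, 1],
    [1, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
])

# Convert raw labels into an items-by-categories count table, then score.
table, _ = aggregate_raters(labels)
print(f"Fleiss' kappa: {fleiss_kappa(table):.2f}")
```

Kappa ranges from below 0 (worse than chance) up to 1.0 (perfect agreement), which is what makes it a useful single-number quality gate for annotated data.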
Bugs/Improvements
  • Improved Observability Reliability: Enhanced backend resilience for incomplete span creation scenarios and fixed issues when OpenTelemetry exports fail partially, ensuring complete trace visibility.
Week of 2025-08-29

What’s New

Features
  • Add Rows in Evals Tab of Prompt Workbench: Instantly add new rows with variable values in the evaluations screen, allowing you to generate outputs and evaluate without returning to the Prompt Workbench homepage.
  • Trace Linked to Prompt Workbench: View comprehensive performance metrics (latency, cost, tokens, evaluation metrics) for each prompt version linked to traces (and spans) across development, staging, and production environments via the Metrics section in Prompt Workbench.
  • Critical Issue Detection & Mitigation Advice on Datasets: Get actionable, AI-powered insights with recommendations to improve your agent’s performance and accelerate your path to production.
  • Access FAGI from AWS Marketplace: Sign up or sign in to the FAGI platform via AWS Marketplace and leverage AWS contracts and billing to work with FAGI.
  • Support for LlamaIndex OTEL Instrumentation in TypeScript: Easily add observability to agents leveraging the LlamaIndex framework with our TypeScript SDK on the FAGI platform.
Bugs/Improvements
  • Improved UX for Evaluate Pages: Enhanced the Evaluate Page interface for a consistent experience across devices.
  • Faster Alert Graph Loading: Reduced load times of alert graphs in the Alerts feature for quicker and smoother performance.
  • UI Improvements for Sidebar Navigation: Enhanced sidebar navigation for better usability.
  • User Filtering on Navigation: When navigating from the Users List or User Details Page to the LLM Tracing or Sessions Page, the user’s ID is now automatically applied as a filter.
  • User Details Filter Persistence: User filters (for traces and sessions) now persist across page refreshes.
  • UI Enhancements for Simulator Agent Form: Improved the user interface for the simulator agent form.
  • Support for Video in Trace Detail Screen: Added support for viewing videos in the Trace Details screen.
  • Fixed Scroll Issue in Agent Description Box (Simulation): Enabled scroll functionality via mouse in the agent description box within the simulation module.
  • Error Handling on Simulation Page: Improved error handling for low credit balances on the simulation homepage to enhance user experience.
  • Credit Utilization for Error Localizer: Added visibility of credit utilization for the error localizer in the usage summary screen.
Week of 2025-08-19

What’s New

Features
  • Comparison Summary: Compare evaluations and prompt summaries of two different datasets now with detailed graphs and scores.
  • Function Evals: Enable adding and editing function-type custom evals from the list of evals supported by Future AGI.
  • Edit Synthetic Dataset: Edit existing synthetic datasets directly or create a new version from changes.
  • Document Column Support in Dataset: New document column type to upload/store files in cells (TXT, DOC, DOCX, PDF).
  • User Tab in Dashboard and Observe: Searchable, filterable user list and detailed user view with metrics, interactive charts, synced time filters, and traces/sessions tabs.
  • Displaying the Timestamp Column in Trace/Spans: Added Start Time and End Time columns in Observe → LLM Tracing and Prototype → All Runs → Run Details.
  • Configure Labels: Configure system and custom labels per prompt version in Prompt Management.
  • Async Evals via SDK: Run evaluation asynchronously for long-running evaluations or larger datasets.
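For illustration, here is a minimal, runnable sketch of the async pattern with a stubbed evaluation call; the function name, arguments, and return shape are placeholders, not the SDK’s actual interface.

```python
import asyncio

async def evaluate_async(eval_name: str, inputs: list[str]) -> dict:
    """Stand-in for the SDK's async evaluation call (hypothetical name);
    sleeps briefly to simulate a long-running evaluation."""
    await asyncio.sleep(0.1)
    return {"eval": eval_name, "rows": len(inputs), "passed": True}

async def main() -> None:
    batches = [["row-1", "row-2"], ["row-3", "row-4"]]  # toy input batches
    # Launch long-running evaluations concurrently rather than serially.
    results = await asyncio.gather(
        *(evaluate_async("groundedness", b) for b in batches)
    )
    print(results)

asyncio.run(main())
```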
Bugs/Improvements
  • SDK Code Snippets: Updated the SDK code snippets for columns and rows on the create dataset, add rows, and dataset landing pages.
  • Fixed an editing issue in the custom evals form: an incorrect config was displayed on the evals page for function evals.
  • Fixed the trace detail drawer’s bottom section disappearing: dragging the bottom section caused the entire bottom area to vanish.
  • Optimized UI screens for different screen sizes.
  • Fixed color, text, and font alignment on the updates summary screen.
  • Fixed cell loading state issues while creating synthetic data.
  • UI enhancements for the simulation agent flow.
  • Fixed a CSV upload bug in datasets and UI issues in the add feedback pop-up.
Week of 2025-08-11

What’s New

Features
  • Summary Screen Revamp (Evaluation and Prompt): Unified visual overview of model performance with pass rates and comparative spider/bar/pie charts; includes compare views, drill-downs, and consistent filters.
  • Alerts Revamp: Create alert rules in Observe (+New Alert) from Alerts tab or project; notifications via Slack/Email with guided Alert Type and Configuration steps.
  • Upgrades in Prompt SDK: Prompts are now cached after their first run, increasing availability. Seamlessly deploy prompts to production, staging, or dev environments and perform A/B tests using the Prompt SDK; a sketch of the A/B pattern follows.
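As a rough sketch of the A/B pattern this enables (the version labels, weights, and selection logic below are illustrative; the Prompt SDK manages deployment targets for you):

```python
import random

# Hypothetical prompt versions and traffic weights for an A/B test.
PROMPT_VARIANTS = {"v1-production": 0.9, "v2-candidate": 0.1}

def pick_prompt_version() -> str:
    # Weighted random choice routes ~10% of traffic to the candidate.
    versions, weights = zip(*PROMPT_VARIANTS.items())
    return random.choices(versions, weights=weights, k=1)[0]

print(pick_prompt_version())
```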
Bugs/Improvements
  • Fixed run prompt issues for longer prompts (>5K words).
  • Bug fixes for voice simulation: transcript naming conventions, deleting runs, and agent simulator selection.
Week of 2025-08-07

What’s New

Features
  • Voice Simulation: New testing infrastructure that deploys AI agents to conduct real conversations with your voice systems, analyzing actual audio, not just transcripts.
  • Edit Evals Config: Edit the config (prompt/criteria) of your custom evals via the Evals Playground; adding new variables is not yet supported.
Bugs/Improvements
  • Bug fix for dynamic column creation via Weaviate.
  • Reduced dependencies for TraceAI packages (HTTP & gRPC).
  • Automated eval refinement: Retune your evals in evals playground by providing feedback.
  • Markdown now available as a default option for improved readability.
  • Support for video (traces and spans) in Observe project.
Week of 2025-07-29

What’s New

Features
  • Edit, Duplicate, and Delete Custom Evals: Duplicate, edit, or delete evaluations that are no longer in use or whose logic is outdated.
  • Bulk Annotation/User Feedback: Bulk annotate your Observe traces with user feedback directly via the API or SDK; see the sketch after this list.
  • JSON View for Evals Log: Access evals log data in JSON format in evals playground.
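As a rough illustration of bulk annotation over HTTP: the endpoint URL, payload fields, and auth header below are assumptions for the sketch, not the documented API; consult the platform docs for the real surface.

```python
import requests

API_KEY = "your-api-key"  # assumed bearer-token auth

# Hypothetical payload: one feedback entry per trace.
payload = {
    "annotations": [
        {"trace_id": "trace-001", "feedback": "thumbs_up"},
        {"trace_id": "trace-002", "feedback": "thumbs_down", "note": "hallucinated"},
    ]
}

resp = requests.post(
    "https://api.futureagi.com/observe/annotations/bulk",  # assumed endpoint
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=30,
)
resp.raise_for_status()
```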
Bugs/Improvements
  • Improved span name visibility in traces for Observe and Prototype.
  • Fixed a bug when adding an owner to a workspace.
  • Improved error and error-state handling for evaluations in Prompt Workbench.
  • Added variables to system and assistant user roles in Prompt Workbench.
  • Speed enhancements for dataset loading.
Week of 2025-07-21

What’s New

Features
  • Added a Run button for single cells in the evaluations workbench.
  • Users can now add notes to Observe traces.
Bugs/Improvements
  • Improved search logic to render relevant search results in dataset.
  • Dataset bugs and API network call optimizations.
  • Fixed audio icon.
  • Error handling for network connection issues.
  • Bug fixes for prompt workbench versioning issues.
  • Changed the color mapping for deterministic type evals.
  • Updated loaders for evals playground.
  • Pagination fix in Observe.
  • Added clear functionality in add to dataset column mapping fields in Observe.
  • Graph properties are now cleared when the Observe project changes; fixed the thumbs-down icon not rendering.
  • Generate variable bug fix in prompt workbench.
  • Fixed the experiment page breaking when switching content tabs.
  • Fixed the created_at 30-day filter on evals log section.
Week of 2025-07-14

What’s New

Bugs/Improvements
  • Prevented horizontal overscroll across the entire platform.
  • Fixed a glitch after refreshing while generating sample data.
  • Updated error messages and save button status for document uploads.
  • Fixed a variable auto-population issue in compare prompt for multiple versions.
  • Restricted the function tab to LLM spans only.
  • Added error handling for the mandatory system prompt required by a few LLM models.
  • Added API null checks throughout the platform.
  • Fixed streaming issues after running a prompt when the current prompt version is updated.
  • Truncated long model names in the model details drawer.
  • Fixed a no-rows error on the dataset homepage for some users on slow connections.
  • Easier removal of filters for Observe and Prototype.
  • Fixed validation in quick filter number-related fields.
  • Fixed inconsistent fonts in evaluation workbench.
  • Added loading state to evaluations tab.
  • Fixed the knowledge base name not being visible in some cases.
  • Fixed spacing issue in run prompt.
  • Updated the link for the workbench help section and adjusted the list width.
Week of 2025-05-05

What’s New

Features
  • Diff view in experiment.
  • Updated sections for Prototype and Observe.
  • Error localization in Observe.
  • [Observe+Prototype] Adding annotations flow for trace view details.
  • Updated dataset layout and table design.
  • Higher rate limits to send more traces in Observe.
  • Sorting in alert.
  • Support for audio in Observe and datasets.
Bugs/Improvements
  • Improved error handling in prompt versioning.
  • Removed unnecessary keys from evaluation outputs.
  • Better handling of required keys to column names in add_evaluation in dataset.
  • Removed TraceAI code from FutureAGI SDK - experiment rerun fix.
  • Fixed SSO login issues.
  • Eval ranking fixes.
  • Fixed sizing and view issue in dataset when row size is adjusted.
  • Fixed sidebar item not showing active style when child page is active globally.
  • Fixed a red background appearing in the edit field for integer-type values.
  • Fixed crashing of page when adding JSON value in dataset.
  • Fixed knowledge base status update issue in case of network issues.
  • Experiment tab bugs for some browsers and loading state issues on experiment page.
  • Fixed a bug in the run insights section of Prototype.
Week of 2025-04-28

What’s New

Features
  • Prototype / All Runs columns dropdown change.
  • Prototype / Configure project.
  • Trace details view for Observe/Prototype.
  • Allow search in dataset.
  • Run insights view - evals (deployed without the error modal part).
  • Improved user flow for synthetic data creation with “best practices” for each input.
  • Add to dataset flow from Prototype.
  • API for Gmail account signup.
  • Enabling search within data.
  • First-time user experience walkthrough for newly onboarded users.
  • Quick filters for annotations view in Prototype and Observe.
  • Compare runs in Prototype.
  • Diff view for compare dataset.
  • General enhancements to Observe and Prototype.
  • Addition of new evals for audio - conversational and completeness evals.
Bugs/Improvements
  • Added a new fallback choice for Tone Eval when none of the existing choices are suitable.
  • Fixed a bug in the experiment view.
  • UI/UX bugs - knowledge base and audio support for evals.
  • Fixed required input field column details not appearing for Audio Quality evals.
  • UX changes for the plan screen loader.
  • Changed the color and the percentage of the eval chips in experiment.
Week of 2025-04-21

What’s New

Features
  • Quick filters in Prototype & Observe.
  • Added support for knowledge base creation and updating.
  • Optimization of synthetic data generation.
  • Evaluate now works in compare datasets.
Bugs/Improvements
  • Improved the UI shown when rate limits are hit.
  • Audio and knowledge base bug fixes.
  • Improved the view for incorrect evals.
  • Fixes in compare dataset.
  • Changed the logo URL.
  • Filter issue fixed in Prototype.
  • The rate limit error message now prompts users to upgrade their plan.
  • Optimized experiments under datasets to run faster.
  • Improved Hugging Face error handling for different datasets.