Product Updates
New updates and improvements
New API and SDK
We’ve released the alpha of our new Scorecard SDKs, featuring streamlined API endpoints for creating, listing, and updating system configurations, as well as programmatic experiment execution. With this alpha, you can integrate scoring runs directly into your development and CI workflows, configure systems as code, and fully automate your evaluation pipeline without manual steps.
Our pre-release Python SDK (2.0.0-alpha.0) and JavaScript SDK (1.0.0-alpha.1) are now available.
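As a taste of configuring systems as code, here is a minimal sketch with the alpha Python SDK. The client class name, the `systems.create` method, and its fields are illustrative assumptions rather than the confirmed alpha surface; see the SDK reference docs for the real API.

```python
# Hypothetical sketch of "systems as code" with the alpha Python SDK.
# The client class, method names, and fields below are assumptions
# for illustration, not the confirmed alpha API surface.
from scorecard_ai import Scorecard  # assumed package/client name

client = Scorecard(api_key="YOUR_SCORECARD_API_KEY")

# Define a system configuration in code instead of in the UI.
system = client.systems.create(
    name="support-bot",
    config={"model": "gpt-4o", "temperature": 0.2},
)
print(system.id)
```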
New Quickstarts and Documentation
To support the SDK alpha, we’ve launched comprehensive SDK reference docs and concise quickstart guides that show you how to:
- Install and initialize the TypeScript or Python SDK.
- Create and manage system configurations in code.
- Run your first experiment programmatically.
- Retrieve and interpret run results within your applications.

Follow these step-by-step walkthroughs to get your first experiment up and running in under five minutes.
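In outline, the programmatic flow looks like the sketch below. Method names such as `runs.create` and `runs.get`, and the placeholder IDs, are assumptions for illustration; the quickstarts have the authoritative calls.

```python
# Hypothetical end-to-end sketch of the quickstart flow in Python.
# Method names (runs.create, runs.get) and fields are illustrative
# assumptions; consult the SDK reference docs for the real surface.
from scorecard_ai import Scorecard  # assumed client name

client = Scorecard(api_key="YOUR_SCORECARD_API_KEY")

# Kick off a scoring run against an existing testset and metrics.
run = client.runs.create(
    project_id="proj_123",      # placeholder IDs
    testset_id="ts_456",
    metric_ids=["metric_789"],
)

# Read back the scores once the run completes.
results = client.runs.get(run.id)
for record in results.records:
    print(record.testcase_id, record.scores)
```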
Bug fixes and improvements
- [Performance] Reduced page load times and improved responsiveness when handling large run results in the run history table.
- [UI] Removed metric-specific scoring progress and scoring/execution start and end times, and improved how project names wrap across all screen sizes.
- [Testsets] Resolved a bug in the new CSV upload flow.
- [Testsets] Added back the “Move testset to project” action.
- [Testsets] Archived testsets are now hidden correctly, keeping your workspace clutter‑free.
- [Reliability] Improved API reliability and workflow robustness: fixed run creation schema errors, streamlined testcase creation/duplication/deletion flows, and added inline schema validation to prevent submission errors.
- [Evals] Migrated scoring metrics from gpt-4-1106-preview (Nov 2023) to gpt-4o.
Run insights
We’ve added a new Run History chart on Runs & Results that visualizes your performance trends over time, so you can spot regressions or sustained improvements at a glance (up and to the right!). The x‑axis is the run date, the y‑axis is the mean score, and each metric gets its own colored line. You can view this by clicking the ‘All Runs’ tab of Runs & Results.
🧪 Testsets 2.0 for easier creation and iteration
Testsets got a full upgrade. We reworked the creation flow, added AI-powered example generation, and streamlined testcase iteration. Filtering, sorting, editing, and bulk actions are now faster and more intuitive—so you can ship better tests, faster.
- You can now create testsets with a simplified modal and generate relevant example testcases based on title and description.
- Bulk editing tools make it easier to manage and update multiple testcases at once.
- You can edit large JSON blobs inline in the testcase detail view, with improved scroll and copy behavior.
- The testset detail page now shows the associated schema in context for easier debugging and review.
- Navigation has improved with linked testset titles and run/testcase summaries directly accessible from the cards.
🗂️ Improved schema management
Schemas are now defined and managed per testset, rather than at the project level—giving teams more flexibility and control.
- The schema editor has been redesigned, allowing teams to update schemas independently for each testset.
- Schema changes now reflect immediately in the testcase table to help users see their impact in real time.
- Users can view and copy raw schema JSON for integration with their own tools or SDKs (see the annotated sketch after this list).
- We’ve also improved messaging in the schema editor to clarify the distinction between inputs and labels.
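For reference, here is the kind of raw schema you might copy, shown as a Python dict so the input/label distinction can be annotated. The field names and structure are illustrative assumptions, not the platform’s documented schema format.

```python
# Illustrative testset schema; field names and shape are assumptions,
# not Scorecard's documented schema format.
schema = {
    "inputs": {                        # fields your system receives
        "question": {"type": "string"},
    },
    "labels": {                        # reference fields used for scoring
        "ideal_answer": {"type": "string"},
    },
}
```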
Bug fixes and improvements
- [Platform] Internal tech stack upgrades to support faster product iteration
- [Testsets] Friendlier zero-state, faster load, cards with quick actions and live counts
- [Testsets] Tag propagation fix — updates now apply across all testsets and views
- [Testsets] Improved sorting behavior, including reliable default and column sorting
- [Testsets] Filter testcases by keyword, searching across the full dataset
- [Testsets] Updated page actions with visible bulk tools for managing multiple testcases
- [Testsets] Testset cards now link to runs/testcases, and support fast schema editing, duplication, or deletion
- [Testsets] Titles now link directly to the testset detail page
- [Testcases] Detail page supports editing and copying large JSON blobs
- [Testcases] Schema panel added for better context while reviewing or editing testcases
- [Projects] Enhanced cards with summaries for testsets, metrics, and runs, all linked for easier navigation
- [Projects] Improved sorting with more intuitive labels and default order
- [Projects] Faster load performance across the project overview page
- [Schemas] Improved editor messaging to clarify the difference between input fields and labels
- [Toast Messages] Now deep-link to newly created items (testsets, testcases, projects)
- [Performance] Faster page loads, filtering, sorting, and table actions, powered by new APIs and backend improvements
Projects
We simplified project creation by adding a create project modal to the projects page and project detail pages.
SDKs
We’re working on overhauling our API and SDKs. We switched to using Stainless for SDK generation and released version 1.0.0-alpha.0 of our Node SDK. Over the next few weeks, we will stabilize the new API and Node and Python SDKs.
Bug fixes and improvements
- [Playground] Filtering and searching by testcase now work properly. Pagination was also improved and no longer shows inconsistent items.
- [Testsets] We fixed a bug where we exported empty testcase values as the string “null” instead of an empty string.
- [Testcases] We fixed a bug that broke the Generate testcases feature.
- [Settings] We added some text on the API keys page to clarify that your Scorecard API key is personal, but model API keys are scoped to the organization.
Tracing
We’ve significantly improved our trace management system by relocating traces within the project hierarchy for better organization. Users can now leverage robust search capabilities with full-text search across trace data, complete with highlighted match previews. The new date range filtering system offers multiple time range options from 30 minutes to all time, while project scope filtering allows viewing traces from either the current project or across all projects. We’ve enhanced data visualization with dynamic activity charts and improved trace tables for better insights. Our library support now focuses specifically on Traceloop, OpenLLMetry, and OpenTelemetry for optimal integration.
In addition, the trace system now includes intelligent AI span detection that automatically recognizes AI operations across different providers. Visual AI indicators with special badges clearly show model information at a glance. We’ve added test case generation capabilities that extract prompts and completions to easily create test cases. For better resource monitoring, token usage tracking provides detailed metrics for LLM consumption.
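If you are instrumenting with OpenLLMetry via the Traceloop SDK, the wiring looks roughly like the sketch below. The endpoint URL and auth header here are assumptions; use the exact values from Scorecard’s setup wizard for your project.

```python
# Minimal OpenLLMetry (Traceloop SDK) setup sketch. The endpoint and
# auth header below are assumptions; use the values from Scorecard's
# setup wizard for your project.
from openai import OpenAI
from traceloop.sdk import Traceloop

Traceloop.init(
    app_name="my-llm-app",
    api_endpoint="https://tracing.scorecard.example/otel",  # placeholder endpoint
    headers={"Authorization": "Bearer YOUR_SCORECARD_API_KEY"},  # assumed auth scheme
)

# Calls through instrumented libraries (e.g. OpenAI) are now captured
# as AI spans, including model information and token usage.
client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```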
Examples repository
We’ve published comprehensive integration examples demonstrating OpenTelemetry configuration with Scorecard, including Python Flask implementation with LLM tracing for OpenAI and Node.js Express implementation with similar capabilities. A new setup wizard provides clear configuration instructions for popular telemetry libraries to help users get started quickly.
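As a sketch of what the Flask example covers (the route name and structure here are illustrative, not the repository’s exact code):

```python
# Illustrative Flask + OpenAI route with LLM tracing enabled; this
# mirrors the idea of the published example but is not its exact code.
from flask import Flask, jsonify, request
from openai import OpenAI
from traceloop.sdk import Traceloop

Traceloop.init(app_name="flask-llm-example")  # exporter config omitted for brevity

app = Flask(__name__)
client = OpenAI()

@app.route("/chat", methods=["POST"])
def chat():
    prompt = request.json["prompt"]
    # The instrumented OpenAI call emits an AI span with model and token metrics.
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    return jsonify({"completion": resp.choices[0].message.content})

if __name__ == "__main__":
    app.run(port=5000)
```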
We also updated our quickstart documentation to be more comprehensive.
Bug fixes and improvements
- [Scoring] When a run metric has not yet been scored, we now display N/A instead of NaN, making it clear that the metric has no data yet.
- [Prompt Management] We made stability and performance improvements to prompt management workflows.
- [Projects] All resources now belong to projects, including those created before Scorecard Projects were introduced.
- [Exports] Custom fields in CSV exports of run results are handled more reliably.
- [Organizations] When a user switches organizations, we now redirect them to the organization’s projects page.
- [Testsets] On the testcase page, we fixed the link back to the testset.
- [Metrics] We added a new autosize textarea component that lets you keep typing the metric description without running out of space.
- [Playground] The “Prompt manager”, “Update”, and “Delete prompt” buttons are now disabled for default prompts. When selecting metrics, the “Select and score now” button is now the primary button rather than the “Select” button.
- [API] When a user does not include their Scorecard API key, we now return a friendlier 401 error: “Missing API key” rather than “malformed token”.
- [Scoring] The human scoring panel now collapses the run details page, letting users see model responses while scoring.
- [Platform] We enhanced platform stability and increased test coverage.
New Project Overview Page
We redesigned our project overview page, adding useful information to a new sidebar and making it possible to edit a project’s name and description in the same place.
Docs Site Revamp
We’re excited to announce we’ve moved to a completely revamped documentation site! Key improvements include:
- Improved navigation structure
- Better search functionality
- Enhanced API documentation
- New updates section to track changes
- Modern, cleaner design
This change will help us better serve our users with clearer, more organized documentation.