

Testset details page showing eight Testcases for a "Message tone rewriter" LLM system.
Create a Testset
Go to the Testsets page in your project and click the “New Testset” button to create a new, empty Testset. Checking “Add example Testcases” will automatically generate three sample Testcases with AI, based on your Testset’s description. This gives you a starting point for your Testset.


Create testset modal with AI generation option
Testset schema
Each Testset has a schema, which defines which fields a Testcase has, the type of each field, and the role each field plays in evaluation.
- Input fields are sent to your AI system.
- Expected fields hold the ideal outputs that metrics compare your AI system’s output against.
- Metadata fields provide additional context for analysis and are not used by evaluation or by your system.


Testset schema editor.
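To make these roles concrete, here is a small Python sketch of how a single Testcase for the “Message tone rewriter” system pictured above might break down by role. The field names are hypothetical, invented for illustration:

```python
# Hypothetical Testcase for a "Message tone rewriter" system,
# illustrating the three schema field roles. Field names are
# invented for illustration; they are not part of Scorecard's API.
testcase = {
    "original_message": "Fix this ASAP!!!",  # input: sent to the AI system
    "target_tone": "polite",                 # input: sent to the AI system
    "ideal_rewrite": "Could you please prioritize this fix?",  # expected
    "source_channel": "email",               # metadata: analysis only
}

# The schema assigns each field a role; evaluation uses the roles to
# decide what to send to the system and what to compare its output against.
field_roles = {
    "original_message": "input",
    "target_tone": "input",
    "ideal_rewrite": "expected",
    "source_channel": "metadata",
}

inputs = {k: v for k, v in testcase.items() if field_roles[k] == "input"}
expected = {k: v for k, v in testcase.items() if field_roles[k] == "expected"}
```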
With the SDK
You can also create and update Testsets with the Scorecard SDK. You define a Testset’s schema using the JSON Schema format, with the following field types:
- string: Text content
- number: Numeric values (integers or floats)
- boolean: Either true or false
- object: Nested JSON objects
- array: Lists of JSON values
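As an example of these types in use, here is a sketch of what a schema for a customer support system might look like, together with a simplified validation check. The field names and the helper below are assumptions for illustration, not Scorecard’s actual schema or validation logic:

```python
# A sketch of a schema for a customer support system, using the field
# types above. Field names are invented for illustration, and the
# validation helper is a simplified stand-in, not Scorecard's
# actual JSON Schema validation.
customer_support_schema = {
    "type": "object",
    "properties": {
        "user_query":     {"type": "string"},   # input: the customer's message
        "conversation":   {"type": "array"},    # input: prior turns
        "user_profile":   {"type": "object"},   # metadata: nested details
        "is_escalation":  {"type": "boolean"},  # metadata
        "response_time":  {"type": "number"},   # metadata
        "ideal_response": {"type": "string"},   # expected: reference answer
    },
    "required": ["user_query", "ideal_response"],
}

def validates(schema, testcase):
    """Minimal required-field and type check against the schema."""
    if any(field not in testcase for field in schema["required"]):
        return False
    for field, value in testcase.items():
        spec = schema["properties"].get(field)
        if spec is None:
            continue  # ignore fields outside the schema
        kind = spec["type"]
        if kind == "boolean":
            ok = isinstance(value, bool)
        elif kind == "number":
            # bool is a subclass of int in Python, so exclude it here
            ok = isinstance(value, (int, float)) and not isinstance(value, bool)
        else:
            ok = isinstance(value, {"string": str, "object": dict, "array": list}[kind])
        if not ok:
            return False
    return True
```

A schema like this is what you would pass when creating the Testset through the SDK; see the SDK reference for the exact call signature.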
Create and edit Testcases
From the UI
Click the “New Testcase” button to create a new Testcase matching your Testset’s schema.

Testcase creation modal.


Testcase details page.
Importing from a file
The Scorecard UI supports importing Testcases in CSV, TSV, JSON, and JSONL formats. Scorecard automatically maps your file’s columns to the Testset’s schema fields and validates the data.
Using the API
With our SDKs, you can create, update, and delete Testcases.
Export Testset
You can export a Testset’s Testcases to a CSV file by clicking the “Export as CSV” button on the Testset’s details page.
Other Testset features
Testset tags
You can add custom tags to your Testsets to categorize them, for example regression or edge-cases.
Duplicate Testset
You can create a copy of a Testset by clicking the “Duplicate” button in the Testset actions menu.
The copy preserves the original field mappings, so it’s useful for creating variants of your Testsets without having to recreate the schema.
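The CSV export and import flows described above can be approximated locally with Python’s standard csv module. This sketch (with hypothetical field names, not Scorecard’s implementation) writes Testcases to CSV and maps the columns back by header name, mirroring the automatic column mapping the UI performs:

```python
import csv
import io

# Hypothetical Testcases; field names are invented for illustration.
testcases = [
    {"user_query": "Where is my order?", "ideal_response": "Let me check."},
    {"user_query": "Cancel my plan.",    "ideal_response": "I can help."},
]

# Export: write Testcases to CSV, one column per schema field.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["user_query", "ideal_response"])
writer.writeheader()
writer.writerows(testcases)
exported = buffer.getvalue()

# Import: map CSV columns back to fields by header name, roughly
# mirroring the automatic column mapping done on upload.
reimported = list(csv.DictReader(io.StringIO(exported)))
```

Note that CSV carries everything as text, so a real importer must also coerce values back to the schema’s declared types.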
Best practices
Testset strategy by use case
Hillclimbing Testsets
Purpose: Iterative improvement and development
Size: 5-20 Testcases
Content: Your favorite prompts and edge cases that matter most
Usage: Quick feedback during development cycles
Regression Testsets
Purpose: Ensure new changes don’t break existing functionality
Size: 50-100 Testcases
Content: Representative examples of core use cases
Usage: Run regularly (nightly builds, CI/CD pipelines)
Launch Evaluation Testsets
Purpose: Comprehensive testing before major releases
Size: 100+ Testcases
Content: Broad coverage of all use cases and edge cases
Usage: Pre-launch validation and confidence building
Must-Pass Testsets
Purpose: Critical functionality that must never fail
Size: Variable (focus on precision over coverage)
Content: High-precision Testcases for essential features
Usage: Early checks in deployment pipelines
Remember that Testcase data may contain sensitive information. Follow your organization’s data handling policies and avoid including PII, secrets, or confidential data in Testsets.