# Testsets

> Use Testsets to create curated datasets for evaluating your AI agents

export const DarkLightImage = ({lightSrc, caption, alt, darkSrc = null, width = "1000"}) => {
  const getAbsoluteUrl = src => {
    if (src.startsWith('http://') || src.startsWith('https://')) {
      return src;
    }
    const currentUrl = typeof window !== 'undefined' ? window.location.origin : '';
    if (currentUrl.includes('.mintlify.app')) {
      const subdomain = currentUrl.split('.')[0].replace('https://', '');
      return `https://mintlify.s3.us-west-1.amazonaws.com/${subdomain}${src.startsWith('/') ? '' : '/'}${src}`;
    } else if (currentUrl === 'https://docs.scorecard.io') {
      return `https://mintlify.s3.us-west-1.amazonaws.com/scorecard-d65b5e8a${src.startsWith('/') ? '' : '/'}${src}`;
    } else {
      return `${currentUrl}${src.startsWith('/') ? '' : '/'}${src}`;
    }
  };
  const content = <>
      <img className="block dark:hidden" width={width} src={getAbsoluteUrl(lightSrc)} alt={alt} />
      <img className="hidden dark:block" width={width} src={getAbsoluteUrl(darkSrc || lightSrc.replace('light', 'dark'))} alt={alt} />
    </>;
  if (caption) {
    return <Frame caption={caption}>{content}</Frame>;
  } else {
    return content;
  }
};

A **Testset** is a collection of **Testcases** used to evaluate the performance of an AI agent across various inputs and scenarios. Think of it as a curated dataset specifically designed for testing AI agents. Each Testset is usually organized around a central theme, such as "Core Functionality", "Edge Cases", or "Customer Support Queries".

A **Testcase** is an individual test data point containing inputs, expected outputs, and metadata used for evaluation.

<DarkLightImage lightSrc="/images/testset-details-light.png" caption="Testset details page showing eight Testcases for a &#x22;Message tone rewriter&#x22; AI agent." alt="Screenshot of the testset details page in the UI." />

## Create a Testset

Go to the Testsets page in your project and click the **"New Testset"** button to create a new, empty Testset.

<Tip>
  Checking "Add example Testcases" will automatically generate 3 sample Testcases using AI based on your Testset's description. This provides a starting point for your Testset.
</Tip>

<DarkLightImage lightSrc="/images/testset-create-light.png" caption="Create testset modal with AI generation option" alt="Screenshot of the create testset modal in the UI." />

### Testset schema

Each Testset has a *schema*, which defines which fields a Testcase has, the type of each field, and the role each field plays in evaluation.

* <span className="text-blue-500 font-bold">Input fields</span> are sent to your AI agent.
* <span className="text-green-500 font-bold">Expected fields</span> hold the expected or ideal outputs that metrics compare your agent's output against.
* <span className="dark:text-gray-400 text-gray-600 font-bold">Metadata fields</span> are additional context for analysis, not used by evaluation or your agent.

You can update the schema of a Testset by clicking the **"Edit Schema"** button in the Testset actions menu.

This allows you to add or remove fields, modify field types, and update field descriptions. Existing Testcases are not modified, but are validated against the new schema.

<DarkLightImage lightSrc="/images/testset-schema-edit-light.png" caption="Testset schema editor." alt="Screenshot of the testset schema editor in the UI." />

### With the SDK

You can also [create](/api-reference/create-testset) and [update](/api-reference/update-testset) Testsets with the Scorecard SDK.

You define a Testset's schema using the [JSON Schema](https://json-schema.org/understanding-json-schema/about) format.

For example, here's a schema for a customer support system:

```json expandable theme={null}
{
  "type": "object",
  "title": "Customer Support Schema",
  "properties": {
    "userQuery": {
      "type": "string",
      "description": "The customer's question or request"
    },
    "context": {
      "type": "string",
      "description": "Additional context about the customer"
    },
    "ideal": {
      "type": "string",
      "description": "The ideal response from support"
    },
    "expectedSentiment": {
      "type": "string",
      "description": "The sentiment the agent is expected to predict for the user query"
    },
    "difficulty": {
      "type": "number",
      "description": "How difficult the customer support request is to solve (1-10)"
    }
  },
  "required": ["userQuery", "ideal"]
}
```

**Supported Data Types**

* **`string`**: Text content
* **`number`**: Numeric values (integers or floats)
* **`boolean`**: Either `true` or `false`
* **`object`**: Nested JSON objects
* **`array`**: Lists of JSON values
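
To get a feel for how a schema constrains Testcases, here is a minimal Python sketch that checks a Testcase against the `type` and `required` keywords only. It is not Scorecard's validator, and `validate_testcase` and `JSON_TYPES` are hypothetical names chosen for illustration. For full JSON Schema support, a dedicated validation library is likely a better fit.

```python
# Minimal sketch, NOT Scorecard's validator: checks only the "type" and
# "required" keywords for the five data types listed above.
# All names here are hypothetical and exist only for illustration.

JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def validate_testcase(testcase: dict, schema: dict) -> list[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    properties = schema.get("properties", {})

    # Every field named in "required" must be present.
    for name in schema.get("required", []):
        if name not in testcase:
            problems.append(f"missing required field: {name}")

    for name, value in testcase.items():
        if name not in properties:
            continue  # fields outside the schema are ignored in this sketch
        expected = properties[name].get("type")
        python_type = JSON_TYPES.get(expected)
        if python_type is None:
            continue  # unknown or absent type: nothing to check
        # bool is a subclass of int in Python, so reject it for "number".
        if expected == "number" and isinstance(value, bool):
            problems.append(f"{name}: expected number, got boolean")
        elif not isinstance(value, python_type):
            problems.append(f"{name}: expected {expected}, got {type(value).__name__}")
    return problems


# A trimmed-down version of the customer support schema above.
schema = {
    "type": "object",
    "properties": {
        "userQuery": {"type": "string"},
        "ideal": {"type": "string"},
        "difficulty": {"type": "number"},
    },
    "required": ["userQuery", "ideal"],
}

print(validate_testcase({"userQuery": "Where is my package?", "difficulty": "3"}, schema))
# → ['missing required field: ideal', 'difficulty: expected number, got str']
```

Running a check like this locally before uploading can surface missing or mistyped fields early.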

You also need to define the *field mapping* when creating a Testset with the SDK.

A field mapping categorizes schema fields by their role in evaluation.

For example, here's a field mapping for the customer support schema above:

```json theme={null}
{
  "inputs": ["userQuery", "context"],
  "expected": ["ideal", "expectedSentiment"],
  "metadata": ["difficulty"]
}
```
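
As an illustration of what a field mapping does, the Python sketch below checks that a mapping only names fields that exist in the schema, then groups a Testcase's values by role. The helper names `check_field_mapping` and `split_by_role` are hypothetical and not part of the Scorecard SDK; the sketch only assumes the `inputs`, `expected`, and `metadata` keys shown above.

```python
# Minimal sketch with hypothetical helper names (not part of the Scorecard SDK).

ROLES = ("inputs", "expected", "metadata")


def check_field_mapping(schema: dict, field_mapping: dict) -> list[str]:
    """Return one problem per mapped field that is missing from the schema."""
    schema_fields = set(schema.get("properties", {}))
    problems = []
    for role in ROLES:
        for field in field_mapping.get(role, []):
            if field not in schema_fields:
                problems.append(f"{role}: '{field}' is not in the schema")
    return problems


def split_by_role(testcase: dict, field_mapping: dict) -> dict:
    """Group a Testcase's values under inputs, expected, and metadata."""
    return {
        role: {f: testcase[f] for f in field_mapping.get(role, []) if f in testcase}
        for role in ROLES
    }


schema = {
    "type": "object",
    "properties": {
        "userQuery": {"type": "string"},
        "context": {"type": "string"},
        "ideal": {"type": "string"},
        "expectedSentiment": {"type": "string"},
        "difficulty": {"type": "number"},
    },
}
field_mapping = {
    "inputs": ["userQuery", "context"],
    "expected": ["ideal", "expectedSentiment"],
    "metadata": ["difficulty"],
}
testcase = {
    "userQuery": "How do I cancel my order?",
    "ideal": "You can cancel orders within 2 hours...",
    "difficulty": 2,
}

print(check_field_mapping(schema, field_mapping))  # → []
print(split_by_role(testcase, field_mapping))
```

In this sketch, only the values grouped under `inputs` would be passed to the agent, while the `expected` values would be available to metrics for comparison.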

### Understanding Input Fields

**Input fields** contain the actual data that gets sent to your AI system, workflow, or agent during testing. These should match exactly what your system expects to receive in production.

<Tip>
  **Quick tip**: Your input fields should match what goes INTO your AI system. Think about:

  * What do users type into your UI?
  * What data does your API receive?
  * What would a user or another system send to trigger your workflow?

  If you're unsure, ask an engineer on your team: "What JSON/data do we send to our AI system?"
</Tip>

#### Common patterns for structuring inputs

<Tabs>
  <Tab title="Chatbot/LLM">
    For conversational AI systems:

    ```json theme={null}
    {
      "user_message": "How do I reset my password?",
      "conversation_history": [...],
      "system_prompt": "You are a helpful assistant"
    }
    ```
  </Tab>

  <Tab title="AI Agent/Workflow">
    For multi-step AI workflows or agents:

    ```json theme={null}
    {
      "task_description": "Analyze this document and extract key points",
      "input_data": "Document content or reference",
      "workflow_parameters": {
        "mode": "detailed",
        "output_format": "bullet_points"
      }
    }
    ```
  </Tab>

  <Tab title="API Endpoint">
    For AI-powered API endpoints:

    ```json theme={null}
    {
      "query": "Find similar products",
      "filters": {"category": "electronics"},
      "max_results": 10
    }
    ```
  </Tab>

  <Tab title="Document Processing">
    For document analysis systems:

    ```json theme={null}
    {
      "document_content": "Full text or path to document",
      "extraction_query": "What are the payment terms?",
      "document_type": "contract"
    }
    ```
  </Tab>
</Tabs>

<Tip>
  Your input fields should mirror your production system's interface. If your system expects a single prompt string, use a single string field. If it expects structured JSON with multiple parameters, reflect that structure in your schema.
</Tip>

#### Simple Testcase Schema Examples

The following examples show complete Testcase schemas with both input and expected fields. The field mapping below each schema identifies which fields are inputs and which are expected outputs.

<Tabs>
  <Tab title="Basic Chatbot">
    ```json theme={null}
    {
      "type": "object",
      "properties": {
        "user_message": {
          "type": "string",
          "description": "What the user types in the chat box"
        },
        "expected_response": {
          "type": "string",
          "description": "What the bot should say back"
        }
      }
    }
    ```

    **Field mapping:**

    ```json theme={null}
    {
      "inputs": ["user_message"],
      "expected": ["expected_response"]
    }
    ```
  </Tab>

  <Tab title="Document Q&A System">
    ```json theme={null}
    {
      "type": "object",
      "properties": {
        "question": {
          "type": "string",
          "description": "The question about the document"
        },
        "document": {
          "type": "string",
          "description": "The document text to search through"
        },
        "expected_answer": {
          "type": "string",
          "description": "The correct answer from the document"
        }
      }
    }
    ```

    **Field mapping:**

    ```json theme={null}
    {
      "inputs": ["question", "document"],
      "expected": ["expected_answer"]
    }
    ```
  </Tab>

  <Tab title="AI Agent/Workflow">
    ```json theme={null}
    {
      "type": "object",
      "properties": {
        "task_description": {
          "type": "string",
          "description": "What you want the AI agent to do"
        },
        "input_data": {
          "type": "string",
          "description": "Any data the agent needs to complete the task"
        },
        "expected_output": {
          "type": "string",
          "description": "What the agent should produce"
        }
      }
    }
    ```

    **Field mapping:**

    ```json theme={null}
    {
      "inputs": ["task_description", "input_data"],
      "expected": ["expected_output"]
    }
    ```
  </Tab>
</Tabs>

<Note>
  The key is matching your test inputs to your actual system. If your system takes a single text field, use a single text field. If it takes multiple parameters, include those as separate fields.
</Note>
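
One way to start from your actual system is to take a sample production payload and draft a schema from it. The Python sketch below does this for top-level fields only, using the supported data types listed earlier. `draft_schema` is a hypothetical helper, not an SDK function, and the result is a starting point: you would still add descriptions, expected fields, and a field mapping by hand.

```python
# Minimal sketch: draft a Testset schema from one sample payload.
# `draft_schema` is a hypothetical helper, not part of the Scorecard SDK.


def json_type(value) -> str:
    """Map a Python value to one of the supported schema data types."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def draft_schema(sample_payload: dict) -> dict:
    """Build a schema whose properties mirror the payload's top-level fields."""
    return {
        "type": "object",
        "properties": {
            name: {"type": json_type(value)}
            for name, value in sample_payload.items()
        },
    }


sample = {
    "query": "Find similar products",
    "filters": {"category": "electronics"},
    "max_results": 10,
}
print(draft_schema(sample))
```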

## Create and edit Testcases

### From the UI

Click the **"New Testcase"** button to create a new Testcase matching your Testset's schema.

<DarkLightImage lightSrc="/images/testcase-create-light.png" caption="Testcase creation modal." alt="Screenshot of the testcase creation modal in the UI." />

You can edit a particular Testcase's fields from the Testcase table, or from the Testcase details page.

<DarkLightImage lightSrc="/images/testcase-details-light.png" caption="Testcase details page." alt="Screenshot of the testcase details page in the UI." />

### Importing from a file

The Scorecard UI supports importing Testcases in CSV, TSV, JSON, and JSONL formats. Scorecard automatically maps your file's columns to the Testset's schema fields and validates the data.

<Tip>
  **Upsert behavior**: If your file includes Testcases with IDs that already exist in the Testset, those Testcases are updated with the new values instead of being duplicated. This makes it easy to bulk-update existing Testcases by re-uploading a modified file.
</Tip>

<CodeGroup>
  ```csv CSV theme={null}
  userQuery,context,ideal,category
  "How do I cancel my order?","Order placed 1 hour ago","You can cancel orders within 2 hours...","cancellation"
  "Where is my package?","Order shipped yesterday","Track your package using the link...","tracking"
  ```

  ```tsv TSV theme={null}
  userQuery	context	ideal	category
  How do I cancel my order?	Order placed 1 hour ago	You can cancel orders within 2 hours...	cancellation
  Where is my package?	Order shipped yesterday	Track your package using the link...	tracking
  ```

  ```json JSON theme={null}
  [
      {
          "userQuery": "Is my order eligible for expedited shipping?",
          "context": "Order placed 1 hour ago",
          "ideal": "Expedited shipping is available for orders over $50...",
          "category": "shipping"
      },
      {
          "userQuery": "How do I cancel my order?",
          "context": "Order placed 1 hour ago. Order shipped yesterday.",
          "ideal": "You can cancel orders within 2 hours...",
          "category": "cancellation"
      }
  ]
  ```

  ```jsonl JSONL theme={null}
  {"userQuery": "Is my order eligible for expedited shipping?", "context": "Order placed 1 hour ago", "ideal": "Expedited shipping is available for orders over $50...", "category": "shipping"}
  {"userQuery": "How do I cancel my order?", "context": "Order placed 1 hour ago. Order shipped yesterday.", "ideal": "You can cancel orders within 2 hours...", "category": "cancellation"}
  ```
</CodeGroup>
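
If your data lives in one format and you would rather import another, a short script can convert it. The Python sketch below turns CSV text with a header row into JSONL using only the standard library. `csv_to_jsonl` is a hypothetical helper name. Note that CSV values are always read as strings, so numeric or boolean fields would need an explicit conversion step that this sketch leaves out.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text: str) -> str:
    """Convert CSV text with a header row into JSONL, one object per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each row becomes a dict keyed by the header names; values stay strings.
    return "\n".join(json.dumps(row) for row in reader)


csv_text = '''userQuery,context,ideal,category
"How do I cancel my order?","Order placed 1 hour ago","You can cancel orders within 2 hours...","cancellation"
"Where is my package?","Order shipped yesterday","Track your package using the link...","tracking"
'''

print(csv_to_jsonl(csv_text))
```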

### Using the API

With our SDKs, you can [create](/api-reference/create-multiple-testcases), [update](/api-reference/update-testcase), and [delete](/api-reference/delete-multiple-testcases) Testcases.

## Export Testset

You can export a Testset's Testcases to a CSV file by clicking the **"Export as CSV"** button on the Testset details page.

## Other Testset features

**Testset tags**

You can add custom tags to your Testsets to categorize them. For example, `regression` or `edge-cases`.

**Duplicate Testset**

You can create a copy of a Testset by clicking the **"Duplicate"** button in the Testset actions menu.

The copy keeps the original field mappings, so it's useful for creating variants of your Testsets without having to recreate the schema.

## Best practices

**Testset strategy by use case**

<AccordionGroup>
  <Accordion title="Hillclimbing Testsets" icon="trending-up">
    **Purpose**: Iterative improvement and development

    **Size**: 5-20 Testcases

    **Content**: Your favorite prompts and edge cases that matter most

    **Usage**: Quick feedback during development cycles
  </Accordion>

  <Accordion title="Regression Testsets" icon="shield-check">
    **Purpose**: Ensure new changes don't break existing functionality

    **Size**: 50-100 Testcases

    **Content**: Representative examples of core use cases

    **Usage**: Run regularly (nightly builds, CI/CD pipelines)
  </Accordion>

  <Accordion title="Launch Evaluation Testsets" icon="rocket">
    **Purpose**: Comprehensive testing before major releases

    **Size**: 100+ Testcases

    **Content**: Broad coverage of all use cases and edge cases

    **Usage**: Pre-launch validation and confidence building
  </Accordion>

  <Accordion title="Must-Pass Testsets" icon="triangle-alert">
    **Purpose**: Critical functionality that must never fail

    **Size**: Variable (focus on precision over coverage)

    **Content**: High-precision Testcases for essential features

    **Usage**: Early checks in deployment pipelines
  </Accordion>
</AccordionGroup>

<Warning>
  Remember that Testcase data may contain sensitive information. Follow your organization's data handling policies and avoid including PII, secrets, or confidential data in Testsets.
</Warning>
