> ## Documentation Index
> Fetch the complete documentation index at: https://docs.scorecard.io/llms.txt
> Use this file to discover all available pages before exploring further.

# Runs & Results

> Execute evaluations and analyze AI agent performance.

export const DarkLightImage = ({lightSrc, caption, alt, darkSrc = null, width = "1000"}) => {
  const getAbsoluteUrl = src => {
    if (src.startsWith('http://') || src.startsWith('https://')) {
      return src;
    }
    const currentUrl = typeof window !== 'undefined' ? window.location.origin : '';
    if (currentUrl.includes('.mintlify.app')) {
      const subdomain = currentUrl.split('.')[0].replace('https://', '');
      return `https://mintlify.s3.us-west-1.amazonaws.com/${subdomain}${src.startsWith('/') ? '' : '/'}${src}`;
    } else if (currentUrl === 'https://docs.scorecard.io') {
      return `https://mintlify.s3.us-west-1.amazonaws.com/scorecard-d65b5e8a${src.startsWith('/') ? '' : '/'}${src}`;
    } else {
      return `${currentUrl}${src.startsWith('/') ? '' : '/'}${src}`;
    }
  };
  const content = <>
      <img className="block dark:hidden" width={width} src={getAbsoluteUrl(lightSrc)} alt={alt} />
      <img className="hidden dark:block" width={width} src={getAbsoluteUrl(darkSrc || lightSrc.replace('light', 'dark'))} alt={alt} />
    </>;
  if (caption) {
    return <Frame caption={caption}>{content}</Frame>;
  } else {
    return content;
  }
};

<Warning>
  The Runs page is deprecated. Please use the [Records](/features/records) page instead.
</Warning>

A **Run** is an execution that evaluates your AI agent against some Testcases using specified metrics.

Runs generate **Records** (individual test executions) and **Scores** (evaluation results) for each Record that help you understand your agent's performance across different scenarios.

<DarkLightImage lightSrc="/images/run-details-light.png" caption="Run details page." alt="Screenshot of viewing run details in the UI." />

## What is a Run?

Every run consists of:

* (Optional) **Testset**: Collection of test cases to evaluate against
* (Optional) **Metrics**: Evaluation criteria that score system outputs
* (Optional) **System Version**: Configuration defining your AI system's behavior
* **Records**: Individual test executions, one per Testcase
* **Scores**: Evaluation results for each record against each metric

<DarkLightImage lightSrc="/images/runs-list-light.png" caption="List of recent runs with statuses." alt="Screenshot of viewing runs list in the UI." />

### Filtering Runs

You can filter runs by their **source** to quickly find runs created from specific workflows:

* **API**: Runs created programmatically via the SDK or API
* **Monitor**: Runs created automatically by production monitors
* **Playground**: Runs kicked off from the Playground
* **Kickoff**: Runs created via the Kickoff Run modal in the UI

## Creating Runs

### Kickoff Run from the UI

You can kickoff a run from the Projects dashboard, Testsets list, or a Run ("Run again" button). The **Kickoff Run** modal lets you choose the Testset, Prompt, and Metrics for the run.

The **Scorecard** tab lets you run using an LLM on Scorecard's servers, so you can specify the LLM parameters. The **GitHub** tab lets you trigger a run using GitHub Actions on your actual system.

<DarkLightImage lightSrc="/images/kickoff-run-light.png" caption="Kickoff Run modal." alt="Screenshot of the kickoff run modal in the UI." />

### From the Playground

The [Playground](/features/playground) allows you to test prompts interactively.

Click **Kickoff Run** to create a run with a specified testset, prompt version, and metrics.

<DarkLightImage lightSrc="/images/playground-light.png" caption="Playground overview." alt="Screenshot of the playground in the UI." />

### Using the API

You can create runs and records programmatically using the Scorecard SDK.

<CodeGroup>
  ```python Python theme={null}
  from scorecard_ai import Scorecard
  from scorecard_ai.lib import run_and_evaluate

  run = run_and_evaluate(
      client=Scorecard(),
      project_id="123",
      testset_id="456",
      metric_ids=["789", "101"],
      system=lambda system_input: ask_llm(system_input),
  )
  print(f'Go to {run["url"]} to view your results.')
  ```

  ```JavaScript JavaScript theme={null}
  import Scorecard, { runAndEvaluate } from 'scorecard-ai';

  const scorecard = new Scorecard();

  const run = await runAndEvaluate(scorecard, {
    projectId: "123",
    metricIds: ["789", "101"],
    testsetId: "456",
    system: (systemInput) => askLLM(systemInput),
  });
  console.log(`Go to ${run.url} to view your results.`);
  ```
</CodeGroup>

See the [SDK quickstart](/intro/quickstart) or [API reference](/api-reference/overview) for more details.

### Via GitHub Actions

Using Scorecard's [GitHub integration](/features/github-actions), you can trigger runs automatically (e.g. on pull request or a schedule).

With the integration set up, you can also trigger runs of your real system from the Scorecard UI.

<DarkLightImage lightSrc="/images/trigger-run-light.png" caption="Triggering a run through the Scorecard GitHub integration." alt="Screenshot of the trigger run page in the Scorecard UI." />

## Run Status

If you click on the run status of a run, it will expands to show an explanation of the scoring status for each of the metrics in the run.

<DarkLightImage lightSrc="/images/run-status-explanation-light.png" caption="Run status explanation." alt="Screenshot of the run status explanation in the UI." />

## Re-run Scoring

You can re-run scoring on existing runs without re-executing your system. This is useful when:

* You've updated a metric's guidelines and want to see how it affects existing results
* You want to add new metrics to a completed run
* You need to re-evaluate after fixing a metric configuration issue

To re-run scoring, open a run's details page and click the **Re-run Scoring** button. You can select which metrics to include in the re-scoring.

<DarkLightImage lightSrc="/images/changelog-rerun-scoring-light.png" darkSrc="/images/changelog-rerun-scoring-dark.png" caption="Re-run scoring modal for an existing run." alt="Screenshot of the re-run scoring modal." />

<Tip>
  Re-running scoring is much faster than creating a new run since it skips the system execution step and only re-evaluates the existing outputs.
</Tip>

## Inspect a Record

<DarkLightImage lightSrc="/images/testrecord-score-explanation-light.png" caption="Record score explanation." alt="Screenshot of viewing testrecord score explanation in the UI." />

Drill down into specific test executions for detailed analysis:

**Record Overview:**

* **Inputs**: Original Testcase data sent to your system
* **Outputs**: Generated responses from your AI system
* **Expected Outputs**: Ideal responses for comparison
* **Scores**: Detailed evaluation results from each metric

## Records Page

The **[Records](/features/records)** page provides a project-level view of all test records across your runs. Use it to search, filter, and analyze records without navigating through individual runs.

## Export a Run

Click the **Export as CSV** button on the run details page.

## Run Notes

Click **Show Details** on a run page to view or edit the run notes and view the system/prompt version the run was executed with.

<DarkLightImage lightSrc="/images/run-notes-light.png" caption="Run details with notes and system configuration." alt="Screenshot of viewing run details with notes and system configuration in the UI." />

<Note>
  Run data includes potentially sensitive information from your Testsets and system outputs. Follow your organization's data handling policies and avoid including PII or confidential data in test configurations.
</Note>

## Run History

View your AI system's performance trends over time with interactive charts for each metric on the run list page.

Run history helps you identify performance trends, compare metrics side-by-side, and track consistency across runs.

<DarkLightImage lightSrc="/images/run-history-light.png" darkSrc="/images/run-history-dark.png" caption="Run history charts showing three metrics' performance over time." alt="Screenshot of run history charts displaying multiple metrics over time." />

Use the dropdown to select which metrics to display. Hover over any data point to see detailed run information and scores, with synchronized highlighting across all visible charts.

Binary metrics display as percentages (passing rate) while integer metrics show the average score for that metric.

<Tip>
  Click any metric title to navigate to its detailed configuration page.
</Tip>
