The Analysis page provides a powerful interface for comparing performance metrics across multiple runs side-by-side. It enables you to visualize score distributions, examine individual test records in detail, and identify patterns in your AI system’s behavior.

Analysis page showing multi-run comparison with score distributions

What is Analysis?

Analysis in Scorecard allows you to:
  • Compare Multiple Runs: View several runs side-by-side to track performance evolution
  • Visualize Score Distributions: See how scores are distributed across different metrics and runs
  • Inspect Individual Records: Drill down into specific test cases to understand outputs and scoring details
  • Identify Patterns: Spot improvements, regressions, and consistency issues across iterations
You can access the Analysis page from the Runs list page by selecting the runs you want to compare and clicking the Analyze button.

Understanding Score Distributions

The score distribution matrix provides an at-a-glance view of performance across your selected runs and metrics.

Score distribution matrix showing performance across runs and metrics

The matrix displays runs along the rows and metrics along the columns; each cell shows the score distribution for that run and metric. Each metric type is summarized appropriately: Boolean metrics show their pass rate, while integer and float metrics show their average score.
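
As a rough sketch of these roll-up rules (with made-up metric names and scores, not the actual Scorecard data model), the Python below computes a pass rate for a Boolean metric and an average for numeric metrics:

    from statistics import mean

    # Hypothetical raw scores for one run, keyed by metric name (illustrative only).
    scores_by_metric = {
        "is_grounded": [True, True, False, True],   # Boolean metric
        "relevance": [4, 5, 3, 4],                   # integer metric (1-5 scale)
        "completeness": [0.9, 1.0, 0.7, 0.8],        # float metric
    }

    def summarize(metric_name, values):
        # Boolean metrics roll up to a pass rate; numeric metrics roll up to an average.
        if all(isinstance(v, bool) for v in values):
            pass_rate = sum(values) / len(values)
            return f"{metric_name}: {pass_rate:.0%} pass rate"
        return f"{metric_name}: {mean(values):.2f} average"

    for name, values in scores_by_metric.items():
        print(summarize(name, values))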

Detailed Records Table

Below the score distribution matrix, the Records Table provides granular inspection of individual test cases. It is organized the same way as the records table on the run details page.

Detailed records table with side-by-side comparison

Record Deduplication

The table intelligently groups records by testcase:
  • With Testcase IDs: Multiple runs evaluating the same testcase appear in a single row
  • Without Testcase IDs: Each record appears as a separate row
  • Visual Indicators: Color-coded bars show which runs evaluated each testcase
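
A minimal sketch of this grouping logic, assuming records are available as plain dictionaries with an optional testcase_id field (an illustrative shape, not the exact export format):

    from collections import defaultdict
    from itertools import count

    records = [
        {"run": "run_41", "testcase_id": "tc_1", "score": 0.8},
        {"run": "run_42", "testcase_id": "tc_1", "score": 0.9},
        {"run": "run_41", "testcase_id": None, "score": 0.5},
        {"run": "run_42", "testcase_id": None, "score": 0.7},
    ]

    rows = defaultdict(list)
    standalone = count(1)
    for record in records:
        # Records sharing a testcase ID collapse into one row; records without
        # an ID each get their own synthetic row key.
        key = record["testcase_id"] or f"standalone_{next(standalone)}"
        rows[key].append(record)

    for key, grouped in rows.items():
        print(f"{key}: evaluated in {', '.join(r['run'] for r in grouped)}")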

Best Practices

  • Begin your analysis with the most recent runs to understand current performance, then add historical runs for trend analysis.
  • If your runs share testcases, Scorecard automatically groups their records by testcase for you.
  • In the records table, you can filter records to show only those that mention a keyword in the inputs, outputs, or expected outputs.
  • Pay special attention to testcases where scores vary significantly between runs; these often reveal important insights (see the sketch after this list). Is it a challenging or incorrect testcase? Are your metric guidelines underspecified? Has your system regressed in a particular area?
  • Use run notes to document what you learn from analysis sessions for future reference.
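
One way to surface those high-variance testcases outside the UI is to rank them by score spread across runs. A hedged sketch, assuming per-run scores for one metric are already keyed by testcase ID (illustrative numbers):

    # Per-testcase scores for one metric across two runs (illustrative).
    scores = {
        "tc_1": {"run_41": 0.9, "run_42": 0.9},
        "tc_2": {"run_41": 0.8, "run_42": 0.3},  # large swing -> worth inspecting
        "tc_3": {"run_41": 0.6, "run_42": 0.7},
    }

    def spread(per_run):
        values = per_run.values()
        return max(values) - min(values)

    # Testcases with the biggest swings first.
    for testcase_id, per_run in sorted(scores.items(), key=lambda kv: spread(kv[1]), reverse=True):
        print(f"{testcase_id}: spread={spread(per_run):.2f} {per_run}")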

Common Use Cases

Performance Regression Detection

Quickly identify when changes to your system have negatively impacted performance:
  1. Select your current production run as baseline.
  2. Add the latest development run for comparison.
  3. Look for metrics showing decreased scores (red or yellow where previously green).
  4. Drill into specific records to understand what changed.
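
The same check can be scripted once you have per-metric averages for the two runs. A sketch under that assumption (run names, metric names, and the 0.05 threshold are all made up):

    baseline = {"accuracy": 0.91, "groundedness": 0.88, "tone": 0.95}
    candidate = {"accuracy": 0.92, "groundedness": 0.79, "tone": 0.94}

    REGRESSION_THRESHOLD = 0.05  # flag drops larger than this (arbitrary choice)

    for metric, base_score in baseline.items():
        delta = candidate[metric] - base_score
        status = "REGRESSION" if delta < -REGRESSION_THRESHOLD else "ok"
        print(f"{metric}: {base_score:.2f} -> {candidate[metric]:.2f} ({delta:+.2f}) {status}")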

Multi-Model Comparison

Compare different models or configurations:
  1. Run the same testset with different models.
  2. Select all model runs in the Analysis page.
  3. Compare score distributions to identify the best performer.
  4. Examine specific outputs to understand quality differences.
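
If you export the per-model averages, picking a best performer per metric takes only a few lines. A sketch with made-up model names and scores:

    runs = {
        "model_a": {"accuracy": 0.92, "helpfulness": 0.88},
        "model_b": {"accuracy": 0.90, "helpfulness": 0.91},
        "model_c": {"accuracy": 0.84, "helpfulness": 0.86},
    }

    # For each metric, find the run (model) with the highest average score.
    for metric in next(iter(runs.values())):
        best = max(runs, key=lambda model: runs[model][metric])
        print(f"{metric}: best = {best} ({runs[best][metric]:.2f})")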

Metric Validation

Verify that your metrics are working correctly:
  1. Select runs with known good and bad outputs.
  2. Choose the metrics you want to validate.
  3. Verify that scores align with expected quality assessments.
  4. Adjust metric configurations if needed.
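
A simple sanity check for step 3 is to confirm that the metric scores known-good outputs clearly higher than known-bad ones. A sketch with illustrative scores and an arbitrary separation threshold:

    from statistics import mean

    known_good_scores = [0.90, 0.85, 0.95, 0.88]  # outputs you trust to be good
    known_bad_scores = [0.35, 0.50, 0.40, 0.45]   # outputs you know are bad

    MIN_SEPARATION = 0.2  # required gap between group averages (assumption)

    gap = mean(known_good_scores) - mean(known_bad_scores)
    if gap >= MIN_SEPARATION:
        print(f"Metric separates good from bad outputs (gap = {gap:.2f}).")
    else:
        print(f"Gap of {gap:.2f} is too small; revisit the metric configuration.")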

System Evolution Tracking

Monitor how your system improves over time:
  1. Select runs from different development phases.
  2. Focus on your primary success metrics.
  3. Observe the trend in score distributions.
  4. Document improvements in run notes.
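
For a quick numeric view of the trend, you can print the primary metric's average per run with phase-over-phase deltas; the dates and scores below are illustrative:

    # (run date, average primary-metric score) per development phase, in order.
    history = [
        ("2024-01-15", 0.72),
        ("2024-02-10", 0.78),
        ("2024-03-05", 0.85),
    ]

    previous = None
    for date, score in history:
        delta = "" if previous is None else f" ({score - previous:+.2f})"
        print(f"{date}: {score:.2f}{delta}")
        previous = score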
Use the URL sharing feature to save specific analysis configurations. The URL automatically includes selected run IDs, making it easy to return to or share specific comparisons.