UI Quickstart

Kick off your first run in minutes using the Example Project that’s created automatically when you sign up. You can also browse the run that was kicked off for you and explore how testsets, prompts, and metrics fit together.

Open the Example Project and click Kickoff Run

After you create your organization, navigate to the Example Project’s Records page. Click the Kickoff Run button in the top right to open the Kickoff Run modal.

Screenshot showing the Kickoff Run button in the top right of the Records page.

Kick off your first run

In the modal, you can keep the default Testset, Agent Integration (Prompt, GitHub Action, or Endpoint), and Metrics. Click “Kickoff run” to create the run and automatically evaluate the system.

Kickoff modal with prefilled Testset, Prompt, and Metrics selections.

View results

After your run starts and scoring completes, open the results to see per-record scores, distributions, and explanations.

Run details page with scores and aggregates.
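To make the idea of per-record scores and their distribution concrete, here is a minimal sketch of the kind of aggregation a results page summarizes. The scores are made-up illustration values on a 1-5 scale, and the variable names are hypothetical; this is not Scorecard's implementation or API.

```python
from collections import Counter
from statistics import mean

# Hypothetical per-record scores for one metric (illustration only).
record_scores = [5, 4, 4, 2, 5]

# Aggregate: the average score across all records.
mean_score = mean(record_scores)

# Distribution: how many records received each score.
distribution = Counter(record_scores)

for score in sorted(distribution):
    print(f"score {score}: {distribution[score]} record(s)")
print(f"mean: {mean_score}")
```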

Click the Run Again button in the top right corner to iterate with a different prompt version, model, or metric set.

Click any record on the Records page to view individual testcase inputs, outputs, and score explanations.

Browse the Example Project

Learn how the sample data is organized:

Tone Testset: each Testcase has two inputs, original and tone, and one expected output, idealRewritten.
Prompt versions for Tone: already set to Scorecard Cloud with a low temperature for consistency.
Metrics: Correctness (AI, 1-5), Human Tone Check (Human, Boolean).
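As a rough mental model of a Tone Testcase, the sketch below groups the fields named above into inputs and an expected output. The field names (original, tone, idealRewritten) come from the Example Project description; the class, its methods, and the sample values are assumptions made for illustration and do not reflect Scorecard's actual storage format.

```python
from dataclasses import dataclass


@dataclass
class ToneTestcase:
    """Hypothetical in-memory view of one Tone Testcase."""
    original: str        # input: the message to rewrite
    tone: str            # input: the target tone
    idealRewritten: str  # expected output: the ideal rewrite

    def inputs(self) -> dict:
        # Fields that would be passed to the prompt.
        return {"original": self.original, "tone": self.tone}

    def expected(self) -> dict:
        # Fields a metric could compare the output against.
        return {"idealRewritten": self.idealRewritten}


# Sample values are invented for this sketch.
example = ToneTestcase(
    original="Send me the report.",
    tone="polite",
    idealRewritten="Could you please send me the report?",
)
print(example.inputs())
print(example.expected())
```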

Open a Testset to see its schema and Testcases. Click a testcase row to view its inputs and expected outputs.

Next, browse Prompts. Use “View” to open a prompt and review its messages and model settings.

Inside a prompt version, see the template (Jinja-style variables) and evaluator model configuration.
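If Jinja-style variables are new to you, the sketch below shows how a template with {{ name }} placeholders is filled from a testcase's inputs. It uses a small regular-expression stand-in rather than a real Jinja engine, and the template text, the render helper, and the sample values are all assumptions for illustration; Scorecard's own rendering may differ.

```python
import re


def render(template: str, variables: dict) -> str:
    """Replace each {{ name }} placeholder with the matching variable.

    Simplified stand-in for Jinja-style substitution; raises KeyError
    if the template references a variable that was not supplied.
    """
    def substitute(match: re.Match) -> str:
        return str(variables[match.group(1)])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Hypothetical template using the Tone Testset's input names.
template = "Rewrite the following message in a {{ tone }} tone:\n{{ original }}"
prompt = render(template, {"tone": "polite", "original": "Send me the report."})
print(prompt)
```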

Finally, explore Metrics to learn how scoring works. Each metric has guidelines, evaluation type, and output type.
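One way to picture the three parts of a metric is as a small record whose output type decides which scores are acceptable. The sketch below models the two example metrics under that assumption; the Metric class, the output type labels, and the guideline strings are hypothetical and are not taken from Scorecard's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    """Hypothetical model of a metric definition."""
    name: str
    guidelines: str   # instructions for the evaluator
    eval_type: str    # "AI" or "Human"
    output_type: str  # "int_1_5" or "boolean" (labels assumed here)

    def is_valid_score(self, score) -> bool:
        # Check that a score matches this metric's output type.
        if self.output_type == "boolean":
            return isinstance(score, bool)
        if self.output_type == "int_1_5":
            # bool is a subclass of int in Python, so exclude it explicitly.
            return (
                isinstance(score, int)
                and not isinstance(score, bool)
                and 1 <= score <= 5
            )
        return False


# Guideline text is invented for this sketch.
correctness = Metric("Correctness", "Does the output match the ideal?", "AI", "int_1_5")
tone_check = Metric("Human Tone Check", "Is the tone appropriate?", "Human", "boolean")

print(correctness.is_valid_score(4))
print(tone_check.is_valid_score(True))
```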

Where to go next

Read about creating and managing Testsets in Testsets
Dive deeper into running evaluations in Runs & Results
Explore interactive prompt iteration in the Playground
Define and reuse evaluation criteria with Metrics

That’s it. You’ve seen Scorecard in action and how example data flows through prompts, runs, and metrics. Have fun iterating!
