Create testcase from span screenshot.Create testcase from span screenshot.

Create testcase from span.

Live traffic uncovers edge-cases synthetic datasets rarely catch. Trace ➜ Testcase lets you capture those moments, convert them into structured testcases, and bake them into your regression suite—all without leaving the trace view. This is production-to-dataset in one click.

Why it matters

  • Grow testsets from real user prompts and outputs.
  • Reproduce tricky scenarios when evaluating new models or prompts.
  • Track quality over time with consistent, production-grounded inputs.

If you call it something else

  • Traces → Testcases: Similar to “snapshotting” production examples into a labeled dataset. Scorecard streamlines it in the trace UI and auto-maps prompt/completion fields.
  • Where testcases live: Saved into Testsets so you can re-run evaluations and compare Runs & Results over time.
  • What’s extracted: Prompt and completion are pulled from common keys (openinference.*, ai.prompt/ai.response, gen_ai.*).
Create testcase modal screenshot.Create testcase modal screenshot.

Create testcase modal.

How it works

  1. Open any trace and drill down to the span that contains the LLM call.
  2. Click Create Testcase (document icon).
  3. Pick a Testset or create a new one.
  4. Scorecard auto-extracts the prompt and completion from span attributes (openinference.*, ai.prompt / ai.response, gen_ai.*). Adjust fields before saving.
  5. The testcase appears immediately in the selected testset.
Select Testset dialog screenshot.Select Testset dialog screenshot.

Select Testset dialog.

Create testcase result screenshot.Create testcase result screenshot.

Create testcase result .

Use cases

  • create gold-standard datasets from production data
  • run offline evaluations on real cases, and validate changes before deployment.
Automating this flow? Use the /testcases API to create testcases programmatically from traces.