

Create testcase from span.
Why it matters
- Grow testsets from real user prompts and outputs.
- Reproduce tricky scenarios when evaluating new models or prompts.
- Track quality over time with consistent, production-grounded inputs.
If you call it something else
- Traces → Testcases: Similar to “snapshotting” production examples into a labeled dataset. Scorecard streamlines it in the trace UI and auto-maps prompt/completion fields.
- Where testcases live: Saved into Testsets so you can re-run evaluations and compare Runs & Results over time.
- What’s extracted: Prompt and completion are pulled from common keys (`openinference.*`, `ai.prompt`/`ai.response`, `gen_ai.*`); see the instrumentation sketch below.
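If you are instrumenting your own application, extraction works off span attributes, so the LLM call needs to land on a span that carries the prompt and completion. Below is a minimal sketch using the OpenTelemetry Python API. The `gen_ai.prompt` and `gen_ai.completion` keys, the tracer name, and the stand-in `generate` function are assumptions for illustration; use whatever keys from the families above your instrumentation library actually emits.

```python
from opentelemetry import trace

tracer = trace.get_tracer("my-app")  # tracer name is illustrative

def generate(prompt: str) -> str:
    """Stand-in for your actual LLM client call."""
    return "example completion"

def call_llm(prompt: str) -> str:
    # Record the LLM call on its own span so the trace UI can locate it
    # and offer the Create Testcase action on that span.
    with tracer.start_as_current_span("llm-call") as span:
        # Assumed keys from the gen_ai.* family listed above; adjust to
        # the keys your instrumentation actually produces.
        span.set_attribute("gen_ai.prompt", prompt)
        completion = generate(prompt)
        span.set_attribute("gen_ai.completion", completion)
        return completion
```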


Create testcase modal.
How it works
- Open any trace and drill down to the span that contains the LLM call.
- Click Create Testcase (document icon).
- Pick a Testset or create a new one.
- Scorecard auto-extracts the prompt and completion from span attributes (`openinference.*`, `ai.prompt`/`ai.response`, `gen_ai.*`). Adjust fields before saving (see the lookup sketch after this list).
- The testcase appears immediately in the selected testset.
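To make the mapping concrete, here is a rough sketch of the kind of attribute lookup the auto-extraction step performs. The key families come from the list above; the specific key names and the precedence order are assumptions for illustration, not Scorecard's actual logic.

```python
# Candidate attribute keys for prompt and completion, in an assumed
# lookup order; real keys and precedence may differ.
PROMPT_KEYS = ["openinference.prompt", "ai.prompt", "gen_ai.prompt"]
COMPLETION_KEYS = ["openinference.completion", "ai.response", "gen_ai.completion"]

def extract_testcase_fields(attributes: dict) -> dict:
    """Pull prompt/completion out of span attributes, if present."""
    prompt = next((attributes[k] for k in PROMPT_KEYS if k in attributes), None)
    completion = next((attributes[k] for k in COMPLETION_KEYS if k in attributes), None)
    return {"prompt": prompt, "completion": completion}

# Example: a span that used the ai.prompt / ai.response keys.
span_attributes = {"ai.prompt": "Summarize this ticket", "ai.response": "The user reports ..."}
print(extract_testcase_fields(span_attributes))
```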


Select Testset dialog.


Create testcase result.
Use cases
- Create gold-standard datasets from production data.
- Run offline evaluations on real cases and validate changes before deployment.
Automating this flow? Use the `/testcases` API to create testcases programmatically from traces.
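A minimal sketch of that call follows. The `/testcases` path comes from the note above; the base URL, authentication header, and payload field names are assumptions, so check the API reference for the exact contract.

```python
import os
import requests

# Assumed base URL and auth scheme; replace with the values from the API reference.
BASE_URL = "https://api.example.com/v1"
headers = {"Authorization": f"Bearer {os.environ['SCORECARD_API_KEY']}"}

# Prompt and completion pulled from the span you want to snapshot.
payload = {
    "testsetId": "your-testset-id",        # target testset; field name assumed
    "prompt": "Summarize this ticket",      # prompt extracted from the span
    "completion": "The user reports ...",   # completion from the span, used as the expected output
}

response = requests.post(f"{BASE_URL}/testcases", json=payload, headers=headers)
response.raise_for_status()
print(response.json())
```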