## Why combine SDK + Tracing?
Oftentimes, traces become cluttered with data, and SDK records don’t contain enough information to understand why something went wrong. To supplement the shortcomings of one or the other, we allow you to link the traces created by runs to the SDK, showing all of your data in one place:

- Full observability into every LLM call, tool use, and retry
- Cost and latency breakdown per span
- Conversation view for chat-based traces
- Debugging context when a test case fails — see exactly which step went wrong
## Setup
You need two things: the Scorecard SDK to create records, and tracing instrumentation so your system emits OpenTelemetry traces.

### 1. Install the SDK
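A minimal install sketch; the package names below are assumptions, so check the SDK Quickstart for the exact command for your language:

```shell
# Assumed package names -- verify against the SDK Quickstart.
npm install scorecard-ai   # TypeScript/JavaScript
pip install scorecard-ai   # Python
```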
### 2. Enable tracing in your system
Choose any instrumentation method; the simplest approach depends on your stack. For agent frameworks (Claude Agent SDK, OpenAI Agents SDK), set environment variables:
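As a sketch, using the standard OpenTelemetry OTLP exporter environment variables and the tracing endpoint from the troubleshooting table; the header value and the project-id attribute are illustrative assumptions:

```shell
# Standard OTLP exporter env vars; endpoint comes from the Scorecard docs.
# The header format and project-id attribute shown here are assumptions.
export OTEL_EXPORTER_OTLP_ENDPOINT="https://tracing.scorecard.io/otel"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer <your-api-key>"
export OTEL_RESOURCE_ATTRIBUTES="scorecard.project_id=<your-project-id>"
```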
### 3. Use `runAndEvaluate()` with your traced system
- Claude Agent SDK
- SDK Wrappers (OpenAI, Anthropic)

For agent frameworks, pass `otelLinkId` from the options argument into your agent’s `OTEL_RESOURCE_ATTRIBUTES` so traces link to records. The key is setting `scorecard.otel_link_id` in `OTEL_RESOURCE_ATTRIBUTES`; this tells the trace pipeline which record to merge into.

## How it works
`runAndEvaluate()` generates a unique `otelLinkId` for each test case and passes it to your system function via the options argument. Your system can then insert this ID into the trace’s resource attributes (`scorecard.otel_link_id`), which tells the tracing pipeline to merge the trace into the corresponding record.
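A minimal sketch of that wiring. `runAndEvaluate()` and the `otelLinkId` option come from the doc; `buildResourceAttributes`, `mySystem`, and the project ID are hypothetical names for illustration:

```typescript
// Build the OTEL_RESOURCE_ATTRIBUTES value that links traces to a record.
// OTel resource attributes are comma-separated key=value pairs.
function buildResourceAttributes(otelLinkId: string, projectId: string): string {
  return `scorecard.project_id=${projectId},scorecard.otel_link_id=${otelLinkId}`;
}

// Sketch of a system function: set the resource attributes from the
// otelLinkId in options before your traced agent starts emitting spans.
function mySystem(
  input: { query: string },
  options: { otelLinkId: string },
): { answer: string } {
  process.env.OTEL_RESOURCE_ATTRIBUTES = buildResourceAttributes(
    options.otelLinkId,
    "my-project-id", // hypothetical project ID
  );
  // ... run your traced agent here and return its output ...
  return { answer: `echo: ${input.query}` };
}
```

You would then pass `mySystem` to `runAndEvaluate()` so each test case gets its own link ID.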
## Multi-trace sessions
Some systems emit multiple traces per invocation — for example, the Claude Agent SDK generates a separate trace for each turn in a multi-turn conversation. Scorecard groups these into a single record using the session ID. When multiple traces share the same `session.id`, they are combined into a unified span tree under a session root, with trace IDs accumulated rather than overwritten.
Your SDK inputs and outputs are preserved across all session merges.
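The grouping rule can be modeled as follows; this is an illustrative sketch of the behavior described above, not Scorecard's actual implementation, and the type names are made up:

```typescript
interface TraceStub {
  traceId: string;
  sessionId: string; // the session.id resource attribute
}

interface SessionRecord {
  sessionId: string;
  traceIds: string[]; // accumulated in arrival order, never overwritten
}

// Group traces into one record per session, accumulating trace IDs.
function groupBySession(traces: TraceStub[]): Map<string, SessionRecord> {
  const sessions = new Map<string, SessionRecord>();
  for (const t of traces) {
    const existing = sessions.get(t.sessionId);
    if (existing) {
      existing.traceIds.push(t.traceId); // same session: accumulate
    } else {
      sessions.set(t.sessionId, { sessionId: t.sessionId, traceIds: [t.traceId] });
    }
  }
  return sessions;
}
```

So two turns of one conversation (two traces, one `session.id`) produce a single record holding both trace IDs.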
## Troubleshooting
| Symptom | Fix |
|---|---|
| Inputs/outputs not showing in Trace Overview | Verify your system function returns a value. Check that `scorecard.project_id` in the trace matches the project in `runAndEvaluate()`. |
| Trace not linking to the record | Ensure traces are sent to `https://tracing.scorecard.io/otel`. Confirm the API key belongs to the same org. Call `force_flush()` before the process exits. |
| Scores reference trace data instead of SDK data | `{{inputs.*}}` and `{{outputs.*}}` resolve to SDK data automatically. If you see raw trace JSON, the record wasn’t created via `runAndEvaluate()`. |
## Next steps
- SDK Quickstart — Set up `runAndEvaluate()` from scratch
- Tracing — All instrumentation methods
- Metrics — Define evaluation criteria
- Records — Search and filter records