The Scorecard SDK lets you evaluate AI systems with structured inputs, outputs, and scores. Tracing captures every LLM call, tool use, and retry with full observability. SDK + Tracing combines both into a single record.

Why combine SDK + Tracing?

On their own, traces are often cluttered with data, while SDK records often lack the detail needed to understand why something went wrong. Linking the traces your runs produce to their SDK records covers both gaps and shows all of your data in one place:
  • Full observability into every LLM call, tool use, and retry
  • Cost and latency breakdown per span
  • Conversation view for chat-based traces
  • Debugging context when a test case fails — see exactly which step went wrong

Setup

You need two things: the Scorecard SDK to create records, and tracing instrumentation so your system emits OpenTelemetry traces.

1. Install the SDK

pip install scorecard-ai

2. Enable tracing in your system

Choose any instrumentation method. The simplest approach depends on your stack.
Agent frameworks (Claude Agent SDK, OpenAI Agents SDK): set these environment variables.
export BETA_TRACING_ENDPOINT="https://tracing.scorecard.io/otel"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer <your-scorecard-api-key>"
export ENABLE_BETA_TRACING_DETAILED=1
export OTEL_RESOURCE_ATTRIBUTES="scorecard.project_id=<your-project-id>"
SDK wrappers (OpenAI, Anthropic): wrap your client. Python example:
from scorecard_ai import wrap
from openai import OpenAI

openai = wrap(OpenAI(), {"project_id": "YOUR_PROJECT_ID"})

3. Use runAndEvaluate() with your traced system

For agent frameworks, pass otelLinkId from the options argument into your agent’s OTEL_RESOURCE_ATTRIBUTES so traces link to records. The example below uses the TypeScript SDK with the Claude Agent SDK:
import { query } from '@anthropic-ai/claude-agent-sdk';
import Scorecard, { runAndEvaluate, type SystemOptions } from 'scorecard-ai';

const scorecard = new Scorecard();
const PROJECT_ID = 'YOUR_PROJECT_ID';

async function system(input: { question: string }, options?: SystemOptions) {
  let resultText = '';
  for await (const message of query({
    prompt: input.question,
    options: {
      env: {
        ...process.env,
        ENABLE_BETA_TRACING_DETAILED: '1',
        BETA_TRACING_ENDPOINT: 'https://tracing.scorecard.io/otel',
        OTEL_EXPORTER_OTLP_HEADERS: `Authorization=Bearer ${process.env.SCORECARD_API_KEY}`,
        OTEL_RESOURCE_ATTRIBUTES: `scorecard.otel_link_id=${options!.otelLinkId},scorecard.project_id=${PROJECT_ID}`,
      },
    },
  })) {
    if (message.type === 'result' && message.subtype === 'success') resultText = message.result;
  }
  return { answer: resultText };
}

const run = await runAndEvaluate(scorecard, {
  projectId: PROJECT_ID,
  metricIds: ['YOUR_METRIC_ID'],
  system,
  testcases: [
    { inputs: { question: 'What is the capital of France?' }, expected: { answer: 'Paris' } },
  ],
});
The key is setting scorecard.otel_link_id in OTEL_RESOURCE_ATTRIBUTES — this tells the trace pipeline which record to merge into.

How it works

runAndEvaluate() generates a unique otelLinkId for each testcase and passes it to your system function via the options argument. Your system can then insert this ID into the trace’s resource attributes (scorecard.otel_link_id), which tells the tracing pipeline to merge the trace into the corresponding record.
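
As a rough illustration of the linking step, the sketch below builds the environment for a traced agent process in Python. It only mirrors the variable names and string format used in the TypeScript example above. The helper name build_tracing_env is hypothetical and not part of the Scorecard SDK, and how a Python system function receives the link ID is not covered in this guide, so the ID is passed in as a plain argument here.

```python
import os


def build_tracing_env(otel_link_id: str, project_id: str, api_key: str) -> dict[str, str]:
    """Return a copy of the current environment with the tracing variables set.

    Hypothetical helper: it reproduces the variables from the TypeScript
    example and does not call any Scorecard API.
    """
    if not otel_link_id:
        # Without the link ID the trace cannot be merged into a record.
        raise ValueError("otel_link_id is required to link the trace to a record")

    env = dict(os.environ)  # copy, so the parent process environment is untouched
    env.update({
        "ENABLE_BETA_TRACING_DETAILED": "1",
        "BETA_TRACING_ENDPOINT": "https://tracing.scorecard.io/otel",
        "OTEL_EXPORTER_OTLP_HEADERS": f"Authorization=Bearer {api_key}",
        # Resource attributes are comma-separated key=value pairs.
        "OTEL_RESOURCE_ATTRIBUTES": (
            f"scorecard.otel_link_id={otel_link_id},"
            f"scorecard.project_id={project_id}"
        ),
    })
    return env
```

The returned dictionary can be handed to whatever launches the agent, for example as the env argument of subprocess.run.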

Multi-trace sessions

Some systems emit multiple traces per invocation — for example, the Claude Agent SDK generates a separate trace for each turn in a multi-turn conversation. Scorecard groups these into a single record using the session ID. When multiple traces share the same session.id, they are combined into a unified span tree under a session root, with trace IDs accumulated rather than overwritten. Your SDK inputs and outputs are preserved across all session merges.
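
To make the grouping behavior concrete, here is a toy model of merging by session ID. It is not Scorecard's pipeline: the function name and the span fields (session_id, trace_id, name) are assumptions chosen for illustration. It only demonstrates the idea that trace IDs accumulate per session instead of being overwritten.

```python
def group_traces_by_session(spans: list[dict]) -> dict[str, dict]:
    """Group spans by session ID, accumulating trace IDs per session.

    Toy model only; the span field names are hypothetical.
    """
    sessions: dict[str, dict] = {}
    for span in spans:
        session = sessions.setdefault(
            span["session_id"], {"trace_ids": [], "spans": []}
        )
        # Accumulate trace IDs rather than replacing the previous one.
        if span["trace_id"] not in session["trace_ids"]:
            session["trace_ids"].append(span["trace_id"])
        session["spans"].append(span["name"])
    return sessions


turns = [
    {"session_id": "s1", "trace_id": "t1", "name": "turn 1: llm call"},
    {"session_id": "s1", "trace_id": "t2", "name": "turn 2: tool use"},
    {"session_id": "s2", "trace_id": "t3", "name": "turn 1: llm call"},
]
grouped = group_traces_by_session(turns)
```

In this model, session s1 ends up with both t1 and t2 in its trace ID list, which corresponds to two turns of one conversation landing in a single record.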

Troubleshooting

Symptom: Inputs/outputs not showing in Trace Overview
Fix: Verify your system function returns a value. Check that scorecard.project_id in the trace matches the project in runAndEvaluate().

Symptom: Trace not linking to the record
Fix: Ensure traces are sent to https://tracing.scorecard.io/otel. Confirm the API key belongs to the same org. Call force_flush() before the process exits.

Symptom: Scores reference trace data instead of SDK data
Fix: {{inputs.*}} and {{outputs.*}} resolve to SDK data automatically. If you see raw trace JSON, the record wasn’t created via runAndEvaluate().
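
When checking for a project ID mismatch, it can help to inspect the resource attribute string your system actually sends. The sketch below parses the comma-separated key=value format of OTEL_RESOURCE_ATTRIBUTES and compares scorecard.project_id against the expected value. Both function names are hypothetical, and the check runs locally without contacting Scorecard.

```python
def parse_resource_attributes(raw: str) -> dict[str, str]:
    """Parse a comma-separated key=value string into a dictionary."""
    attrs: dict[str, str] = {}
    for pair in raw.split(","):
        if not pair.strip():
            continue  # tolerate empty segments such as a trailing comma
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"malformed attribute (expected key=value): {pair!r}")
        attrs[key.strip()] = value.strip()
    return attrs


def project_id_matches(raw: str, expected_project_id: str) -> bool:
    """Return True if scorecard.project_id in raw equals the expected ID."""
    return parse_resource_attributes(raw).get("scorecard.project_id") == expected_project_id
```

Running project_id_matches on the value of OTEL_RESOURCE_ATTRIBUTES with the project ID passed to runAndEvaluate() gives a quick local sanity check before digging further.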

Next steps