> ## Documentation Index
> Fetch the complete documentation index at: https://docs.scorecard.io/llms.txt
> Use this file to discover all available pages before exploring further.

# SDK Quickstart

> Evaluate a simple LLM agent with Scorecard in minutes.

export const DarkLightImage = ({lightSrc, caption, alt, darkSrc = null, width = "1000"}) => {
  const getAbsoluteUrl = src => {
    if (src.startsWith('http://') || src.startsWith('https://')) {
      return src;
    }
    const currentUrl = typeof window !== 'undefined' ? window.location.origin : '';
    if (currentUrl.includes('.mintlify.app')) {
      const subdomain = currentUrl.split('.')[0].replace('https://', '');
      return `https://mintlify.s3.us-west-1.amazonaws.com/${subdomain}${src.startsWith('/') ? '' : '/'}${src}`;
    } else if (currentUrl === 'https://docs.scorecard.io') {
      return `https://mintlify.s3.us-west-1.amazonaws.com/scorecard-d65b5e8a${src.startsWith('/') ? '' : '/'}${src}`;
    } else {
      return `${currentUrl}${src.startsWith('/') ? '' : '/'}${src}`;
    }
  };
  const content = <>
      <img className="block dark:hidden" width={width} src={getAbsoluteUrl(lightSrc)} alt={alt} />
      <img className="hidden dark:block" width={width} src={getAbsoluteUrl(darkSrc || lightSrc.replace('light', 'dark'))} alt={alt} />
    </>;
  if (caption) {
    return <Frame caption={caption}>{content}</Frame>;
  } else {
    return content;
  }
};

Scorecard has [Python](https://pypi.org/project/scorecard-ai/) (v3.2.1) and [TypeScript](https://www.npmjs.com/package/scorecard-ai) (v2.2.0) SDKs. If you're using Python, you can follow along in [Google Colab](https://colab.research.google.com/drive/1A53j48o75Decrjz7WzDVP62XeDEKCHHP).

## Steps

<Steps>
  <Step title="Setup accounts">
    Create a [Scorecard account](https://app.scorecard.io/dashboard), then set your [Scorecard API key](https://app.scorecard.io/settings) as an environment variable.

    ```sh theme={null}
    export SCORECARD_API_KEY="your_scorecard_api_key"
    ```
  </Step>

  <Step title="Install Scorecard SDK">
    Install the Scorecard library:

    <CodeGroup>
      ```sh Python (pip) theme={null}
      pip install scorecard-ai
      ```

      ```sh JavaScript (npm) theme={null}
      npm install scorecard-ai
      ```
    </CodeGroup>
  </Step>

  <Step title="Create simple LLM system to evaluate">
    For the quickstart, the LLM system is `run_system()`, a simple function that takes an input and returns an output.

    In Scorecard, system inputs and outputs are dictionaries. The function receives `system_input` and returns a dictionary.

    <Tabs>
      <Tab title="Without OpenAI key">
        Here's a simple system that does not require an OpenAI API key:

        <CodeGroup>
          ```py Python wrap theme={null}

          # Example:
          # run_system({"original": "How are you?", "recipient": "team", "tone": "formal"})
          # -> {"rewritten": "Hello team,\nHOW ARE YOU?"}
          def run_system(system_input: dict) -> dict:
              recipient = system_input.get("recipient", "")
              return {
                "rewritten": f"Hello {recipient} in tone {system_input['tone']},\n{system_input['original'].upper()}"
              }
          ```

          ```js JavaScript wrap theme={null}
          // Example:
          // runSystem({"original": "How are you?", "recipient": "team", "tone": "formal"})
          // -> {"rewritten": "Hello team,\nHOW ARE YOU?"}
          async function runSystem(systemInput) {
            const recipient = systemInput.recipient ?? "";
            return {
              rewritten: `Hello ${recipient} in tone ${systemInput.tone},\n${systemInput.original.toUpperCase()}`
            };
          }
          ```
        </CodeGroup>
      </Tab>

      <Tab title="With OpenAI key">
        If you have an OpenAI API key, you can create a realistic system.

        Install the OpenAI library:

        <CodeGroup>
          ```sh Python (pip) theme={null}
          pip install openai
          ```

          ```sh JavaScript (npm) theme={null}
          npm install openai
          ```
        </CodeGroup>

        Then, create the system:

        <CodeGroup>
          ```py Python wrap theme={null}
          from openai import OpenAI

          # Find your API key at https://platform.openai.com/api-keys
          openai = OpenAI(api_key="your_openai_api_key")

          # Example:
          # run_system({"original": "Hello, world!", "tone": "formal concise", "recipient": "team"})
          # -> {"rewritten": "Greetings, team."}
          def run_system(system_input: dict) -> dict:
              recipient = system_input.get('recipient','')
              response = openai.responses.create(
                  model="gpt-4o-mini",
                  instructions=f"You are a tone translator. Convert the user's message to the tone: {system_input['tone']}." + (f" Address the recipient: {recipient}" if recipient else ""),
                  input=system_input['original'],
              )
              return {
                  "rewritten": response.output_text
              }
          ```

          ```js JavaScript wrap theme={null}
          import OpenAI from 'openai';

          // By default, the API key is taken from the environment variable.
          const openai = new OpenAI();

          // Example:
          // runSystem({"original": "Hello, world!", "tone": "formal concise", "recipient": "team"})
          // -> {"rewritten": "Greetings, team."}
          async function runSystem(systemInput) {
            const recipient = systemInput.recipient ?? "";
            const response = await openai.responses.create({
              model: 'gpt-4o-mini',
              instructions: `You are a tone translator. Convert the user's message to the tone: ${systemInput.tone}. ` +
                (systemInput.recipient ? `Address the recipient: ${systemInput.recipient}` : ""),
              input: systemInput.original,
            });
            return {
              rewritten: response.output_text
            };
          }
          ```
        </CodeGroup>
      </Tab>
    </Tabs>
  </Step>

  <Step title="Setup Scorecard">
    <CodeGroup>
      ```py Python theme={null}
      from scorecard_ai import Scorecard

      # By default, the API key is taken from the environment variable.
      scorecard = Scorecard()
      ```

      ```js JavaScript theme={null}
      import Scorecard from 'scorecard-ai';

      // By default, the API key is taken from the environment variable.
      const scorecard = new Scorecard();
      ```
    </CodeGroup>
  </Step>

  <Step title="Specify Project">
    Create a new Project in Scorecard, or use the existing default Project. This is where your testsets, metrics, and runs are stored.

    Set the Project ID for later:

    <CodeGroup>
      ```py Python theme={null}
      PROJECT_ID = "123"  # Replace with your project ID
      ```

      ```js JavaScript theme={null}
      const PROJECT_ID = "123"; // Replace with your Project ID
      ```
    </CodeGroup>
  </Step>

  <Step title="Create test cases">
    Create some test cases to represent the inputs and the ideal (`expected`) outputs of your system.

    <CodeGroup>
      ```py Python theme={null}
      testcases = [
          {
              # `inputs` gets passed to the system as a dictionary.
              "inputs": {
                  "original": "We need your feedback on the new designs ASAP.",
                  "tone": "polite",
              },
              # `expected` is the ideal output of the system used by the LLM-as-a-judge to evaluate the system.
              "expected": {
                  "idealRewritten": "Hi, your feedback is crucial to the success of the new designs. Please share your thoughts as soon as possible.",
              },
          },
          {
              "inputs": {
                  "original": "I'll be late to the office because my cat is sleeping on my keyboard.",
                  "tone": "funny",
              },
              "expected": {
                  "idealRewritten": "Hey team! My cat's napping on my keyboard and I'm just waiting for her to give me permission to leave. I'll be a bit late!",
              },
          },
      ]
      ```

      ```js JavaScript theme={null}
      const testcases = [
        {
          // `inputs` gets passed to the system as an object.
          inputs: {
            original: 'We need your feedback on the new designs ASAP.',
            tone: 'polite',
          },
          // `expected` is the ideal output of the system used by the LLM-as-a-judge to evaluate the system.
          expected: {
            idealRewritten: 'Hi! your feedback is crucial to the success of the new designs. Please share your thoughts as soon as possible.',
          },
        },
        {
          inputs: {
            original: "I'll be late to the office because my cat is sleeping on my keyboard.",
            tone: 'funny',
          },
          expected: {
            idealRewritten: "Hey team! My cat's napping on my keyboard and I'm just waiting for her to give me permission to leave. I'll be a bit late!",
          },
        },
      ];
      ```
    </CodeGroup>
  </Step>

  <Step title="Create Metrics">
    Create two LLM-as-a-judge Metrics to evaluate whether your system uses the correct tone and addresses the recipient.

    The Metric's prompt template uses Jinja syntax. For each Testcase, we will send the prompt template to the judge and replace `{{inputs.tone}}` with the test case's `tone` value.

    <CodeGroup>
      ```py Python theme={null}
      import textwrap

      tone_accuracy_metric = scorecard.metrics.create(
          project_id=PROJECT_ID,
          name="Tone accuracy",
          description="How well does it match the intended tone?",
          eval_type="ai",
          output_type="int",
          prompt_template=textwrap.dedent("""
            You are a tone evaluator. Grade the response on how well it matches the
            intended tone: "{{inputs.tone}}". Use a score of 1 if the tones are very
            different and 5 if they are the exact same.
            
            Response: {{outputs.rewritten}}
            
            Ideal response: {{expected.idealRewritten}}
            
            {{ gradingInstructionsAndExamples }}"""),
      )

      recipient_address_metric = scorecard.metrics.create(
          project_id=PROJECT_ID,
          name="Recipient address",
          description="Does it address the recipient only if specified?",
          eval_type="ai",
          output_type="boolean",
          prompt_template=textwrap.dedent("""
            {% if inputs.recipient %}
              Does the response refer to the correct recipient: {{inputs.recipient}}?
              Response: {{outputs.rewritten}}
            {% else %}
              The response should avoid referring to any specific recipient.
              Response: {{outputs.rewritten}}
            {% endif %}
            
            {{ gradingInstructionsAndExamples }}"""),
      )
      ```

      ```js JavaScript wrap theme={null}
      const toneAccuracyMetric = await scorecard.metrics.create({
        projectId: PROJECT_ID,
        name: "Tone accuracy",
        evalType: "ai",
        outputType: "int",
        promptTemplate: "You are a tone evaluator. Grade the response on how well it matches the intended tone {{inputs.tone}} and the tone of the ideal response. Use a score of 1 if the tones are very different and 5 if they are the exact same.\n\nResponse: {{outputs.rewritten}}\n\nIdeal response: {{expected.idealRewritten}}\n\n{{ gradingInstructionsAndExamples }}",
      });

      const recipientAddressMetric = await scorecard.metrics.create({
        projectId: PROJECT_ID,
        name: "Recipient address",
        evalType: "ai",
        outputType: "boolean",
        promptTemplate: "Does the response refer to the correct recipient: {{inputs.recipient}}? Response: {{outputs.rewritten}}\n\n{{ gradingInstructionsAndExamples }}",
      });
      ```
    </CodeGroup>
  </Step>

  <Step title="Evaluate system">
    Call `run_system()` against the test cases and record the scored results in Scorecard.

    <CodeGroup>
      ```py Python wrap theme={null}
      from scorecard_ai.lib import run_and_evaluate

      run = run_and_evaluate(
          client=scorecard,
          project_id=PROJECT_ID,
          testcases=testcases,
          metric_ids=[tone_accuracy_metric.id, recipient_address_metric.id],
          system=lambda input, _system_version: run_system(input),
      )

      print(f'Go to {run["url"]} to view your scored results.')
      ```

      ```js JavaScript wrap theme={null}
      import Scorecard, { runAndEvaluate } from 'scorecard-ai';

      const run = await runAndEvaluate(scorecard, {
        projectId: PROJECT_ID,
        testcases: testcases,
        metricIds: [toneAccuracyMetric.id, recipientAddressMetric.id],
        system: runSystem,
      });
      console.log(`Go to ${run.url} to view your scored results.`);
      ```
    </CodeGroup>
  </Step>

  <Step title="Analyze results">
    Finally, review the results in Scorecard to understand the performance of your system.

    <DarkLightImage lightSrc="/images/quickstart-run-light.png" caption="Viewing results in the Scorecard UI." alt="Screenshot of viewing results in the Scorecard UI." />
  </Step>
</Steps>

## Where to go next

* [Tracing Quickstart](/intro/tracing-quickstart) — Add observability to your LLM calls
* [SDK + Tracing](/features/sdk-tracing) — Combine evaluation data with trace observability in a single record
