Steps
Set up accounts
Create a Scorecard account, then set your Scorecard API key as an environment variable.
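One way to confirm the key is visible to your script before going further is a small check like the one below. This is a sketch: the variable name `SCORECARD_API_KEY` is an assumption, and `get_scorecard_api_key` is a hypothetical helper, not part of any Scorecard library.

```python
import os


def get_scorecard_api_key(env_var: str = "SCORECARD_API_KEY") -> str:
    """Return the API key from the environment, or fail with a clear message.

    The default variable name is an assumption; use whatever name your setup expects.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before continuing.")
    return key
```

Failing early here avoids a less obvious authentication error later in the quickstart.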
Create a simple LLM system to evaluate
For the quickstart, the LLM system is run_system(), a simple function that takes an input and returns an output.
In Scorecard, system inputs and outputs are dictionaries. The function receives system_input and returns a dictionary.
Here’s a simple system that does not require an OpenAI API key:
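A minimal sketch of such a function is shown below. Only `tone` is named in this guide; the `recipient` and `message` input fields and the `response` output key are assumptions chosen for illustration, and the string formatting stands in for a real model call.

```python
def run_system(system_input: dict) -> dict:
    """Rewrite a message for a recipient in the requested tone, without calling an LLM.

    The field names used here (recipient, message, response) are illustrative assumptions.
    """
    tone = system_input.get("tone", "neutral")
    recipient = system_input.get("recipient", "there")
    message = system_input.get("message", "")
    # A trivial rule in place of a model: formal tone gets a formal greeting.
    greeting = "Dear" if tone == "formal" else "Hi"
    return {"response": f"{greeting} {recipient}, {message}"}


if __name__ == "__main__":
    print(run_system({"tone": "formal", "recipient": "Dr. Lee", "message": "the report is ready."}))
```

Because both the input and the output are plain dictionaries, the same function signature works whether the body is a rule like this or a call to a model.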
Specify Project
Create a new Project in Scorecard, or use the existing default Project. This is where your testsets, metrics, and runs are stored.
Set the Project ID for later:
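For example, assuming the ID is kept in a module-level constant (the value below is a placeholder, not a real Project ID):

```python
# Placeholder: replace with the ID of your Project as shown in Scorecard.
PROJECT_ID = "your-project-id"
```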
Create test cases
Create some test cases to represent the inputs and the ideal (expected) outputs of your system.
Create Metrics
Create two LLM-as-a-judge Metrics to evaluate whether your system uses the correct tone and addresses the recipient.
The Metric’s prompt template uses Jinja syntax. For each Testcase, we replace {{inputs.tone}} with the test case’s tone value and send the rendered prompt to the judge.
Evaluate system
Call run_system() against the test cases and record the scored results in Scorecard.
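Before recording anything in Scorecard, the flow can be dry-run locally to see what each step produces. The sketch below does not use the Scorecard SDK and does not call a judge. It runs a stand-in `run_system()` over two hypothetical test cases and fills a judge prompt template for each. The `render` helper is a simplified regex substitute for Jinja that handles only `{{dotted.path}}` placeholders, and the field names `recipient`, `message`, and `response`, along with the `{{outputs.response}}` placeholder, are assumptions for illustration.

```python
import re


def run_system(system_input: dict) -> dict:
    """Stand-in system: builds a greeting by rule instead of calling an LLM."""
    greeting = "Dear" if system_input["tone"] == "formal" else "Hi"
    return {"response": f"{greeting} {system_input['recipient']}, {system_input['message']}"}


# Hypothetical test cases: each pairs system inputs with the ideal (expected) output.
TESTCASES = [
    {
        "inputs": {"tone": "formal", "recipient": "Dr. Lee", "message": "the report is ready."},
        "expected": {"response": "Dear Dr. Lee, the report is ready."},
    },
    {
        "inputs": {"tone": "casual", "recipient": "Sam", "message": "lunch?"},
        "expected": {"response": "Hi Sam, lunch?"},
    },
]

# Illustrative judge prompt; {{outputs.response}} is an assumed placeholder name.
JUDGE_TEMPLATE = "Does the response use a {{inputs.tone}} tone?\nResponse: {{outputs.response}}"


def render(template: str, context: dict) -> str:
    """Replace {{a.b}} placeholders by walking nested dictionaries (not real Jinja)."""
    def lookup(match: re.Match) -> str:
        value = context
        for part in match.group(1).split("."):
            value = value[part]
        return str(value)

    return re.sub(r"\{\{\s*([\w.]+)\s*\}\}", lookup, template)


def dry_run(testcases: list) -> list:
    """Run the system on every test case and build the prompt a judge would receive."""
    records = []
    for testcase in testcases:
        outputs = run_system(testcase["inputs"])
        prompt = render(JUDGE_TEMPLATE, {"inputs": testcase["inputs"], "outputs": outputs})
        records.append({
            "inputs": testcase["inputs"],
            "expected": testcase["expected"],
            "outputs": outputs,
            "judge_prompt": prompt,
        })
    return records


if __name__ == "__main__":
    for record in dry_run(TESTCASES):
        print(record["judge_prompt"])
        print("---")
```

Each record holds the inputs, expected output, actual output, and rendered judge prompt for one test case, which is the information the scoring step works from.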
