Execute evaluations and analyze AI system performance.
A Run is an execution that evaluates your AI system against a set of Testcases using specified metrics. Each Run generates Records (individual test executions) and Scores (evaluation results) for each Record, which together help you understand your system's performance across different scenarios.
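The sketch below illustrates how these pieces relate, using plain Python data classes. The names (`Run`, `Record`, `Score`, `run_system`, `grade_output`) are illustrative stand-ins rather than the Scorecard SDK; the sketch only shows the Run → Records → Scores structure described above.

```python
from dataclasses import dataclass, field

@dataclass
class Score:
    metric: str          # e.g. "accuracy" or "tone"
    value: float         # evaluation result for one metric
    reasoning: str = ""  # optional explanation of the score

@dataclass
class Record:
    testcase_id: str     # which Testcase was executed
    output: str          # your system's response
    scores: list[Score] = field(default_factory=list)

@dataclass
class Run:
    run_id: str
    records: list[Record] = field(default_factory=list)

def run_system(prompt: str) -> str:
    # Placeholder for a call to your real AI system.
    return f"response to: {prompt}"

def grade_output(output: str, testcase: dict, metric: str) -> float:
    # Placeholder scorer; a real metric would compare the output
    # against the Testcase's expectations in a metric-specific way.
    return 1.0 if testcase.get("expected", "") in output else 0.0

def execute_run(run_id: str, testcases: list[dict], metrics: list[str]) -> Run:
    """Evaluate every Testcase, producing one Record (with Scores) per Testcase."""
    run = Run(run_id=run_id)
    for tc in testcases:
        record = Record(testcase_id=tc["id"], output=run_system(tc["input"]))
        for metric in metrics:
            record.scores.append(
                Score(metric=metric, value=grade_output(record.output, tc, metric))
            )
        run.records.append(record)
    return run
```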
Using Scorecard’s GitHub integration, you can trigger runs automatically (e.g., on each pull request or on a schedule). With the integration set up, you can also trigger runs of your real system directly from the Scorecard UI.
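As a rough illustration of the pull-request and scheduled triggers, here is a minimal Python entry point a CI job might invoke. `GITHUB_EVENT_NAME` is a real GitHub Actions environment variable; `start_scorecard_run` and its arguments are hypothetical placeholders for however your integration actually launches the run, not a documented Scorecard API.

```python
import os

def start_scorecard_run(testset_id: str, reason: str) -> None:
    # Placeholder: in a real setup, the GitHub integration (configured per the
    # Scorecard docs) would launch the run against your system here.
    print(f"Triggering run on testset {testset_id} ({reason})")

def main() -> None:
    # GitHub Actions sets GITHUB_EVENT_NAME to the event that started the job.
    event = os.environ.get("GITHUB_EVENT_NAME", "manual")
    if event == "pull_request":
        start_scorecard_run("testset-123", reason="pull request opened or updated")
    elif event == "schedule":
        start_scorecard_run("testset-123", reason="scheduled evaluation")
    else:
        start_scorecard_run("testset-123", reason=f"manual trigger ({event})")

if __name__ == "__main__":
    main()
```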
Click Show Details on a run page to view or edit the run's notes and to see the system/prompt version the run was executed with.
Run details with notes and system configuration.
Run data includes potentially sensitive information from your Testsets and system outputs. Follow your organization’s data handling policies and avoid including PII or confidential data in test configurations.