When you click “Create pull request”, a PR will be created in your repository with the following files:
.github/workflows/scorecard-eval.yml is the GitHub Actions workflow that runs your tests.
You can update your testing parameters and trigger conditions here. For example, use environment variables to pass in any API keys your system needs (see the sketch after the workflow file below).
scorecard-eval.yml
```yaml
name: Scorecard Evaluation Workflow

on:
  workflow_dispatch:
    inputs:
      project_id:
        description: Project ID
        required: true
      testset_id:
        description: Testset ID
        required: true
      metric_ids:
        description: Metric IDs
        required: true
      system_version_id:
        description: System Version ID
        required: false
  repository_dispatch:
    types: start-evaluation

permissions:
  contents: read

jobs:
  evaluation-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Run test
        env:
          SCORECARD_API_KEY: ${{ secrets.SCORECARD_API_KEY }}
          # Scorecard config IDs
          # 1. Check if there's an input from manual trigger (workflow_dispatch)
          # 2. Fallback to values sent from external sources (repository_dispatch)
          # 3. Fallback to default values
          PROJECT_ID: ${{ github.event.inputs.project_id || github.event.client_payload.project_id || env.DEFAULT_PROJECT_ID }}
          TESTSET_ID: ${{ github.event.inputs.testset_id || github.event.client_payload.testset_id || env.DEFAULT_TESTSET_ID }}
          METRIC_IDS: ${{ github.event.inputs.metric_ids || github.event.client_payload.metric_ids || env.DEFAULT_METRIC_IDS }}
          SYSTEM_VERSION_ID: ${{ github.event.inputs.system_version_id || github.event.client_payload.system_version_id || env.DEFAULT_SYSTEM_VERSION_ID }}
        run: python3 run_tests.py
```
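For example, if your system calls OpenAI, you could expose an extra repository secret to the "Run test" step. The fragment below is a minimal sketch, not part of the generated workflow: the OPENAI_API_KEY secret name is an assumption, and you would first create that secret under your repository's Actions secrets.

```yaml
      - name: Run test
        env:
          SCORECARD_API_KEY: ${{ secrets.SCORECARD_API_KEY }}
          # Hypothetical addition: pass your own provider key through to run_tests.py.
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          # ...PROJECT_ID, TESTSET_ID, METRIC_IDS, SYSTEM_VERSION_ID as above...
        run: python3 run_tests.py
```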
run_tests.py is the script that scorecard-eval.yml uses to run your system.
Update the run_system() function to call your actual system (a sketch follows the file below).
run_tests.py
```python
import os
import re
from typing import Any

from scorecard_ai import Scorecard
from scorecard_ai.lib import run_and_evaluate


def run_system(
    system_input: dict[str, Any], system_config: dict[str, Any] | None = None
) -> dict:
    """
    FIXME: Replace this placeholder function with a call to your model
    """
    return {
        "response": f"Placeholder LLM response, got input: {system_input}",
    }


def main(
    *,
    scorecard_api_key: str,
    project_id: str,
    testset_id: str,
    metric_ids: list[str],
    system_version_id: str | None = None,
) -> None:
    """
    Run and score all Testcases in a given Testset
    """
    client = Scorecard(api_key=scorecard_api_key)
    run = run_and_evaluate(
        client=client,
        project_id=project_id,
        testset_id=testset_id,
        metric_ids=metric_ids,
        **({"system_version_id": system_version_id} if system_version_id else {}),
        system=run_system,
    )
    print(run["url"])


if __name__ == "__main__":
    main(
        scorecard_api_key=os.environ["SCORECARD_API_KEY"],
        project_id=os.environ["PROJECT_ID"],
        testset_id=os.environ["TESTSET_ID"],
        metric_ids=re.findall(r"\b\d+\b", os.environ["METRIC_IDS"]),
        system_version_id=os.environ["SYSTEM_VERSION_ID"] or None,
    )
```
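As an illustration, here is what run_system() could look like when your system is a single OpenAI chat completion. This is a sketch under assumptions that are not part of the generated file: that openai is listed in requirements.txt, that OPENAI_API_KEY is exported by the workflow (as in the snippet above), that gpt-4o-mini is your model, and that your testcases have a question input field. Adapt it to however your system is actually invoked.

```python
from typing import Any

from openai import OpenAI  # hypothetical dependency; add `openai` to requirements.txt

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment


def run_system(
    system_input: dict[str, Any], system_config: dict[str, Any] | None = None
) -> dict:
    """Hypothetical example: answer the testcase's `question` field with one chat call."""
    completion = openai_client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model name; use whatever your system runs
        messages=[{"role": "user", "content": system_input["question"]}],
    )
    # Keep the placeholder's output shape so downstream metrics can still read `response`.
    return {"response": completion.choices[0].message.content}
```

Keeping the same {"response": ...} output shape means the rest of run_tests.py and your metric configuration don't need to change.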
requirements.txt contains the Python dependencies for run_tests.py and your system.
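As a rough illustration only (the generated file's exact contents may differ), a minimal requirements.txt would list the Scorecard SDK that run_tests.py imports, plus whatever your own system needs:

```text
scorecard-ai   # Scorecard SDK (imported as scorecard_ai); package name assumed
openai         # hypothetical: only needed if your run_system() calls OpenAI as sketched above
```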