# MCP Server Quickstart

> Build testsets and metrics conversationally with Claude Desktop and the Scorecard MCP server

Use the Scorecard MCP (Model Context Protocol) server to create evaluation testsets and metrics through natural language in Claude. Instead of writing code or clicking through UIs, just tell Claude what you need and it will use the Scorecard API to set everything up.

This quickstart uses Claude Desktop, but it also works with other MCP clients, like Cursor and Claude Code.

## Steps

Create a [Scorecard account](https://app.scorecard.io/dashboard) if you don't have one already.

Open Claude and add the Scorecard MCP server using the remote configuration:

1. Open Claude settings
2. Navigate to the "MCP Servers" section
3. Add a new remote server with URL: `https://mcp.scorecard.io/mcp`
4. Complete the OAuth authentication flow to connect your Scorecard account

The remote MCP server requires no local dependencies or API key management. Authentication happens securely through your browser.

Once connected, Claude will have access to all Scorecard API capabilities through natural language. Simply ask Claude to create a new Scorecard project:

> Create a new Scorecard project called "Customer Support Bot Evaluation" for testing my AI customer support assistant.

Claude will automatically use the appropriate MCP tools (like `create_projects`, `list_projects`) based on your natural language request.

Now create a testset to hold your evaluation test cases. Describe the structure you need:

> Create a testset called "Support Scenarios" in this project. The testcases should have:
>
> * Input fields: "customerMessage" (the customer's question) and "category" (support category like billing, technical, or product)
> * Expected output field: "idealResponse" (what a great response from the agent looks like)
> * Metadata field: "difficulty" (easy, medium, or hard)
>
> Then add 5 testcases covering different support scenarios.

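
As a rough illustration of the structure described in the prompt above, one testcase can be pictured as three groups of fields. The sketch below is plain Python, not the Scorecard SDK: the `inputs` / `expected` / `metadata` grouping keys, the helper `check_testcase`, and the sample values are all hypothetical, and the schema Scorecard actually stores may look different.

```python
# Illustrative only: a hypothetical shape for one "Support Scenarios" testcase.
# The field names come from the prompt above; the grouping keys and sample
# values are assumptions, not Scorecard's actual storage format.

ALLOWED_DIFFICULTY = {"easy", "medium", "hard"}
REQUIRED_FIELDS = {
    "inputs": {"customerMessage", "category"},
    "expected": {"idealResponse"},
    "metadata": {"difficulty"},
}


def check_testcase(testcase: dict) -> list[str]:
    """Return a list of problems found in a testcase dict (empty if none)."""
    problems = []
    # Report every required field that is absent from its group.
    for group, fields in REQUIRED_FIELDS.items():
        missing = fields - set(testcase.get(group, {}))
        for name in sorted(missing):
            problems.append(f"missing {group}.{name}")
    # If a difficulty is present, it must be one of the three allowed values.
    difficulty = testcase.get("metadata", {}).get("difficulty")
    if difficulty is not None and difficulty not in ALLOWED_DIFFICULTY:
        problems.append(f"unexpected difficulty: {difficulty!r}")
    return problems


example_testcase = {
    "inputs": {
        "customerMessage": "I was charged twice for my subscription this month.",
        "category": "billing",
    },
    "expected": {
        "idealResponse": "Apologize, confirm the duplicate charge, and explain the refund timeline.",
    },
    "metadata": {"difficulty": "easy"},
}

print(check_testcase(example_testcase))
```

A checklist like this is only a way to compare what Claude created against what you asked for; the testset itself is still created through the conversation.
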
Claude can usually guess which input and output fields you want, but it's better to tell it what your field names are.

Define metrics to evaluate your AI system. Describe what "good" looks like:

> Create two AI-scored metrics for this project:
>
> 1. "Response Accuracy" (integer) - Measures how well the response answers the customer's question compared to the ideal response
> 2. "Tone Appropriateness" (boolean) - Checks if the response uses professional, empathetic language appropriate for customer support
>
> Use GPT-4o as the evaluator with temperature 0 for consistency.

In this example, Claude's initial tool call to `create_metrics` failed because it used the wrong arguments, but it eventually succeeded by reading the documentation and trying again.

Open your Scorecard project in the web UI to see everything that was created!

Everything is now ready to score records. You can score records against your metrics from the [Records page](/features/records), the [SDK](/intro/sdk-quickstart), or the [Scorecard playground](/features/playground), or continue using Claude with MCP.

You can also continue the conversation to analyze and iterate on your metrics and scores.

> Explain the latest scoring results for this project.

> Update the Response Accuracy metric to be stricter about factual details.

> Add 5 more testcases covering edge cases like angry customers and off-topic questions.

The MCP server gives Claude access to the [full Scorecard API](/api-reference), so you can manage your entire evaluation workflow conversationally.

## Tips for Using the MCP Server

**Be specific about data structures**: When creating testsets, clearly describe the field names, types, and which fields are inputs vs expected outputs. This helps Claude set up the schema correctly.

**Describe evaluation criteria**: When creating metrics, explain what makes a "good" output in detail. Claude will translate this into effective evaluation guidelines.

**Ask for recommendations**: Claude can suggest metrics, testcase scenarios, and evaluation strategies based on your use case. Just ask "What metrics should I use for evaluating a RAG system?"

**Iterate conversationally**: Made a mistake? Just ask Claude to fix it: "Update that metric to use temperature 0.1 instead" or "Add a new field called 'priority' to the testset".

## Troubleshooting

The Scorecard MCP server works with most MCP clients, including Claude Desktop, Cursor, and Claude Code. Make sure you've added the remote server URL correctly (`https://mcp.scorecard.io/mcp`) and completed the OAuth flow, and restart Claude if needed. See the [MCP Server documentation](/features/mcp) for more installation instructions.

The MCP server uses OAuth tokens that may expire. Try disconnecting and reconnecting the MCP server in Claude settings to refresh authentication.

Verify the MCP server is connected and enabled in Claude settings. You should see "Scorecard" listed in your active MCP servers.

For local installation with your Scorecard API key, see the [MCP Server documentation](/features/mcp#local-configuration). Use `npx -y scorecard-ai-mcp@latest` with environment variables.

## Learn More

Ready to go deeper? Check out these resources:

* Complete guide to the Scorecard MCP server capabilities and architecture
* Deep dive into creating and managing evaluation metrics
* Learn about testset schemas, field mappings, and organization