> ## Documentation Index > Fetch the complete documentation index at: https://docs.scorecard.io/llms.txt > Use this file to discover all available pages before exploring further. # Multi-turn simulation > Test conversational AI agents with realistic multi-turn simulations and automated user personas. export const DarkLightImage = ({lightSrc, caption, alt, darkSrc = null, width = "1000"}) => { const getAbsoluteUrl = src => { if (src.startsWith('http://') || src.startsWith('https://')) { return src; } const currentUrl = typeof window !== 'undefined' ? window.location.origin : ''; if (currentUrl.includes('.mintlify.app')) { const subdomain = currentUrl.split('.')[0].replace('https://', ''); return `https://mintlify.s3.us-west-1.amazonaws.com/${subdomain}${src.startsWith('/') ? '' : '/'}${src}`; } else if (currentUrl === 'https://docs.scorecard.io') { return `https://mintlify.s3.us-west-1.amazonaws.com/scorecard-d65b5e8a${src.startsWith('/') ? '' : '/'}${src}`; } else { return `${currentUrl}${src.startsWith('/') ? '' : '/'}${src}`; } }; const content = <> {alt}

; if (caption) { return {content}; } else { return content; } }; Scorecard's multi-turn simulation feature allows you to evaluate the performance of conversational AI agents in realistic multi-turn conversations. Define instructions for your simulated user in Scorecard, then run a simulation on your agent using the Scorecard SDK. ## Create a Sim Agent Sim Agents are configurable AI personas that interact with your agent during testing. Each Sim Agent has a prompt template, model settings, and can be versioned for reproducibility. Go to the Sim Agents page, then click "New Sim Agent". Fill in the instructions for the Sim Agent then click "Save Sim Agent". We recommend using the model `Gpt-4.1` for better simulated behavior. You can create a Sim Agent through the API using the `scorecard.systems.upsert()` method. We recommend using the model `gpt-4.1` for better simulated behavior. ```python theme={null} from scorecard_ai import Scorecard from scorecard_ai.lib import ChatMessage sim_agent_prompt = """ You are an angry customer talking to a customer service agent. You want to return a {{item_to_return}} and will not be satisfied until the return is accepted. Never mention you're a simulation or help the agent. """ scorecard = Scorecard() sim_agent = scorecard.systems.upsert( project_id="203", name="Angry Amazon customer", config={ "modelName": "gpt-4.1", "promptTemplate": [ ChatMessage(role="system", content=sim_agent_prompt) ], "temperature": 0.1, "maxTokens": 1024, "topP": 0.9, "isSimAgent": True, }, ) SIM_AGENT_ID = sim_agent.id ``` ### Prompt template Sim Agent prompts support Jinja2 templating to inject testcase inputs dynamically: > You are an angry customer talking to customer service. > > Product: \{\{product\_name}}\ > Issue: \{\{customer\_complaint}}\ > Previous interactions: \{\{interaction\_count}} > > You want to \{\{customer\_goal}} and will not be satisfied until resolved.\ > Never mention you're a simulation or help the agent. Variables like \{\{product\_name}} are replaced with values from each testcase's input fields. See [Prompts](/features/prompts#advanced-jinja-features) for information on referencing testcase inputs and using Jinja in your Sim Agent prompts. ## Run a simulation Multi-turn simulations execute conversations between your AI agent and Sim Agents, capturing the full interaction for evaluation. Simulation runs are kicked off by calling `multi_turn_simulation()` from the Scorecard Python SDK. ### System function The `system` parameter is your agent code under test. It must be a callable that handles conversation turns. **Function signature:** ```python theme={null} def your_system( # Complete conversation history chat_history: list[ChatMessage], # Input fields from the current testcase testcase_inputs: dict[str, Any] # Returns a list of assistant messages as strings. # This will likely be a list containing a single string message. ) -> Iterable[str | ChatMessage] ``` Here's an example of an AI agent under test: ```python wrap expandable Example agent function theme={null} from openai import OpenAI from scorecard_ai.lib import ChatMessage system_prompt = """ You are a customer support agent for Amazon. Help the customer and remain polite and very concise. Try to figure out what the customer's needs are first, then continue by providing information or links to actions as appropriate following realistic Amazon guidelines. """ def customer_service_system( chat_history: list[ChatMessage], testcase_inputs: dict, ) -> list[str]: client = OpenAI() system_response = client.chat.completions.create( model="gpt-4.1", messages=chat_history, ).choices[0].message.content # Return a list containing the response content return [system_response] # Alternatively, return ChatMessage objects for more control. # This is useful if you want to track tool calls, which will be ignored by the simulated user agent. # return [ChatMessage(role="assistant", content=system_response)] ``` ### Initial messages The `initial_messages` parameter seeds the conversation before simulation begins. It can be: * A list of `ChatMessage` objects (used for all testcases) * A function that takes testcase inputs and returns messages (for dynamic initialization) * Omitted (starts with an empty conversation) ```python Static initial messages theme={null} # Option 1: Static initial messages for all testcases initial_messages: list[ChatMessage] = [ # System messages are ignored by the simulated user agent ChatMessage(role="system", content=system_prompt), # Pre-seed with an assistant greeting ChatMessage( role="assistant", content="Hello, how can I help you today?", ), ] ``` ```python Dynamic initial messages theme={null} # Option 2: Dynamic initial messages based on testcase inputs def get_initial_messages(testcase_inputs: dict) -> list[ChatMessage]: product_name = testcase_inputs.get("product_name", "our products") return [ ChatMessage(role="system", content=system_prompt), # Pre-seed with an assistant greeting specific to the testcase ChatMessage( role="assistant", content=f"Hello! I'm here to help with {product_name}.", ), ] ``` ### Running the simulation Use `multi_turn_simulation()` to run the simulation across all testcases in a testset: ```python theme={null} from scorecard_ai import Scorecard from scorecard_ai.lib import ChatMessage, StopChecks, multi_turn_simulation scorecard = Scorecard() simulation_run = multi_turn_simulation( client=scorecard, project_id=PROJECT_ID, # e.g. "123" metric_ids=METRIC_IDS, # e.g. ["123", "456"] testset_id=TESTSET_ID, # e.g. "456" sim_agent_id=SIM_AGENT_ID, # e.g. "abcdefgh-1234-5678-90ab-cdefgh01" system=customer_service_system, initial_messages=initial_messages, # Or use get_initial_messages for dynamic initialization stop_check=StopChecks.max_turns(5), # Optional: control the conversation stopping condition start_with_system=False, # Optional: explicitly control who starts the conversation ) ``` The simulation automatically determines who starts the conversation based on `initial_messages`. If the last message is from the user, the agent responds first. Use `start_with_system` to override this behavior. ## Stop checks Stop checks control when conversations end. They are functions that receive a `ConversationInfo` object and return `True` to stop the simulation. By default, the simulation runs for 5 "turns", where a turn is the number of times the `system` function was called. To prevent accidental infinite loops, simulations have a hard limit of 50 turns. If this limit is reached, the simulation ends automatically regardless of the stop check. ### Built-in stop checks Scorecard provides a few heuristic stop checks: | **Stop Check** | Description | Example usage | | :------------- | :--------------------------------------------------- | :--------------------------------------------- | | **Max turns** | Stop after `n` conversation turns | `StopChecks.max_turns(10)` | | **Content** | Stop when any phrase appears
(case-insensitive) | `StopChecks.content(["goodbye", "thank you"])` | | **Max time** | Stop after elapsed time (seconds) | `StopChecks.max_time(30.0)` | **Combine stop checks** You can create complex stopping conditions by combining stop checks using `StopChecks.any()` and `StopChecks.all()`. For example, `StopChecks.any([StopChecks.max_turns(5), StopChecks.max_time(10)])` will end the simulation after at most 5 turns or after at most 10 seconds. ### Custom stop checks For more advanced use cases, you can also define your own stop check. For example, this stop check ends the simulation when the user is satisfied with the conversation. ```python wrap expandable Custom stop check for user satisfaction theme={null} from scorecard_ai.lib import ConversationInfo def stop_check_user_is_satisfied(conversation_info: ConversationInfo) -> bool: if conversation_info["turn_count"] < 1: return False last_message = conversation_info["messages"][-1] if last_message["role"] != "user": return False # Evaluate the user's satisfaction with the conversation client = OpenAI() response = openai.responses.create( model="gpt-4.1-mini", instructions="You are given a conversation between a customer and an Amazon customer service agent. Determine if the customer is satisfied with the conversation. Say 'yes' if the conversation is over and the customer is satisfied, 'no' otherwise.", input=last_message["content"], ) return response.output_text.lower().contains("yes") ``` ## Viewing simulation results Calling `multi_turn_simulation()` creates a [Record](/features/records). To view simulation results, go to the Record's details page. The conversation history between the Sim Agent ("User") and the agent under test ("Assistant") will be shown. ## Common patterns Escalation paths are common in customer service systems. This Sim Agent will gradually escalate the conversation until the user is satisfied. > You are a customer seeking help with `{{issue}}`.\ > Start polite, but escalate if not satisfied:\ > 1\. First request: Be polite\ > 2\. Second request: Show frustration\ > 3\. Third request: Ask for supervisor\ > 4\. Fourth request: Threaten to cancel service This Sim Agent will test the agent's handling of unusual requests. > You are testing the agent's handling of `{{test_scenario}}`. > Try unusual requests like: > \- Very long product names > \- Special characters in input > \- Multiple issues at once > \- Contradictory requests This Sim Agent will start by asking about the main issue, then suddenly change topic to a distraction topic after 2 turns. It will then return to the original issue. > Start by asking about `{{main_issue}}`. > After 2 turns, suddenly change topic to `{{distraction_topic}}`. > Then return to the original issue. > Test if the assistant can handle context switches.