Systems at a glance

A System brings together everything that shapes your AI’s behavior: prompts, model settings, and configuration. Instead of tweaking prompts in isolation, you version the full setup and test it as one unit.
Screenshot of the Systems list in the UI.

Systems list with latest configuration.

Why it matters

Systems keep prompts, parameters, and options together, give you versioned changes you can compare, and make it clear which version is in production. For teams moving beyond single prompts, this helps control prompt drift, preserves an audit trail of what’s live, and ensures you evaluate exactly what users experience.

If you call it something else

  • Prompt pipeline / agent configuration / workflow config: A System captures the entire setup as one versioned unit.
  • Where versions live: Each System has multiple versions; you can label one as production and compare it with the latest.
  • What’s captured: Prompts, model IDs and parameters, tool or routing settings, and any custom flags that change outputs.

How versions work

Each System has versioned configurations. You can mark any version as production and keep iterating on the latest version; if no production version is set, the latest version is used until you choose one. New versions are named automatically (“Version N”). A configuration identical to an existing one resolves to that existing version instead of creating a new one. Deleting a System from the UI archives it rather than erasing it, so its history is preserved.
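The rules above can be sketched as a small in-memory model. This is only an illustration of the described behavior, not Scorecard's implementation: the `SystemSketch` class and its methods are hypothetical, and treating two configs as identical when their key-sorted JSON matches is an assumption.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SystemSketch:
    """Hypothetical in-memory model of a System's version history."""

    name: str
    versions: list = field(default_factory=list)  # (version name, config) pairs, oldest first
    production: Optional[str] = None  # name of the version marked as production, if any

    def add_version(self, config: dict) -> str:
        # Assumption: configs are compared by canonical JSON, so key order is irrelevant.
        key = json.dumps(config, sort_keys=True)
        for version_name, existing in self.versions:
            if json.dumps(existing, sort_keys=True) == key:
                return version_name  # identical config resolves to the existing version
        version_name = f"Version {len(self.versions) + 1}"  # auto-naming
        self.versions.append((version_name, config))
        return version_name

    def latest(self) -> str:
        return self.versions[-1][0]

    def active(self) -> str:
        # Production if one is marked, otherwise fall back to the latest version.
        return self.production or self.latest()


system = SystemSketch("joke-bot")
v1 = system.add_version({"temperature": 0.2, "style": "dry"})
v2 = system.add_version({"temperature": 0.9, "style": "dry"})
dup = system.add_version({"style": "dry", "temperature": 0.2})  # same content as v1

active_before = system.active()  # no production set yet, so the latest version
system.production = v1
active_after = system.active()   # production now takes precedence

print(v1, v2, dup, active_before, active_after)
```

In this sketch the third call returns the name of the first version instead of creating a third one, and the active version switches from the latest to production once a production version is marked.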

Create a System

Click New System, give it a name and description, and paste the configuration JSON for your app (whatever drives your model's behavior, e.g., style, temperature, or other flags). Scorecard versions this configuration for you.
Screenshot of creating a new System in the UI.

Create System modal.
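As a rough illustration of what you might paste, the snippet below builds a configuration in Python and prints it as JSON. Every key and value here (`model`, `system_prompt`, `flags`, and so on) is a made-up example, not a required schema; use whatever fields actually drive your application.

```python
import json

# Hypothetical configuration: all keys and values are illustrative only.
config = {
    "model": "example-model-id",
    "temperature": 0.7,
    "system_prompt": "You are a concise, friendly assistant.",
    "style": "casual",
    "flags": {"use_retrieval": False},
}

config_json = json.dumps(config, indent=2)

# Round-trip once to confirm the text is valid JSON before pasting it.
assert json.loads(config_json) == config

print(config_json)
```

Generating the JSON from the same dictionary your application reads at runtime is one way to keep the pasted configuration in step with what the code actually uses.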

Inspect a System

Open a System to view details, timestamps, and all versions. You can quickly scan configurations and see which version is latest or marked as production.
Screenshot of System details and versions in the UI.

System details with version history and configs.
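When scanning configurations, it can help to see only what changed between two versions. The sketch below diffs two configuration dictionaries at the top level; `diff_configs` and the sample configs are hypothetical and independent of any Scorecard API.

```python
ABSENT = "<absent>"  # placeholder shown when a key exists in only one version


def diff_configs(old: dict, new: dict) -> dict:
    """Return {key: (old_value, new_value)} for each top-level key that differs."""
    changes = {}
    for key in sorted(set(old) | set(new)):
        before = old.get(key, ABSENT)
        after = new.get(key, ABSENT)
        if before != after:
            changes[key] = (before, after)
    return changes


# Illustrative configs for a production version and the latest version.
production = {"model": "example-model-id", "temperature": 0.2, "style": "dry"}
latest = {"model": "example-model-id", "temperature": 0.9, "tools": ["search"]}

changes = diff_configs(production, latest)
for key, (before, after) in changes.items():
    print(f"{key}: {before!r} -> {after!r}")
```

Here `model` is omitted from the result because it is unchanged, while `style`, `temperature`, and `tools` are reported with their old and new values.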

Test the full setup

Systems work hand in hand with Testsets, Metrics, and Runs, so you evaluate realistic changes rather than prompt text alone. Try the quickstart to see how different configurations affect results. → Try the Joke Bot Quickstart

Use cases

  • A/B test prompts or model choices against the same testset.
  • Tune temperature, system messages, or tool settings and measure the impact.
  • Promote the best-performing version to production, with the option to roll back to an earlier version.
  • Track regressions across releases by re-running previous versions.
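The use cases above share one pattern: run several versions against the same testset, score each with the same metric, and compare. A toy sketch of that loop follows. The testset, the `run_system` stand-in, and the keyword metric are all hypothetical; a real setup would call your model and use Scorecard's Testsets, Metrics, and Runs instead.

```python
from statistics import mean

# Hypothetical testset: each case pairs an input with a keyword the output should contain.
testset = [
    {"input": "Tell me a joke about cats", "expected_keyword": "cat"},
    {"input": "Tell me a joke about dogs", "expected_keyword": "dog"},
]


def run_system(config: dict, user_input: str) -> str:
    """Stand-in for your application; a real one would call a model using this config."""
    topic = user_input.rsplit(" ", 1)[-1]  # last word of the request
    if config["style"] == "on-topic":
        return f"Why did the {topic} cross the road?"
    return "Why did the chicken cross the road?"


def keyword_metric(output: str, expected_keyword: str) -> float:
    """Score 1.0 if the expected keyword appears in the output, else 0.0."""
    return 1.0 if expected_keyword in output.lower() else 0.0


def evaluate(config: dict) -> float:
    """Average the metric over every case in the shared testset."""
    return mean(
        keyword_metric(run_system(config, case["input"]), case["expected_keyword"])
        for case in testset
    )


# Two illustrative versions of the same System.
versions = {
    "Version 1": {"style": "generic"},
    "Version 2": {"style": "on-topic"},
}

scores = {name: evaluate(config) for name, config in versions.items()}
best = max(scores, key=scores.get)  # candidate to promote to production

print(scores, best)
```

Because every version is scored on identical inputs with an identical metric, a difference in score can be attributed to the configuration change rather than to the test data.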