What is Synthetic Data Generation?
Scorecard’s AI-powered testcase generation helps you quickly create realistic test data for your evaluations. Using advanced AI models, the system generates testcases that match your testset schema, drawing from optional examples you provide to ensure quality and consistency. This feature significantly reduces the manual effort required to build comprehensive evaluation datasets.

Generate testcases through Scorecard’s UI, with enterprise plans supporting larger batch generation. You can also import datasets from external tools.
How Generation Works
Scorecard uses advanced AI models to generate testcases that automatically match your testset’s JSON schema. The system supports few-shot learning by using existing testcases as examples, ensuring generated data follows established patterns and maintains consistent quality. You can optionally provide keywords or descriptions to guide generation toward specific topics or scenarios.
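Conceptually, schema matching means every generated testcase must validate against the testset’s JSON schema. The sketch below illustrates that check with the generic jsonschema Python library and a hypothetical schema (the query and expected_answer fields are assumptions, not Scorecard’s actual format); it is not Scorecard’s internal implementation.

```python
from jsonschema import validate, ValidationError  # pip install jsonschema

# Hypothetical testset schema -- your actual testset defines its own fields.
testset_schema = {
    "type": "object",
    "properties": {
        "query": {"type": "string"},
        "expected_answer": {"type": "string"},
    },
    "required": ["query", "expected_answer"],
}

# A generated testcase that should conform to the schema above.
generated_testcase = {
    "query": "How do I enable two-factor authentication?",
    "expected_answer": "Go to Settings > Security and turn on 2FA.",
}

try:
    validate(instance=generated_testcase, schema=testset_schema)
    print("Testcase conforms to the testset schema.")
except ValidationError as err:
    print(f"Schema mismatch: {err.message}")
```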
Generating Test Data
Navigate to your testset and select existing testcases to use as examples, then click Generate from the actions toolbar. Provide optional keywords or descriptions to guide generation, specify how many testcases to create, and review the generated results before adding them to your testset. The system automatically ensures all generated testcases match your testset’s schema and field requirements. For larger-scale synthetic data generation, you can create datasets using external tools and import them via CSV, JSON, or JSONL upload through Scorecard’s bulk import feature.
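As a rough sketch of preparing a bulk-import file, the snippet below writes externally generated testcases to a JSONL file (one JSON object per line), one of the accepted upload formats. The field names query and expected_answer are hypothetical; use the fields defined by your own testset’s schema.

```python
import json

# Hypothetical testcases produced outside Scorecard; replace the field names
# with the ones defined in your testset schema.
testcases = [
    {"query": "How do I reset my password?",
     "expected_answer": "Use the 'Forgot password' link on the sign-in page."},
    {"query": "Which plans support SSO?",
     "expected_answer": "SSO is available on enterprise plans."},
]

# Write one JSON object per line (JSONL) for bulk import.
with open("testcases.jsonl", "w") as f:
    for tc in testcases:
        f.write(json.dumps(tc) + "\n")
```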
External Generation Tools
For large-scale synthetic data generation beyond Scorecard’s built-in capabilities, you can use external tools and import the results. Popular options:
- OpenAI API: Generate custom datasets programmatically (see the sketch after this list)
- Anthropic Claude: Create diverse conversational test cases
- Open source tools: Use libraries like Faker or custom scripts
- Domain-specific generators: Industry-specific test data tools
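As a minimal sketch of the programmatic route, the snippet below uses the OpenAI Python SDK to generate a handful of testcases in the style of a few-shot example and saves them as JSONL for bulk import. The model name, prompt, and the query and expected_answer fields are assumptions to adapt to your own testset schema; this is not an official Scorecard workflow.

```python
import json
from openai import OpenAI  # pip install openai; requires OPENAI_API_KEY

client = OpenAI()

# Few-shot examples drawn from an existing testset guide the style of the output.
examples = [
    {"query": "How do I reset my password?",
     "expected_answer": "Use the 'Forgot password' link on the sign-in page."},
]

prompt = (
    "Generate 5 new customer-support testcases as a JSON object with a "
    "'testcases' array. Each testcase must have 'query' and 'expected_answer' "
    f"fields, in the same style as these examples:\n{json.dumps(examples, indent=2)}"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any chat model that can return JSON
    messages=[{"role": "user", "content": prompt}],
    response_format={"type": "json_object"},
)

generated = json.loads(response.choices[0].message.content)["testcases"]

# Save as JSONL, ready for Scorecard's bulk import.
with open("generated_testcases.jsonl", "w") as f:
    for tc in generated:
        f.write(json.dumps(tc) + "\n")
```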
Best Practices
- Use existing testcases as examples whenever possible to guide generation quality and ensure consistency with your evaluation patterns.
- Start with small batches to validate output quality before generating larger sets.
- Provide specific keywords or descriptions to focus generation on relevant scenarios for your use case.
- Review generated testcases before adding them to your testset to remove any irrelevant or problematic examples and maintain data quality.
- For comprehensive test coverage, combine AI generation with manual testcase creation and external data imports.

Related Resources
- Testsets: Learn more about managing test data in Scorecard
- Metrics: Define metrics to evaluate generated test data quality
- API Reference: Complete API documentation for data generation