Design Scenarios
A strong simulation starts with strong scenario design. Your goal is to represent real user conversations and define clear expectations for each step.
What a Scenario Contains
A conversation simulation scenario includes:
- Scenario Name
- Category
- One or more Conversation Steps
- For each step:
  - User Message
  - Expected Response
  - Evaluation Criteria
The more realistic and measurable your scenario, the more useful your results.
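To make the structure above concrete, here is a minimal sketch of a scenario as plain data. The class names, field names, and the `validate` helper are assumptions made for illustration; they are not the product's actual schema or export format.

```python
from dataclasses import dataclass, field


@dataclass
class ConversationStep:
    # One turn: what the user says, what the AI should answer, how to judge it.
    user_message: str
    expected_response: str
    evaluation_criteria: list[str] = field(default_factory=list)


@dataclass
class Scenario:
    # Hypothetical container mirroring the fields listed above.
    name: str
    category: str
    steps: list[ConversationStep] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means nothing is missing."""
        problems = []
        if not self.name.strip():
            problems.append("scenario name is empty")
        if not self.category.strip():
            problems.append("category is empty")
        if not self.steps:
            problems.append("scenario has no conversation steps")
        for i, step in enumerate(self.steps, start=1):
            if not step.user_message.strip():
                problems.append(f"step {i}: user message is empty")
            if not step.expected_response.strip():
                problems.append(f"step {i}: expected response is empty")
            if not step.evaluation_criteria:
                problems.append(f"step {i}: no evaluation criteria")
        return problems


# Illustrative scenario; the name, category, and messages are made up.
refund = Scenario(
    name="Refund request within policy window",
    category="Billing",
    steps=[
        ConversationStep(
            user_message="I bought this 5 days ago and want a refund.",
            expected_response="Confirm eligibility and explain the refund steps.",
            evaluation_criteria=["Must mention refund period is 14 days"],
        )
    ],
)
print(refund.validate())  # prints []
```

Drafting scenarios in a form like this before entering them in the UI can make it easier to spot steps that lack an expected response or criteria.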
Step-by-Step: Build a Scenario
1) Open Conversation Simulation
- Go to Simulation -> Conversation.
- Click New Conversation.
2) Add conversation steps
- In Conversation Steps, define the user message for Step 1.
- Fill in the expected AI response for Step 1.
- Add more steps with Add Message to simulate a real multi-turn conversation.
3) Generate and refine criteria
- For each step, click Suggest/Rewrite Criteria.
- Review generated criteria carefully.
- Add manual criteria if needed.
- Rewrite unclear criteria until they are specific.
4) Complete conversation info
- Click Save in create mode.
- The system can suggest a scenario name automatically.
- Confirm or edit the name.
- Select the category.
- Confirm to create the scenario.
Criteria Writing Rules
Good criteria are testable: a reviewer can check a response against each one and answer yes or no. Avoid vague wording.
| Weak criterion | Strong criterion |
|---|---|
| "Answer should be good" | "Must mention refund period is 14 days" |
| "Should be polite" | "Must include an apology when service failure is mentioned" |
| "Explain clearly" | "Must provide 3 required steps in order" |
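As a rough illustration of the difference between the two columns, the sketch below flags criteria that are likely too vague. It is a simple heuristic, not part of the product: the `VAGUE_TERMS` list and the rule that a criterion should start with "Must" are assumptions chosen to match the examples in the table.

```python
import re

# Assumed list of words that usually signal an unmeasurable criterion.
VAGUE_TERMS = {"good", "polite", "clearly", "nice", "appropriate", "helpful"}


def is_testable(criterion: str) -> bool:
    """Heuristic: starts with 'Must' and contains no vague term."""
    words = set(re.findall(r"[a-z]+", criterion.lower()))
    starts_with_must = criterion.strip().lower().startswith("must ")
    return starts_with_must and not (words & VAGUE_TERMS)


WEAK = [
    "Answer should be good",
    "Should be polite",
    "Explain clearly",
]
STRONG = [
    "Must mention refund period is 14 days",
    "Must include an apology when service failure is mentioned",
    "Must provide 3 required steps in order",
]

for criterion in WEAK + STRONG:
    label = "ok" if is_testable(criterion) else "rewrite"
    print(f"{label:8} {criterion}")
```

A check like this catches only surface-level vagueness; a criterion can pass it and still be hard to verify, so manual review remains necessary.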
For each step, cover three kinds of criteria:
- Required facts the response must state
- Policy boundaries the response must stay within
- Prohibited behavior the response must avoid
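The pattern above can be sketched as a small grouping of criteria for one step. The `StepCriteria` class and its methods are hypothetical, and apart from the 14-day refund fact taken from the table, the example criteria are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StepCriteria:
    # Hypothetical grouping of one step's criteria by kind.
    required_facts: list[str] = field(default_factory=list)
    policy_boundaries: list[str] = field(default_factory=list)
    prohibited_behavior: list[str] = field(default_factory=list)

    def as_list(self) -> list[str]:
        """Flatten all criteria into one list, e.g. for entry in the UI."""
        return [
            *self.required_facts,
            *self.policy_boundaries,
            *self.prohibited_behavior,
        ]

    def missing_kinds(self) -> list[str]:
        """Name the kinds of criteria that have no entries yet."""
        kinds = {
            "required facts": self.required_facts,
            "policy boundaries": self.policy_boundaries,
            "prohibited behavior": self.prohibited_behavior,
        }
        return [name for name, items in kinds.items() if not items]


# Illustrative criteria for a refund step.
criteria = StepCriteria(
    required_facts=["Must mention refund period is 14 days"],
    policy_boundaries=["Must not promise a refund outside the 14-day period"],
    prohibited_behavior=["Must not ask for the customer's full card number"],
)
print(criteria.missing_kinds())  # prints []
```

Not every step needs all three kinds, but checking for gaps helps catch steps that only test facts and ignore what the response must not do.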
Scenario Design Best Practices
- Start from real conversation logs and support FAQs.
- Keep one primary objective per scenario.
- Separate happy-path and edge-case scenarios.
- Include at least one escalation-trigger scenario if your workflow needs handoff.
- Add model-sensitive scenarios (long reasoning, policy nuance, multilingual requests) so you can compare results when you switch models.
Ready for Execution
Once scenarios are ready, run and monitor them: