Analyze and Improve
Execution is only half the process. The real value of simulation comes from analysis and targeted improvements.
Understand Criteria Evaluation
In the simulation result detail view, each conversation step is evaluated against your criteria.
You will see outcomes such as:
- Matched: the response satisfied the criterion
- Contradicted: the response conflicts with the intent of the criterion
- Unmentioned: the response did not address the points the criterion requires
Use these outcomes to pinpoint the steps where response quality breaks down.
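To make the three outcomes concrete, here is a minimal Python sketch that models them as an enumeration and tallies them for one result. The `Outcome` enum and the `tally_outcomes` helper are hypothetical names chosen for illustration, not part of the product; the sketch assumes you have copied the outcome labels out of the result detail as plain strings.

```python
from collections import Counter
from enum import Enum


class Outcome(Enum):
    """The three criterion outcomes described above."""
    MATCHED = "Matched"
    CONTRADICTED = "Contradicted"
    UNMENTIONED = "Unmentioned"


def tally_outcomes(labels):
    """Count how often each outcome appears in a list of outcome labels."""
    # Outcome(label) raises ValueError for any label outside the three above.
    counts = Counter(Outcome(label) for label in labels)
    # Report every outcome, including those that never occurred.
    return {outcome: counts.get(outcome, 0) for outcome in Outcome}


if __name__ == "__main__":
    # Illustrative labels only, not real simulation data.
    labels = ["Matched", "Unmentioned", "Matched", "Contradicted"]
    for outcome, count in tally_outcomes(labels).items():
        print(f"{outcome.value}: {count}")
```

A tally like this gives a quick first impression of a result before you read individual steps.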
Step-by-Step: Analyze a Result
1) Open the result detail
- Go to Simulation -> Results.
- Open a specific execution.
- Expand scenario and step details.
2) Inspect criteria outcomes
- Review the match ratio and step-level outcomes.
- Read excerpts and reason notes where available.
- Identify repeated failure patterns.
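If you record the outcomes outside the product, the inspection in this step can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Evaluation` record, its field names, and the definition of match ratio as the share of evaluations marked Matched. The product may compute its ratio differently.

```python
from collections import Counter, namedtuple

# Hypothetical record: one criterion evaluated on one conversation step.
Evaluation = namedtuple("Evaluation", ["scenario", "step", "criterion", "outcome"])


def match_ratio(evaluations):
    """Fraction of evaluations whose outcome is 'Matched' (0.0 if there are none)."""
    if not evaluations:
        return 0.0
    matched = sum(1 for e in evaluations if e.outcome == "Matched")
    return matched / len(evaluations)


def repeated_failures(evaluations, min_count=2):
    """Return (criterion, outcome) pairs that failed at least min_count times."""
    failures = Counter(
        (e.criterion, e.outcome) for e in evaluations if e.outcome != "Matched"
    )
    # most_common() orders the pairs from most to least frequent.
    return [(pair, n) for pair, n in failures.most_common() if n >= min_count]
```

A criterion that shows up in `repeated_failures` across several scenarios usually points to a configuration gap rather than a one-off bad response.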
3) Prioritize fixes
Prioritize by user risk, starting with the most severe category:
- Compliance or policy errors
- Incorrect factual guidance
- Missing key decision information
- Tone and style inconsistencies
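The ordering above can be applied mechanically once each failure has a category. The sketch below assumes the four categories are listed from highest to lowest user risk, and it uses hypothetical category keys and a hypothetical `category` field on each failure record.

```python
# Assumption: categories are ordered from highest to lowest user risk.
# The keys are hypothetical labels for the four categories listed above.
RISK_ORDER = [
    "compliance_or_policy",
    "incorrect_facts",
    "missing_decision_info",
    "tone_and_style",
]


def prioritize(failures):
    """Sort failure records so the highest-risk categories come first."""
    rank = {category: i for i, category in enumerate(RISK_ORDER)}
    # Unknown categories sort last; sorted() is stable, so ties keep input order.
    return sorted(failures, key=lambda f: rank.get(f["category"], len(RISK_ORDER)))
```

Working through the sorted list from the top keeps attention on the failures that carry the most risk for users before polishing tone or style.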
4) Apply targeted improvements
Based on failure type, update:
- Persona behavior instructions
- Dialog settings
- Resource coverage and quality
- Tool or plugin instructions
- Scenario criteria quality itself
5) Re-run and verify
- Re-run affected scenarios.
- Confirm criteria outcomes improve.
- Track progress over multiple runs.
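One way to verify a re-run is to compare outcomes criterion by criterion against the previous run. This is a minimal sketch under stated assumptions: each run is a plain dictionary keyed by a hypothetical `(scenario, step, criterion)` tuple, and "improved" means a criterion moved to Matched from any other outcome.

```python
def compare_runs(before, after):
    """Classify criteria present in both runs as improved, regressed, or unchanged.

    Each argument maps a (scenario, step, criterion) key to an outcome label.
    """
    changes = {"improved": [], "regressed": [], "unchanged": []}
    # Only keys present in both runs can be compared.
    for key in sorted(before.keys() & after.keys()):
        was_matched = before[key] == "Matched"
        is_matched = after[key] == "Matched"
        if is_matched and not was_matched:
            changes["improved"].append(key)
        elif was_matched and not is_matched:
            changes["regressed"].append(key)
        else:
            # Includes moves between Contradicted and Unmentioned.
            changes["unchanged"].append(key)
    return changes
```

Keeping the `regressed` list alongside the `improved` list matters: a fix for one criterion can break another, and only a side-by-side comparison shows it.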
Model Switch Readiness Checklist
Use this checklist before publishing a new model:
| Check | Pass condition |
|---|---|
| Baseline run exists | The same scenario set has already been executed on the old model |
| Critical scenarios stable | No critical regression in key categories |
| Criteria quality preserved | Match quality remains acceptable across high-risk steps |
| Failure review completed | Every Contradicted or Unmentioned outcome on critical steps is reviewed |
| Action plan documented | Every remaining gap has an owner and a timeline |
If any of these conditions is not met, keep the new model in draft and continue iterating.
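The checklist works as an all-or-nothing gate, which can be sketched as follows. The `readiness` function and the status dictionary are hypothetical; the check names are taken from the table, and whether each check passes is still a human judgment recorded by the reviewer.

```python
# Check names copied from the checklist table above.
CHECKS = [
    "Baseline run exists",
    "Critical scenarios stable",
    "Criteria quality preserved",
    "Failure review completed",
    "Action plan documented",
]


def readiness(status):
    """Return (ready, unmet) for a dict mapping check name to True or False."""
    # A check that is missing from the dict counts as not met.
    unmet = [check for check in CHECKS if not status.get(check, False)]
    return (not unmet, unmet)


if __name__ == "__main__":
    # Illustrative review state: one check has not passed yet.
    status = {check: True for check in CHECKS}
    status["Failure review completed"] = False
    ready, unmet = readiness(status)
    print("Publish" if ready else f"Keep in draft; unmet: {unmet}")
```

Treating a missing entry as unmet is a deliberate choice in this sketch: a check nobody recorded should block publishing rather than pass silently.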
Continuous Improvement Loop
Treat simulation as an ongoing loop:
- Design scenario.
- Run simulation.
- Analyze outcomes.
- Improve configuration.
- Re-run for verification.
This loop keeps conversation quality strong even as your model, resources, and automation change and grow over time.