Analyze and Improve

Execution is only half the process. The real value of simulation comes from analysis and targeted improvements.


Understand Criteria Evaluation

In the simulation result detail view, each conversation step is evaluated against the criteria you defined for the scenario.

You will see outcomes such as:

  • Matched: the response satisfied the criterion
  • Contradicted: the response conflicts with the intent of the criterion
  • Unmentioned: the response did not address required points

Use these outcomes to pinpoint the step and criterion where quality breaks down.


Step-by-Step: Analyze a Result

1) Open a result detail

  1. Go to Simulation -> Results.
  2. Open a specific execution.
  3. Expand scenario and step details.

2) Inspect criteria outcomes

  1. Review match ratio and step-level outcomes.
  2. Read excerpts and reason notes where available.
  3. Identify repeated failure patterns.
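
If you copy the step-level outcomes out of the result detail, the review above can be sketched in a few lines of Python. This is an illustration only: the record shape, the lowercase outcome strings, and the function names are assumptions, and the match ratio here is defined as matched evaluations divided by all evaluations, which may differ from how the product computes it.

```python
from collections import Counter

# Hypothetical record shape: one entry per (step, criterion) evaluation.
evaluations = [
    {"step": 1, "criterion": "states refund window", "outcome": "matched"},
    {"step": 2, "criterion": "states refund window", "outcome": "unmentioned"},
    {"step": 2, "criterion": "no legal advice", "outcome": "contradicted"},
    {"step": 3, "criterion": "states refund window", "outcome": "unmentioned"},
]

def match_ratio(records):
    """Fraction of evaluations whose outcome is 'matched' (0.0 if there are none)."""
    if not records:
        return 0.0
    matched = sum(1 for r in records if r["outcome"] == "matched")
    return matched / len(records)

def repeated_failures(records, min_count=2):
    """Criteria that were contradicted or unmentioned at least min_count times."""
    failures = Counter(
        r["criterion"] for r in records if r["outcome"] != "matched"
    )
    return {c: n for c, n in failures.items() if n >= min_count}

print(match_ratio(evaluations))        # 0.25
print(repeated_failures(evaluations))  # {'states refund window': 2}
```

A criterion that fails on several steps usually points to one root cause, so it is a better starting point than a one-off failure.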

3) Prioritize fixes

Prioritize by user risk:

  1. Compliance or policy errors
  2. Incorrect factual guidance
  3. Missing key decision information
  4. Tone and style inconsistencies
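
The risk order above can be applied mechanically once each failure has a category. The sketch below assumes you label failures yourself during review; the category keys and the record fields are hypothetical and are not part of the product.

```python
# Risk ranks follow the list above (1 = highest user risk).
# The category keys are hypothetical labels assigned during manual review.
RISK_ORDER = {
    "compliance": 1,    # compliance or policy errors
    "factual": 2,       # incorrect factual guidance
    "missing_info": 3,  # missing key decision information
    "tone": 4,          # tone and style inconsistencies
}

failures = [
    {"step": 4, "category": "tone", "note": "too informal"},
    {"step": 2, "category": "compliance", "note": "promised a guaranteed return"},
    {"step": 3, "category": "missing_info", "note": "omitted the fee"},
    {"step": 1, "category": "factual", "note": "wrong opening hours"},
]

def prioritize(items):
    """Sort failures by risk rank first, then by step number."""
    return sorted(items, key=lambda f: (RISK_ORDER[f["category"]], f["step"]))

for f in prioritize(failures):
    print(f["category"], f["step"], f["note"])
```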

4) Apply targeted improvements

Depending on the failure type, update one or more of the following:

  • Persona behavior instructions
  • Dialog settings
  • Resource coverage and quality
  • Tool or plugin instructions
  • Scenario criteria quality itself

5) Re-run and verify

  1. Re-run affected scenarios.
  2. Confirm criteria outcomes improve.
  3. Track progress over multiple runs.
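
To confirm that a re-run actually improved, it helps to compare the same step and criterion pairs across two runs. The following is a minimal sketch, assuming you record each run as a mapping from (step, criterion) to outcome; the data shape and the function name are illustrative assumptions.

```python
def compare_runs(before, after):
    """Compare two runs keyed by (step, criterion).

    Returns two sorted lists: keys that went from failing to matched,
    and keys that went from matched to failing. Keys present in only
    one run are ignored.
    """
    improved, regressed = [], []
    for key in sorted(before.keys() & after.keys()):
        was_ok = before[key] == "matched"
        is_ok = after[key] == "matched"
        if not was_ok and is_ok:
            improved.append(key)
        elif was_ok and not is_ok:
            regressed.append(key)
    return improved, regressed

# Hypothetical outcomes before and after a configuration change.
run_1 = {
    (1, "greets user"): "matched",
    (2, "states fee"): "unmentioned",
    (3, "no legal advice"): "contradicted",
}
run_2 = {
    (1, "greets user"): "matched",
    (2, "states fee"): "matched",
    (3, "no legal advice"): "contradicted",
}

improved, regressed = compare_runs(run_1, run_2)
print(improved)   # [(2, 'states fee')]
print(regressed)  # []
```

An empty regressed list matters as much as a non-empty improved list: a fix for one criterion should not break another.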

Model Switch Readiness Checklist

Use this checklist before publishing a new model:

Check                       | Pass condition
Baseline run exists         | Same scenario set already executed on old model
Critical scenarios stable   | No critical regression in key categories
Criteria quality preserved  | Match quality remains acceptable across high-risk steps
Failure review completed    | Every contradiction/unmentioned on critical steps is reviewed
Action plan documented      | Any remaining gaps have owner and timeline

If any of these conditions is not met, keep the new model in draft and continue iterating.
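
The checklist can also be written down as a simple gate, for example in a release script. This is a sketch under stated assumptions: the field names mirror the five checks, and the values are ones you fill in after your own review. Nothing here is read from the product.

```python
from dataclasses import dataclass

@dataclass
class Readiness:
    # One flag per checklist row; set each after manual review.
    baseline_run_exists: bool
    critical_scenarios_stable: bool
    criteria_quality_preserved: bool
    failure_review_completed: bool
    action_plan_documented: bool

def unmet_checks(r):
    """Names of the checks that are not yet satisfied, in checklist order."""
    return [name for name, ok in vars(r).items() if not ok]

def decision(r):
    """'publish' only when every check passes; otherwise 'keep in draft'."""
    return "publish" if not unmet_checks(r) else "keep in draft"

status = Readiness(
    baseline_run_exists=True,
    critical_scenarios_stable=True,
    criteria_quality_preserved=True,
    failure_review_completed=False,
    action_plan_documented=True,
)

print(decision(status))      # keep in draft
print(unmet_checks(status))  # ['failure_review_completed']
```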


Continuous Improvement Loop

Treat simulation as an ongoing loop:

  1. Design scenario.
  2. Run simulation.
  3. Analyze outcomes.
  4. Improve configuration.
  5. Re-run for verification.

This loop keeps conversation quality strong even as your model, resources, and automation grow over time.