Running a simulation once gives you a directional read. But before you present findings to stakeholders or base a business decision on them, you want to know: are these results stable? Boses provides two complementary tools for this — cross-simulation convergence and per-simulation reliability checks.

Cross-simulation convergence

Convergence measures how consistently different simulations on the same persona group agree with each other. If you run three concept tests against the same group (perhaps testing different briefings, or re-running the same one), convergence tells you whether the group is producing stable, reproducible opinions.

How it works

Convergence compares pairs of completed simulations. For each pair, it computes three metrics:
| Metric | Weight | What it measures |
| --- | --- | --- |
| Direction match | 50% | Whether the dominant sentiment (Positive/Neutral/Negative) is the same across both runs. |
| Distribution similarity | 30% | How closely the full sentiment distributions match, using Jensen-Shannon divergence. |
| Theme overlap | 20% | How many top themes appear in both simulations, using word-level similarity scoring. |
These are combined into a pairwise score, and all pairwise scores are averaged into an overall convergence score between 0 and 1.
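The exact formulas are internal to Boses, but the weighting can be sketched. The sketch below assumes a Jensen-Shannon divergence computed with log base 2 (so the divergence is bounded by 1 and `1 - JSD` works as a similarity) and a Jaccard-style theme overlap; the function names are illustrative, not part of the API:

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions (log base 2,
    so the result falls in [0, 1])."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def pairwise_score(dominant_a, dominant_b, dist_a, dist_b, themes_a, themes_b):
    """Weighted pairwise convergence score for two simulations.

    dist_a / dist_b are sentiment shares in a fixed order,
    e.g. [positive, neutral, negative].
    """
    direction_match = 1.0 if dominant_a == dominant_b else 0.0
    distribution_similarity = 1.0 - js_divergence(dist_a, dist_b)
    # Jaccard overlap of top-theme sets (the real scoring is word-level).
    theme_overlap = len(set(themes_a) & set(themes_b)) / len(set(themes_a) | set(themes_b))
    # Weights from the table above: 50% / 30% / 20%.
    return 0.5 * direction_match + 0.3 * distribution_similarity + 0.2 * theme_overlap
```

Because direction match carries half the weight, two runs that flip between Positive and Neutral dominance lose more score than two runs that agree on direction but differ in theme wording.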

Interpreting the convergence score

| Score | Interpretation |
| --- | --- |
| ≥ 0.75 | Strong — Results are highly consistent. You can present these findings with confidence. |
| ≥ 0.50 | Moderate — Results show a consistent direction with some variation. Note areas of divergence when reporting. |
| < 0.50 | Weak — Results are inconsistent. Run additional simulations or review your persona group definition and briefing before acting on findings. |
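The `interpretation` field in the API response follows the same thresholds. As a small helper (the function name is illustrative):

```python
def interpret(score):
    """Map an overall convergence score to its interpretation band,
    using the thresholds from the table above."""
    if score >= 0.75:
        return "strong"
    if score >= 0.50:
        return "moderate"
    return "weak"
```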

Fetching convergence scores

Pass a persona_group_id as a query parameter. Optionally filter by briefing_id to compare only simulations that used the same stimulus.
curl "https://api.temujintechnologies.com/api/v1/projects/<PROJECT_ID>/simulations/convergence?persona_group_id=<GROUP_ID>" \
  -H "Authorization: Bearer <access_token>"
To compare simulations that all used the same briefing:
curl "https://api.temujintechnologies.com/api/v1/projects/<PROJECT_ID>/simulations/convergence?persona_group_id=<GROUP_ID>&briefing_id=<BRIEFING_ID>" \
  -H "Authorization: Bearer <access_token>"
Example response:
{
  "persona_group_id": "grp_01hw2k5zxe3fqv7brc4d",
  "simulation_count": 3,
  "pairwise_scores": [
    {
      "simulation_a": "sim_01hw4n7zxe2fqv3brc5e",
      "simulation_b": "sim_01hw5p9zxe6grv8ctd7f",
      "score": 0.81,
      "direction_match": true,
      "distribution_similarity": 0.78,
      "theme_overlap": 0.74
    },
    {
      "simulation_a": "sim_01hw4n7zxe2fqv3brc5e",
      "simulation_b": "sim_01hw6q1zxe8hsw9due8g",
      "score": 0.76,
      "direction_match": true,
      "distribution_similarity": 0.71,
      "theme_overlap": 0.69
    },
    {
      "simulation_a": "sim_01hw5p9zxe6grv8ctd7f",
      "simulation_b": "sim_01hw6q1zxe8hsw9due8g",
      "score": 0.79,
      "direction_match": true,
      "distribution_similarity": 0.76,
      "theme_overlap": 0.72
    }
  ],
  "overall_convergence_score": 0.79,
  "interpretation": "strong"
}
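When acting on a response like this, it can help to surface the pairs that drag the overall score down. A minimal, hypothetical helper over the parsed JSON body:

```python
def weak_pairs(convergence, threshold=0.5):
    """Return (simulation_a, simulation_b, score) tuples for pairwise
    scores below the threshold, so divergent runs can be inspected."""
    return [
        (p["simulation_a"], p["simulation_b"], p["score"])
        for p in convergence["pairwise_scores"]
        if p["score"] < threshold
    ]
```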

Reliability checks

A reliability check goes one level deeper: it re-runs a specific simulation multiple times and measures how much the results vary across those repeat runs. This tells you whether a single simulation’s results are reproducible, independent of any other simulation.

How it works

When you kick off a reliability check, Boses runs N repeat executions of the same simulation against the same persona group. Once all runs complete, it computes a confidence score from three metrics:
| Metric | Weight | What it measures |
| --- | --- | --- |
| Sentiment agreement rate | 40% | Fraction of repeat runs that produced the same dominant sentiment. |
| Distribution variance score | 35% | How stable the full sentiment distribution is across runs (lower variance = higher score). |
| Theme overlap coefficient | 25% | Fraction of theme tokens that appear consistently across at least 60% of runs. |
The composite confidence_score is a weighted average between 0 and 1.
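As an illustration of how such a composite could be computed, here is a sketch; the token-counting details are assumptions, not the documented algorithm, and the function names are illustrative:

```python
from collections import Counter

def theme_overlap_coefficient(runs, min_fraction=0.6):
    """Fraction of distinct theme tokens that appear in at least
    `min_fraction` of the repeat runs. Each run is a list of theme tokens."""
    counts = Counter(token for run in runs for token in set(run))
    if not counts:
        return 0.0
    needed = min_fraction * len(runs)
    return sum(1 for c in counts.values() if c >= needed) / len(counts)

def confidence_score(sentiment_agreement, distribution_variance, theme_overlap):
    # Weights from the table above: 40% / 35% / 25%.
    return 0.40 * sentiment_agreement + 0.35 * distribution_variance + 0.25 * theme_overlap
```

For example, if "price" appears in all three repeat runs but "design" appears in only one, "design" falls below the 60% threshold and is excluded from the overlap.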

Starting a reliability check

curl -X POST https://api.temujintechnologies.com/api/v1/projects/<PROJECT_ID>/simulations/<SIM_ID>/reliability-check \
  -H "Authorization: Bearer <access_token>"
The check runs as a background task. Boses queues the repeat runs automatically.
Run a reliability check before presenting results to stakeholders. A confidence_score ≥ 0.75 is a strong indicator that the findings will hold up under scrutiny.

Retrieving the reliability check result

Poll GET /simulations/{id}/reliability-check until the check is complete:
curl https://api.temujintechnologies.com/api/v1/projects/<PROJECT_ID>/simulations/<SIM_ID>/reliability-check \
  -H "Authorization: Bearer <access_token>"
Example response:
{
  "simulation_id": "sim_01hw4n7zxe2fqv3brc5e",
  "status": "complete",
  "repeat_runs": 3,
  "confidence_score": 0.82,
  "metrics": {
    "sentiment_agreement_rate": 0.89,
    "distribution_variance_score": 0.77,
    "theme_overlap_coefficient": 0.73
  },
  "interpretation": "strong"
}
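The polling step can be wrapped in a small loop. In this sketch, `fetch_check` stands in for the authenticated GET request above (however you issue it), and the assumption that an in-progress check reports a status other than `complete` follows from the polling instruction rather than a documented status list:

```python
import time

def wait_for_reliability_check(fetch_check, interval=5.0, timeout=600.0):
    """Poll until the reliability check reports status 'complete'.

    `fetch_check` is any callable returning the parsed JSON body of
    GET .../reliability-check; `interval` and `timeout` are in seconds.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = fetch_check()
        if result.get("status") == "complete":
            return result
        time.sleep(interval)
    raise TimeoutError("reliability check did not complete in time")
```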

When to use each tool

| Scenario | Use |
| --- | --- |
| You’ve run the same concept test multiple times and want to know if the group agrees with itself. | Convergence (GET /simulations/convergence) |
| You want to validate a single simulation result before a stakeholder presentation. | Reliability check (POST /simulations/{id}/reliability-check) |
| You’re comparing two different concepts against the same segment. | Convergence across the two concept test simulations |
| You’re building confidence that a particular persona group produces stable results for a market. | Convergence across multiple simulations on that group |
Low convergence or confidence scores don’t necessarily mean something is wrong with the platform. They can indicate genuine ambivalence within the segment, a poorly scoped persona group, or a stimulus that is genuinely divisive. Treat weak scores as a signal to investigate further, not as noise to dismiss.