Running a simulation once gives you a directional read. But before you present findings to stakeholders or base a business decision on them, you want to know: are these results stable? Boses provides two complementary tools for this — cross-simulation convergence and per-simulation reliability checks.
Cross-simulation convergence
Convergence measures how consistently different simulations on the same persona group agree with each other. If you run three concept tests against the same group (perhaps testing different briefings, or re-running the same one), convergence tells you whether the group is producing stable, reproducible opinions.
How it works
Convergence compares pairs of completed simulations. For each pair, it computes three metrics:
| Metric | Weight | What it measures |
|---|---|---|
| Direction match | 50% | Whether the dominant sentiment (Positive/Neutral/Negative) is the same across both runs. |
| Distribution similarity | 30% | How closely the full sentiment distributions match, using Jensen-Shannon divergence. |
| Theme overlap | 20% | How many top themes appear in both simulations, using word-level similarity scoring. |
These are combined into a pairwise score, and all pairwise scores are averaged into an overall convergence score between 0 and 1.
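To make the weighting concrete, here is a minimal sketch of how the three metrics could combine into a pairwise score. The function name and the simple weighted-average form are illustrative assumptions; the platform's exact combination and rounding may differ.

```python
def pairwise_convergence(direction_match: bool,
                         distribution_similarity: float,
                         theme_overlap: float) -> float:
    """Blend the three pairwise metrics using the weights above:
    50% direction match, 30% distribution similarity, 20% theme overlap."""
    direction = 1.0 if direction_match else 0.0
    return (0.50 * direction
            + 0.30 * distribution_similarity
            + 0.20 * theme_overlap)
```

Note how heavily direction dominates: two runs that agree on dominant sentiment but differ somewhat in distribution and themes still land well above 0.5.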
Interpreting the convergence score
| Score | Interpretation |
|---|---|
| ≥ 0.75 | Strong — Results are highly consistent. You can present these findings with confidence. |
| ≥ 0.50 | Moderate — Results show a consistent direction with some variation. Note areas of divergence when reporting. |
| < 0.50 | Weak — Results are inconsistent. Run additional simulations or review your persona group definition and briefing before acting on findings. |
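The thresholds translate directly into a lookup. A sketch, where the band names mirror the `interpretation` field returned by the API:

```python
def interpret_convergence(score: float) -> str:
    """Map an overall convergence score to its interpretation band."""
    if score >= 0.75:
        return "strong"
    if score >= 0.50:
        return "moderate"
    return "weak"
```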
Fetching convergence scores
Pass a persona_group_id as a query parameter. Optionally filter by briefing_id to compare only simulations that used the same stimulus.
```bash
curl "https://api.temujintechnologies.com/api/v1/projects/<PROJECT_ID>/simulations/convergence?persona_group_id=<GROUP_ID>" \
  -H "Authorization: Bearer <access_token>"
```
To compare simulations that all used the same briefing:
```bash
curl "https://api.temujintechnologies.com/api/v1/projects/<PROJECT_ID>/simulations/convergence?persona_group_id=<GROUP_ID>&briefing_id=<BRIEFING_ID>" \
  -H "Authorization: Bearer <access_token>"
```
Example response:
```json
{
  "persona_group_id": "grp_01hw2k5zxe3fqv7brc4d",
  "simulation_count": 3,
  "pairwise_scores": [
    {
      "simulation_a": "sim_01hw4n7zxe2fqv3brc5e",
      "simulation_b": "sim_01hw5p9zxe6grv8ctd7f",
      "score": 0.81,
      "direction_match": true,
      "distribution_similarity": 0.78,
      "theme_overlap": 0.74
    },
    {
      "simulation_a": "sim_01hw4n7zxe2fqv3brc5e",
      "simulation_b": "sim_01hw6q1zxe8hsw9due8g",
      "score": 0.76,
      "direction_match": true,
      "distribution_similarity": 0.71,
      "theme_overlap": 0.69
    },
    {
      "simulation_a": "sim_01hw5p9zxe6grv8ctd7f",
      "simulation_b": "sim_01hw6q1zxe8hsw9due8g",
      "score": 0.79,
      "direction_match": true,
      "distribution_similarity": 0.76,
      "theme_overlap": 0.72
    }
  ],
  "overall_convergence_score": 0.79,
  "interpretation": "strong"
}
```
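As noted above, the overall score is the mean of all pairwise scores, which you can verify against this response in a couple of lines of Python (the dict below is abbreviated to the fields used):

```python
response = {
    "pairwise_scores": [
        {"score": 0.81},
        {"score": 0.76},
        {"score": 0.79},
    ],
}

scores = [pair["score"] for pair in response["pairwise_scores"]]
overall = round(sum(scores) / len(scores), 2)  # 0.79, matching overall_convergence_score
```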
Reliability checks
A reliability check goes one level deeper: it re-runs a specific simulation multiple times and measures how much the results vary across those repeat runs. This tells you whether a single simulation’s results are reproducible, independent of any other simulation.
How it works
When you kick off a reliability check, Boses runs N repeat executions of the same simulation against the same persona group. Once all runs complete, it computes a confidence score from three metrics:
| Metric | Weight | What it measures |
|---|---|---|
| Sentiment agreement rate | 40% | Fraction of repeat runs that produced the same dominant sentiment. |
| Distribution variance score | 35% | How stable the full sentiment distribution is across runs (lower variance = higher score). |
| Theme overlap coefficient | 25% | Fraction of theme tokens that appear consistently across at least 60% of runs. |
The composite confidence_score is a weighted average between 0 and 1.
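A minimal sketch of that weighted average, assuming the straightforward linear combination implied by the table (any rounding the platform applies before returning the score is not specified here):

```python
def confidence_score(sentiment_agreement_rate: float,
                     distribution_variance_score: float,
                     theme_overlap_coefficient: float) -> float:
    """Composite reliability confidence, weighted 40% / 35% / 25%
    per the metric table above."""
    return (0.40 * sentiment_agreement_rate
            + 0.35 * distribution_variance_score
            + 0.25 * theme_overlap_coefficient)
```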
Starting a reliability check
```bash
curl -X POST https://api.temujintechnologies.com/api/v1/projects/<PROJECT_ID>/simulations/<SIM_ID>/reliability-check \
  -H "Authorization: Bearer <access_token>"
```
The check runs as a background task. Boses queues the repeat runs automatically.
Run a reliability check before presenting results to stakeholders. A confidence_score ≥ 0.75 is a strong indicator that the findings will hold up under scrutiny.
Retrieving the reliability check result
Poll GET /simulations/{id}/reliability-check until the check is complete:
```bash
curl https://api.temujintechnologies.com/api/v1/projects/<PROJECT_ID>/simulations/<SIM_ID>/reliability-check \
  -H "Authorization: Bearer <access_token>"
```
Example response:
```json
{
  "simulation_id": "sim_01hw4n7zxe2fqv3brc5e",
  "status": "complete",
  "repeat_runs": 3,
  "confidence_score": 0.82,
  "metrics": {
    "sentiment_agreement_rate": 0.89,
    "distribution_variance_score": 0.77,
    "theme_overlap_coefficient": 0.73
  },
  "interpretation": "strong"
}
```
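The polling loop can be sketched in Python using only the standard library. The URL shape and the "complete" status value mirror the examples above; the interval and timeout values are arbitrary choices, not platform requirements.

```python
import json
import time
import urllib.request

def is_complete(payload: dict) -> bool:
    """A reliability check is done when its status field reads "complete"."""
    return payload.get("status") == "complete"

def poll_reliability_check(url: str, token: str,
                           interval: float = 10.0,
                           timeout: float = 600.0) -> dict:
    """Poll the reliability-check endpoint until it completes or we time out."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        request = urllib.request.Request(
            url, headers={"Authorization": f"Bearer {token}"})
        with urllib.request.urlopen(request) as response:
            payload = json.load(response)
        if is_complete(payload):
            return payload
        time.sleep(interval)
    raise TimeoutError("reliability check did not complete before timeout")
```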
Choosing between convergence and reliability checks
| Scenario | Use |
|---|---|
| You’ve run the same concept test multiple times and want to know if the group agrees with itself. | Convergence (GET /simulations/convergence) |
| You want to validate a single simulation result before a stakeholder presentation. | Reliability check (POST /simulations/{id}/reliability-check) |
| You’re comparing two different concepts against the same segment. | Convergence across the two concept test simulations |
| You’re building confidence that a particular persona group produces stable results for a market. | Convergence across multiple simulations on that group |
Low convergence or confidence scores don’t necessarily mean something is wrong with the platform. They can indicate genuine ambivalence within the segment, a poorly scoped persona group, or a stimulus that is genuinely divisive. Treat weak scores as a signal to investigate further, not as noise to dismiss.