Generic AI personas repeat the same cultural clichés. In Boses, the ethnography pipeline prevents that. Before generating any persona, Boses crawls public sources in your target market, extracts structured consumer signals using an LLM, and stores a cultural context snapshot. That snapshot is injected into every persona’s generation prompt.
The result is personas that carry real attitudes — specific brands they trust or distrust, real digital habits, current concerns — rather than assumed ones.
Why it matters
Imagine you are testing a new buy-now-pay-later product in the Philippines. A persona grounded in current signals will know that GCash dominates mobile payments, that certain banks are distrusted by younger consumers, and that there is active discourse on Reddit about hidden fees in credit products. A generic persona will not. That difference determines whether your simulation surfaces insights you can act on.
Data sources
The pipeline pulls from three public sources per market:
| Source | Markets | What it captures |
|---|---|---|
| Reddit | r/Philippines, r/indonesia, r/VietNam | Current consumer discourse — trending concerns, product sentiment, cultural moments as they emerge in community discussion |
| Shopee reviews | PH, ID, VN | Real purchase opinions from active shoppers: what they love, what disappoints them, price sensitivity, and brand comparisons at the moment of transaction |
| Google Play Store | GCash (PH), Gojek (ID), MoMo (VN) | Digital service trust, UX frustrations, and attitudes toward fintech and super-app ecosystems — the dominant digital infrastructure of each market |
Supported markets are PH (Philippines), ID (Indonesia), and VN (Vietnam). Snapshots are shared across all projects within the same market. If your company has five active projects targeting the Philippines, they all draw from the same PH snapshot.
Raw content from all sources passes through an LLM that extracts structured signals. Each snapshot stores the following signal types:
| Signal | Description |
|---|---|
| trusted_brands | Brands that consumers speak positively about in current discourse |
| distrusted_brands | Brands with active negative sentiment: distrust, disappointment, or avoidance |
| digital_habits | How people use apps, payment methods, social platforms, and delivery services day to day |
| price_sensitivity_signals | Expressions of value-consciousness, bargain-hunting behaviour, or aspirational spending patterns |
| cultural_moments | Current events, trends, or shared experiences shaping consumer mood and priorities right now |
These signals become part of every persona’s generation context. A persona generated from a PH snapshot will reflect the specific brands and digital behaviours that Filipinos are talking about today — not a generalised summary of Filipino consumer behaviour.
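As a rough illustration of how the five signal types could be held together and turned into prompt context, here is a minimal Python sketch. The `CulturalSnapshot` class and the `render_context` function are hypothetical names chosen for this example, and the output format is an assumption; only the signal field names come from the table above.

```python
from dataclasses import dataclass, field


@dataclass
class CulturalSnapshot:
    """Hypothetical container for one market's extracted signals."""
    market: str  # "PH", "ID", or "VN"
    trusted_brands: list[str] = field(default_factory=list)
    distrusted_brands: list[str] = field(default_factory=list)
    digital_habits: list[str] = field(default_factory=list)
    price_sensitivity_signals: list[str] = field(default_factory=list)
    cultural_moments: list[str] = field(default_factory=list)


# Signal names as listed in the table; the ordering is arbitrary.
SIGNAL_FIELDS = (
    "trusted_brands",
    "distrusted_brands",
    "digital_habits",
    "price_sensitivity_signals",
    "cultural_moments",
)


def render_context(snapshot: CulturalSnapshot) -> str:
    """Format the non-empty signal lists as a text block for a persona prompt."""
    lines = [f"Cultural context for market {snapshot.market}:"]
    for name in SIGNAL_FIELDS:
        values = getattr(snapshot, name)
        if values:  # skip signal types with nothing extracted
            lines.append(f"- {name}: {', '.join(values)}")
    return "\n".join(lines)


# Placeholder values for illustration only, not real extracted data.
example = CulturalSnapshot(market="PH", trusted_brands=["GCash"])
print(render_context(example))
```

Keeping each signal type as its own list means a prompt template can include or omit individual signal types without re-parsing free text.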
Quality gate
Each snapshot receives a quality_score between 0.0 and 1.0 based on the volume and consistency of extracted signals. Only snapshots scoring above 0.5 are activated. If all data sources are unavailable during a crawl, the pipeline will not overwrite a healthy existing snapshot. Your personas continue to use the last valid data rather than degraded output.
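The gate described above can be sketched as a small decision function. This is an illustrative sketch, not the actual Boses implementation: `choose_active_snapshot` and the dictionary shape are hypothetical, and it assumes that a candidate failing the threshold leaves the previously active snapshot in place.

```python
from typing import Optional

ACTIVATION_THRESHOLD = 0.5  # from the text: only scores above 0.5 are activated


def choose_active_snapshot(
    current: Optional[dict], candidate: Optional[dict]
) -> Optional[dict]:
    """Return the snapshot that should be active after a crawl.

    `candidate` is None when every data source was unavailable.
    """
    if candidate is None:
        # Nothing was crawled: keep the existing snapshot untouched.
        return current
    if candidate["quality_score"] > ACTIVATION_THRESHOLD:
        return candidate
    # Assumption: a low-scoring candidate never replaces the active snapshot.
    return current


active = {"market": "PH", "quality_score": 0.8}
print(choose_active_snapshot(active, None) is active)  # failed crawl keeps the old one
```

Note that the comparison is strict, so a candidate scoring exactly 0.5 is not activated under this reading of "above 0.5".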
Auto-refresh
You do not need to manage snapshot freshness manually. The pipeline triggers automatically in two situations:
- Persona group creation: when you create a new persona group for a market, Boses queues a background refresh alongside generation.
- Staleness check: if the active snapshot for your target market is older than 7 days at the time of persona group creation, a refresh runs before generation proceeds.
The fresher the snapshot, the more current the consumer signals baked into your personas. If you are running research on a time-sensitive topic — a product launch, a cultural event, a competitor move — create a new persona group to trigger a fresh crawl rather than reusing one generated weeks earlier.
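To make the two refresh triggers concrete, the following sketch shows one way the 7-day staleness rule could be expressed. The function names and the `"before_generation"` / `"background"` labels are hypothetical, and the sketch assumes that "older than 7 days" is a strict comparison against the snapshot's creation time.

```python
from datetime import datetime, timedelta, timezone

MAX_SNAPSHOT_AGE = timedelta(days=7)  # staleness window stated in the text


def is_stale(snapshot_created_at: datetime, now: datetime) -> bool:
    """True when the snapshot is strictly older than the 7-day window."""
    return now - snapshot_created_at > MAX_SNAPSHOT_AGE


def refresh_mode(snapshot_created_at: datetime, now: datetime) -> str:
    """Decide how a refresh runs when a persona group is created.

    Assumed reading of the two triggers: a stale snapshot is refreshed
    before generation proceeds; otherwise the refresh is queued in the
    background alongside generation.
    """
    if is_stale(snapshot_created_at, now):
        return "before_generation"
    return "background"


now = datetime.now(timezone.utc)
print(refresh_mode(now - timedelta(days=10), now))
print(refresh_mode(now - timedelta(days=2), now))
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the rule easy to check against fixed dates.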
Vertical targeting
By default, the pipeline crawls general top posts and bestseller categories. If your research targets a specific industry, you can request a vertical-focused snapshot by contacting your account team. Supported verticals include fintech, beauty, food delivery, telco, and gaming, as well as any other category relevant to your research. A vertical snapshot concentrates signals on the topic area, so personas generated from it carry more specific attitudes and brand awareness for that space.
If you are running research on a specific industry vertical and want personas that reflect current discourse in that category, reach out to your account team to request a targeted refresh before generating personas.