Case Studies: Validation with Synthetic Respondents

Rather than rely on anecdotal success, Simsurveys validates its synthetic responses using quantitative metrics that compare simulated outputs to real-world survey data. These comparisons test whether our synthetic respondents reproduce the distributions, patterns, and insights obtained through traditional sampling, within the alignment thresholds defined below.

Statistical Tests Used

We apply a standard battery of statistical measures to evaluate alignment between real and synthetic survey responses:

Metric	Purpose	Interpretation	Threshold for Alignment
Kullback-Leibler (KL) Divergence	Measures how much one probability distribution diverges from another; asymmetric and unbounded above	Lower is better (0 = identical distributions)	< 0.5
Jensen-Shannon (JS) Distance	Square root of the Jensen-Shannon divergence, a symmetrized and smoothed variant of KL divergence; bounded between 0 and 1 when base-2 logarithms are used	Closer to 0 = better fit	< 0.3
L1 Distance	Sums the absolute differences in category proportions; ranges from 0 to 2	0 = perfect alignment	< 0.5
Pearson Correlation	Assesses linear agreement on ordinal or interval items	Closer to 1 = stronger agreement (r > 0.9 = very strong alignment)	> 0.9
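
To make the four measures concrete, here is a minimal Python sketch of how they could be computed for a single categorical survey item. It is an illustration, not Simsurveys' production code. Several choices are assumptions that the table above does not specify: base-2 logarithms, the KL direction (real relative to synthetic), a small epsilon to guard against empty synthetic categories, and Pearson r computed across paired category proportions. The function names and the example counts are hypothetical.

```python
import math


def normalize(counts):
    """Convert raw category counts (or proportions) into proportions summing to 1."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in bits. eps (an assumption) avoids division by zero in q."""
    return sum(pi * math.log2(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def js_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence; lies in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    jsd = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(jsd, 0.0))  # clamp tiny negative rounding errors


def l1_distance(p, q):
    """Sum of absolute differences in category proportions; lies in [0, 2]."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def alignment_metrics(real_counts, synthetic_counts):
    """Compute all four measures for one item with matching category order."""
    if len(real_counts) != len(synthetic_counts):
        raise ValueError("both inputs need the same categories in the same order")
    p = normalize(real_counts)
    q = normalize(synthetic_counts)
    return {
        "kl_divergence": kl_divergence(p, q),  # real relative to synthetic
        "js_distance": js_distance(p, q),
        "l1_distance": l1_distance(p, q),
        "pearson_r": pearson_r(p, q),
    }


# Hypothetical counts for a 5-point favorability item (illustrative only,
# not validation results).
real = [12, 30, 58, 70, 30]
synthetic = [10, 34, 55, 72, 29]
for name, value in alignment_metrics(real, synthetic).items():
    print(f"{name}: {value:.4f}")
```

Because KL divergence is asymmetric, swapping the two arguments generally changes its value, so the direction should be fixed and reported alongside any result. The JS distance and L1 distance do not depend on argument order.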

Validation Summary

Below are results from recent validation exercises across key marketing survey domains. Each row compares synthetic panel outputs to matched real respondent data.

Test Domain	KL Divergence	JS Distance	L1 Distance	Pearson r	Interpretation
Brand Favorability	[TO BE FILLED]	[TO BE FILLED]	[TO BE FILLED]	[TO BE FILLED]	[Summary: e.g., "Excellent alignment"]
Net Promoter Score (NPS)	[TO BE FILLED]	[TO BE FILLED]	[TO BE FILLED]	[TO BE FILLED]	[Summary]
Pricing Sensitivity	[TO BE FILLED]	[TO BE FILLED]	[TO BE FILLED]	[TO BE FILLED]	[Summary]
Segmentation/Drivers	[TO BE FILLED]	[TO BE FILLED]	[TO BE FILLED]	[TO BE FILLED]	[Summary]
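
Once a row of the summary table is populated, each value can be checked against the alignment thresholds listed under Statistical Tests Used. The sketch below shows one way to do that. The thresholds are taken from that table; the function name, the dictionary keys, and the sample values are hypothetical placeholders, not validation results.

```python
import operator

# Thresholds from the "Threshold for Alignment" column of the metrics table.
THRESHOLDS = {
    "kl_divergence": (operator.lt, 0.5),
    "js_distance": (operator.lt, 0.3),
    "l1_distance": (operator.lt, 0.5),
    "pearson_r": (operator.gt, 0.9),
}


def check_alignment(metrics):
    """Return a pass/fail flag per metric for one test domain."""
    return {
        name: compare(metrics[name], limit)
        for name, (compare, limit) in THRESHOLDS.items()
    }


# Placeholder values for illustration only; they are not measured results.
example_row = {
    "kl_divergence": 0.12,
    "js_distance": 0.35,
    "l1_distance": 0.20,
    "pearson_r": 0.93,
}
flags = check_alignment(example_row)
for name, passed in flags.items():
    print(f"{name}: {'within threshold' if passed else 'outside threshold'}")
print("all thresholds met:", all(flags.values()))
```

The comparisons are strict, so a value exactly equal to a threshold counts as outside it, matching the "<" and ">" signs in the table.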

Why These Metrics?

Unlike anecdotal comparisons or cherry-picked examples, these metrics let us assess the fidelity of synthetic data in a repeatable, quantitative, and domain-independent way. Applying the same benchmarks to every study lets us check, rather than assume, that our synthetic panels hold up across multiple use cases.

These metrics are commonly used in synthetic data validation across healthcare, financial modeling, and survey science (Snoke et al., 2018; Nowok et al., 2016).

Next Steps

We will publish full validation reports for key industry verticals as additional data becomes available. Meanwhile, these metrics will continue to guide how we calibrate and improve our models to match real-world survey behavior.

Want to validate your own survey against synthetic respondents? Contact us for a custom benchmark.

Validated Results Across Industries

Ready to Validate Your Research Approach?