KNOWSIS
Validation Framework

PRISM – Synthetic Data You Can Trust

Independent, auditable validation methodology mapped to Gartner's five criteria for synthetic data quality. Published methodology – not just claimed accuracy.

Why Methodology Matters

Most synthetic data vendors claim accuracy figures without publishing their methodology. This creates trust gaps that undermine analytical confidence.

Unverifiable Claims

"12x accuracy" headlines with no published methodology or validation criteria. How was this measured? Against what benchmark? These questions remain unanswered.

Self-Reported Metrics

Vendors grading their own work using proprietary, undisclosed scoring systems. No independent audit trail means no accountability.

Black-Box Projection

Data generated without transparency into how statistical properties are preserved, validated, or verified against source distributions.

Missing Predictive Testing

Accuracy claimed but never demonstrated against real-world outcomes. Synthetic data should predict as well as original data – this must be proven, not assumed.

Five Dimensions, One Composite Score

Each dimension is scored 0–100. The composite PRISM Quality Score is a weighted average of the five dimension scores. A score of 80 or above indicates excellent quality; 60–79 is good; below 60 requires parameter adjustment or additional source data.
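As an illustration of the weighted average, here is a minimal Python sketch. The weights are not published on this page, so the equal weighting used by default is an assumption, and the function and key names are hypothetical.

```python
from typing import Mapping, Optional

DIMENSIONS = ("precision", "richness", "integrity", "strength", "modelability")


def composite_score(scores: Mapping[str, float],
                    weights: Optional[Mapping[str, float]] = None) -> float:
    """Weighted average of the five 0-100 dimension scores."""
    if weights is None:
        # Assumption: equal weights. The actual PRISM weights are not stated.
        weights = {d: 1.0 for d in DIMENSIONS}
    for d in DIMENSIONS:
        if not 0.0 <= scores[d] <= 100.0:
            raise ValueError(f"{d} score must lie in [0, 100]")
    total_weight = sum(weights[d] for d in DIMENSIONS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the total weight keeps the result on the 0-100 scale.
    return sum(scores[d] * weights[d] for d in DIMENSIONS) / total_weight
```

Because the result is normalised by the total weight, any non-negative weighting scheme keeps the composite on the same 0–100 scale as the individual dimensions.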

P

Precision

Measures how closely the synthetic variable distributions match the original source data. Checks that the generated data reproduces the statistical shape of every variable – not just the mean.

Distribution Match, Subgroup Alignment, Correlation Preservation, Jensen-Shannon Divergence
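One of the listed checks, Jensen-Shannon divergence, can be sketched for a single categorical variable as follows. This is a generic textbook formulation in base 2, not necessarily the exact computation PRISM uses; the function name is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def js_divergence(real: Sequence[Hashable], synthetic: Sequence[Hashable]) -> float:
    """Jensen-Shannon divergence (base 2) between two categorical samples.

    0.0 means the two empirical distributions are identical;
    1.0 means they share no categories at all.
    """
    if not real or not synthetic:
        raise ValueError("both samples must be non-empty")
    p_counts, q_counts = Counter(real), Counter(synthetic)
    n_p, n_q = len(real), len(synthetic)
    jsd = 0.0
    for level in set(p_counts) | set(q_counts):
        p = p_counts[level] / n_p
        q = q_counts[level] / n_q
        m = 0.5 * (p + q)  # mixture distribution
        # Terms with zero probability contribute nothing (0 * log 0 := 0).
        if p > 0:
            jsd += 0.5 * p * math.log2(p / m)
        if q > 0:
            jsd += 0.5 * q * math.log2(q / m)
    return jsd
```

Unlike a comparison of means, this measure reacts to any change in the shape of the distribution, which is the property the Precision dimension describes. Continuous variables would need binning first, and the bin choice is a further assumption.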
R

Richness

Measures whether the synthetic data preserves the diversity and range of the original. A high Richness score means the projection captures the full spread of responses – not a compressed or smoothed version.

Entropy Preservation, Coverage, Novelty, NN Diversity
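Two of these checks, Entropy Preservation and Coverage, can be illustrated for a categorical variable with the sketch below. The definitions are common ones chosen for illustration (an entropy ratio and the share of real categories reproduced); the page does not state PRISM's exact formulas.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def shannon_entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy in bits of the empirical distribution of `values`."""
    n = len(values)
    return sum((c / n) * math.log2(n / c) for c in Counter(values).values())


def entropy_preservation(real: Sequence[Hashable], synthetic: Sequence[Hashable]) -> float:
    """Ratio of synthetic to real entropy; values below 1 suggest compression."""
    h_real = shannon_entropy(real)
    if h_real == 0:
        raise ValueError("real variable is constant, so the ratio is undefined")
    return shannon_entropy(synthetic) / h_real


def coverage(real: Sequence[Hashable], synthetic: Sequence[Hashable]) -> float:
    """Share of categories seen in the real data that also occur in the synthetic data."""
    real_levels = set(real)
    return len(real_levels & set(synthetic)) / len(real_levels)
```

A synthetic variable that collapses onto the most common answer would score near 0 on the entropy ratio and well below 1 on coverage, which is the "compressed or smoothed" failure described above.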
I

Integrity

Checks whether relationships between variables are maintained. Synthetic data with high Integrity is internally coherent – the logical patterns between variables that exist in the real data are preserved in the projection or imputation.

Valid Levels, Range Validity, Ordinal Logic, Cross-Consistency, Missing Patterns
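Checks such as Valid Levels, Range Validity and Cross-Consistency can be thought of as rules applied to every synthetic record. The sketch below shows that idea; the variables (`region`, `age`, `owns_car`, `cars_owned`) and the rules themselves are hypothetical examples, not part of the published framework.

```python
from typing import Any, Callable, Iterable, Mapping

Record = Mapping[str, Any]

# Hypothetical code frame for an illustrative "region" variable.
ALLOWED_REGIONS = {"north", "south", "east", "west"}


def pass_rate(records: Iterable[Record], rule: Callable[[Record], bool]) -> float:
    """Fraction of records that satisfy `rule`."""
    rows = list(records)
    if not rows:
        raise ValueError("no records to check")
    return sum(1 for r in rows if rule(r)) / len(rows)


def valid_level(r: Record) -> bool:
    # Valid Levels: the category must exist in the source code frame.
    return r["region"] in ALLOWED_REGIONS


def valid_range(r: Record) -> bool:
    # Range Validity: assumed adult-only survey, ages 18 to 99.
    return 18 <= r["age"] <= 99


def cross_consistent(r: Record) -> bool:
    # Cross-Consistency: owning a car and the number of cars owned must agree.
    return (r["cars_owned"] > 0) == r["owns_car"]
```

Each rule yields a pass rate between 0 and 1, so a record set can be summarised per rule, for example `pass_rate(records, cross_consistent)`.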
S

Strength

Measures the statistical confidence and analytical utility of the synthetic data. A high Strength score means the output provides genuine analytical power – not just structural similarity to the source.

Record Utility, ESS Efficiency, TSTR Ratio, Diminishing Returns
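The page does not define ESS Efficiency. A common reading, assumed here, is Kish's effective sample size divided by the number of records, which shows how much analytical power is lost when records carry unequal weights. A minimal sketch under that assumption:

```python
from typing import Sequence


def effective_sample_size(weights: Sequence[float]) -> float:
    """Kish's effective sample size: (sum of weights)^2 / sum of squared weights."""
    if not weights:
        raise ValueError("weights must be non-empty")
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0:
        raise ValueError("at least one weight must be non-zero")
    return total * total / total_sq


def ess_efficiency(weights: Sequence[float]) -> float:
    """Effective sample size as a share of the nominal record count (0 to 1)."""
    return effective_sample_size(weights) / len(weights)
```

With equal weights the efficiency is 1.0; the more uneven the weights, the lower it falls, so a large projected file can still behave like a much smaller sample.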
M

Modelability

Tests whether the synthetic data would produce similar analytical results if used for modelling or segmentation. This is the most practically important dimension – it answers the question: can I actually use this data for analysis?

Sample Adequacy, Missingness, Variable Quality, Hook Coverage, Signal Strength

Validated, auditable, governed. PRISM methodology – because claimed accuracy without published methodology is not validation.

Quality Score Interpretation

Every synthetic projection receives a PRISM quality grade based on composite validation metrics.

Grade | Score | Recommended use
Excellent | 80–100 | Full analytical use – high confidence
Good | 60–79 | Most analytical applications
Review Required | Below 60 | Adjust parameters or increase source sample size
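The grade bands translate directly into a lookup. In this sketch the function name is hypothetical, and treating 80 and 60 as inclusive lower bounds (so that a fractional score such as 79.5 is graded Good) is an assumption about how non-integer scores are handled.

```python
def prism_grade(score: float) -> str:
    """Map a 0-100 composite score to the grade bands in the table above."""
    if not 0 <= score <= 100:
        raise ValueError("score must lie in [0, 100]")
    if score >= 80:
        return "Excellent"
    if score >= 60:
        return "Good"
    return "Review Required"
```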

How PRISM Compares

Industry claims vs. the PRISM standard for synthetic data validation.

Industry Claim | PRISM Standard
"12x accuracy" – no methodology published | Published validation criteria, auditable trail
Self-reported accuracy metrics | Mapped to Gartner's 5 validation requirements
Black-box projection process | Representativeness + distributional fidelity checks
Accuracy claimed, not demonstrated | Predictive validity tested against real outcomes
Single headline accuracy figure | Five-dimension quality assessment (PRISM)

PRISM in Practice

How the validation framework operates across the synthetic data lifecycle.

1. Pre-Generation Validation

  • Source data quality assessment – sample size, coverage, missing data checks
  • Target population parameter verification against census or panel benchmarks
  • Variable-level distribution analysis to establish baseline metrics
  • Correlation structure mapping for multivariate integrity preservation
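The correlation structure mapping in the last step can be sketched as computing pairwise Pearson correlations on the source data, then comparing them with the same correlations in the synthetic output. This is an illustrative approach with hypothetical function names; the page does not say which correlation measure or tolerance PRISM applies.

```python
import math
from typing import Dict, Mapping, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length numeric columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_map(columns: Mapping[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Correlation for every unordered pair of variables, keyed by name pair."""
    names = sorted(columns)
    return {
        (a, b): pearson(columns[a], columns[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }


def max_correlation_gap(real: Mapping[str, Sequence[float]],
                        synthetic: Mapping[str, Sequence[float]]) -> float:
    """Largest absolute difference between matching real and synthetic correlations."""
    baseline = correlation_map(real)
    generated = correlation_map(synthetic)
    return max(abs(baseline[pair] - generated[pair]) for pair in baseline)
```

The baseline map is built once before generation; after generation, a gap close to 0 indicates that the multivariate structure has been carried over.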

2. Post-Generation Audit

  • Distributional similarity testing across all synthetic variables
  • Segment-level plausibility checks – logical patterns within demographics
  • Predictive model equivalence testing (synthetic vs. original training)
  • PRISM quality grade assignment with full audit documentation
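Predictive model equivalence testing is commonly framed as "train on synthetic, test on real" (TSTR) compared with "train on real, test on real" (TRTR). The sketch below shows that comparison with a deliberately simple 1-nearest-neighbour classifier. The classifier, the accuracy metric and all names are stand-ins chosen for illustration; the page does not specify which models or metrics PRISM uses.

```python
from typing import List, Sequence, Tuple

# A dataset is a pair: (feature rows, class labels).
Dataset = Tuple[List[Sequence[float]], List[int]]


def predict_1nn(train: Dataset, row: Sequence[float]) -> int:
    """Label of the training row closest to `row` (squared Euclidean distance)."""
    rows, labels = train
    nearest = min(
        range(len(rows)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(rows[i], row)),
    )
    return labels[nearest]


def accuracy(train: Dataset, test: Dataset) -> float:
    """Share of test rows classified correctly by a model built on `train`."""
    test_rows, test_labels = test
    hits = sum(predict_1nn(train, r) == y for r, y in zip(test_rows, test_labels))
    return hits / len(test_labels)


def tstr_ratio(real_train: Dataset, synthetic_train: Dataset, real_test: Dataset) -> float:
    """TSTR accuracy divided by TRTR accuracy, both measured on held-out real data."""
    trtr = accuracy(real_train, real_test)
    if trtr == 0:
        raise ValueError("baseline accuracy is zero, so the ratio is undefined")
    return accuracy(synthetic_train, real_test) / trtr
```

A ratio near 1.0 means a model trained on the synthetic data predicts real outcomes about as well as one trained on the original data. The held-out real test set must not overlap with the data used to generate the synthetic records, otherwise the comparison is optimistic.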

Ready for Validated Synthetic Data?

Every Knowsis synthetic projection includes full PRISM validation documentation – not just an accuracy claim, but an auditable methodology.