One of the most important and underappreciated concepts in statistics: a result can be statistically significant without being practically meaningful. Understanding this distinction is crucial for making good decisions with data.

The Problem with p-values Alone

Statistical significance (p < 0.05) only tells you that data this extreme would be unlikely if the true effect were zero. It says nothing about how large the effect is. With a large enough sample, you can get p < 0.001 for an effect so tiny it has no real-world meaning.

A Concrete Example

A diet company tests a new supplement with n = 100,000 participants. After 3 months, the supplement group has lost 0.2 kg more on average than the comparison group.

The result is highly statistically significant. But is losing an extra 0.2 kg meaningful? That is 200 grams, less than the weight of a glass of water. Practically, this supplement is useless. Yet it could produce headlines: "Supplement significantly increases weight loss, study shows."
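
To see how a 0.2 kg difference can be "highly significant" and still trivial, here is a rough sketch of the arithmetic. The article gives only the total sample size and the difference, so the even 50,000 / 50,000 split and the 5 kg standard deviation of weight change are assumptions made purely for illustration, and the function name two_sample_z is hypothetical.

```python
import math

def two_sample_z(diff, sd, n_per_group):
    """z statistic and two-sided p-value for a difference in means.

    Assumes equal group sizes and a common standard deviation.
    """
    se = sd * math.sqrt(2.0 / n_per_group)   # standard error of the difference
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal tail probability
    return z, p

# Assumed inputs (not from the study): equal split and SD of 5 kg.
diff_kg = 0.2
sd_kg = 5.0
n_per_group = 50_000

z, p = two_sample_z(diff_kg, sd_kg, n_per_group)
d = diff_kg / sd_kg  # standardized effect size (Cohen's d)

print(f"z = {z:.2f}, p = {p:.1e}, Cohen's d = {d:.2f}")
```

Under these assumptions the p-value is far below 0.001, while d comes out at 0.04, a small fraction of the conventional "small effect" threshold of 0.2.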

Effect Sizes — Measuring Practical Significance

Effect sizes quantify how large an effect is, independent of sample size. Always report effect sizes alongside p-values.

Cohen's d (for means)

d = (x̄₁ − x̄₂) / s_pooled

d value     Interpretation       Overlap between groups
d = 0.2     Small effect         85% overlap
d = 0.5     Medium effect        67% overlap
d = 0.8     Large effect         53% overlap
d = 1.2     Very large effect    38% overlap
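
As a minimal sketch, Cohen's d can be computed from two samples with the standard library alone. The sample values below are made up for illustration. The overlap helper is an inference rather than something the article states: the percentages in the table appear to match 1 minus Cohen's U1 for two normal distributions with equal spread, so that is what overlap_from_d (a hypothetical name) computes.

```python
import math
from statistics import NormalDist, mean, variance

def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

def overlap_from_d(d):
    """Overlap as 1 - U1, assuming two normal groups with equal SD."""
    phi = NormalDist().cdf(abs(d) / 2.0)
    u1 = (2.0 * phi - 1.0) / phi   # Cohen's U1 (proportion of non-overlap)
    return 1.0 - u1

# Hypothetical samples, for illustration only.
group_a = [2, 4, 6, 8, 10]
group_b = [1, 3, 5, 7, 9]
print(f"d = {cohens_d(group_a, group_b):.3f}")

for d in (0.2, 0.5, 0.8, 1.2):
    print(f"d = {d}: overlap = {overlap_from_d(d):.0%}")
```

With this definition of overlap, the loop reproduces the 85%, 67%, 53%, and 38% figures listed in the table.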

R² for Regression

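R² is the proportion of variance in the outcome explained by the model. By Cohen's conventional benchmarks, R² = 0.01 is small, 0.09 is medium, and 0.25 is large (corresponding to correlations of 0.1, 0.3, and 0.5).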
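
For a concrete sense of what "variance explained" means, here is a small sketch that fits a least-squares line and computes R² as 1 minus the ratio of residual to total sum of squares. The data points and the function name r_squared are made up for illustration.

```python
from statistics import mean

def r_squared(x, y):
    """R-squared for a simple least-squares line y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual and total sums of squares
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2 = r_squared(x, y)
print(f"R^2 = {r2:.2f}")
```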

Cramér's V for Chi-Square

For a 2×2 table, V = 0.10 is conventionally small, 0.30 medium, and 0.50 large. The thresholds are lower for larger tables.
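
Cramér's V is derived from the chi-square statistic as V = sqrt(chi² / (n × (k − 1))), where k is the smaller of the number of rows and columns. A minimal sketch, using a made-up 2×2 table of counts and a hypothetical function name:

```python
import math

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))  # smaller table dimension
    return math.sqrt(chi2 / (n * (k - 1)))

# Hypothetical counts, for illustration only.
counts = [[30, 20],
          [20, 30]]
v = cramers_v(counts)
print(f"Cramér's V = {v:.2f}")
```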

Why Large Samples Inflate Significance

The standard error SE = σ/√n decreases as n increases. With n = 1,000,000 and σ = 1, SE = 0.001, so a difference of only 0.01 units already gives t = 10. Every nonzero true effect, no matter how trivial, becomes statistically significant with enough data.
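
The scaling can be checked directly from SE = σ/√n. In this sketch the difference (0.01 units) and σ = 1 are illustrative assumptions. The only thing that changes between rows is the sample size.

```python
import math

def t_statistic(diff, sigma, n):
    """t = difference / standard error, with SE = sigma / sqrt(n)."""
    return diff / (sigma / math.sqrt(n))

# Assumed values, for illustration only.
diff, sigma = 0.01, 1.0

for n in (100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: t = {t_statistic(diff, sigma, n):.1f}")
```

The same fixed difference goes from nowhere near significant at n = 100 to overwhelmingly significant at n = 1,000,000, because t grows in proportion to √n.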

This is why many journals now expect effect sizes to be reported, not just p-values. The question is not just "is there an effect?" but "is the effect large enough to matter?"

Practical Significance in Different Fields

Use our free statistics calculators and always pair your p-value with an effect size. Effect size can be computed from most hypothesis test outputs.