"Correlation does not imply causation" is one of the most important principles in statistics. Yet it is violated constantly in news articles, social media, and even published research. Understanding this distinction can save you from costly mistakes in business decisions, policy-making, and scientific conclusions.
What is Correlation?
Two variables are correlated if they tend to change together: when one goes up, the other tends to go up (positive correlation) or down (negative correlation). The strength of a linear association is measured by the Pearson correlation coefficient r ∈ [−1, +1].
What is Causation?
A causes B if changing A directly produces a change in B — not just that they happen to vary together. The direction matters (A → B), the timing matters (A must precede B), and alternative explanations must be ruled out.
Famous Spurious Correlations
These real correlations illustrate why correlation alone proves nothing:
- Ice cream sales and drowning rates (r ≈ 0.90) — both increase in summer. Hot weather causes both. Ice cream does not cause drowning.
- Nicolas Cage movies and pool drownings — both rose and fell together in the 1990s–2000s. Pure coincidence.
- Shoe size and reading ability in children — both increase with age. Age is the confound. Bigger feet do not cause better reading.
- Per capita cheese consumption and deaths from bedsheet tangling — statistically correlated over 10 years. No causal mechanism exists.
Why Variables Can Be Correlated Without Causation
1. Confounding Variable (Common Cause)
A third variable C causes both A and B. Example: Physical fitness level (C) contributes to both a lower resting heart rate (A) and a longer lifespan (B). Heart rate and lifespan are then correlated even if neither directly causes the other.
2. Reverse Causation
You think A causes B, but actually B causes A. Example: Depression and social isolation are correlated. Does isolation cause depression, or does depression cause isolation? Often both.
3. Coincidental Correlation
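Pure chance, especially in small datasets or when you search through many pairs. If you run 100 independent tests on unrelated variables, you expect about 5 to come out significant at α = 0.05 by chance alone. With 100 variables there are 4,950 possible pairs, so testing every pair would produce roughly 250 spurious significant correlations.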
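The arithmetic behind chance findings can be checked with a short simulation. The sketch below is illustrative only: it generates 100 unrelated noise variables with 30 observations each (both sizes are arbitrary choices), tests every pair, and compares the count of "significant" correlations with the expected number, α × number of tests. The critical value 0.361 is the two-tailed cutoff for |r| at α = 0.05 with n = 30, taken from a standard table and hard-coded here as an assumption.

```python
import math
import random

random.seed(0)
N_VARS, N_OBS, ALPHA = 100, 30, 0.05
# Two-tailed critical |r| for n = 30 (28 degrees of freedom) at alpha = 0.05,
# from a standard critical-value table; hard-coded as an assumption.
R_CRITICAL = 0.361


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


n_pairs = math.comb(N_VARS, 2)               # distinct pairs among 100 variables
expected_false_positives = ALPHA * n_pairs   # about 247.5

# 100 unrelated variables: pure noise, no real relationships anywhere.
data = [[random.gauss(0, 1) for _ in range(N_OBS)] for _ in range(N_VARS)]

significant = sum(
    1
    for i in range(N_VARS)
    for j in range(i + 1, N_VARS)
    if abs(pearson_r(data[i], data[j])) > R_CRITICAL
)

print(f"pairs tested: {n_pairs}")
print(f"expected by chance: {expected_false_positives:.1f}")
print(f"observed 'significant' pairs: {significant}")
```

Every one of the flagged pairs is a false positive by construction, which is why a long list of significant correlations from an untargeted search carries little evidential weight.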
How to Establish Causation
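The gold standard is a randomised controlled trial (RCT):
- Randomly assign participants to treatment (A) or control (no A)
- Randomisation balances all confounders, known and unknown, across the groups on average
- Any systematic difference in outcomes can then be attributed to the treatment rather than to confounding; a significance test addresses the remaining role of chance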
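To see why random assignment matters, here is a minimal simulation sketch. All quantities are invented for illustration: a hidden "health" score drives both the outcome and, in the observational setting, the choice to take the treatment. The true treatment effect is set to 2.0 by assumption, so the two estimates can be compared against a known answer.

```python
import random

random.seed(42)
TRUE_EFFECT = 2.0  # assumed effect of the treatment on the outcome


def outcome(health, treated):
    # Outcome depends on a hidden confounder (health) plus the treatment.
    return 5.0 * health + TRUE_EFFECT * treated + random.gauss(0, 1)


def mean(values):
    return sum(values) / len(values)


def mean_difference(rows):
    """Average outcome of the treated minus average outcome of the controls."""
    treated = [y for t, y in rows if t == 1]
    control = [y for t, y in rows if t == 0]
    return mean(treated) - mean(control)


health_scores = [random.random() for _ in range(20000)]

# Observational data: healthier people are more likely to choose treatment.
observational = []
for h in health_scores:
    t = 1 if random.random() < h else 0
    observational.append((t, outcome(h, t)))

# Randomised trial: a coin flip decides treatment, independent of health.
randomised = []
for h in health_scores:
    t = random.randint(0, 1)
    randomised.append((t, outcome(h, t)))

obs_diff = mean_difference(observational)
rct_diff = mean_difference(randomised)

print(f"true effect:              {TRUE_EFFECT:.2f}")
print(f"observational difference: {obs_diff:.2f}")
print(f"randomised difference:    {rct_diff:.2f}")
```

In this setup the observational comparison overstates the effect because the treated group is healthier to begin with, while the randomised comparison lands close to the assumed value.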
When experiments are impossible (ethics, cost, scale), researchers use:
- Natural experiments: Policy changes, natural disasters that create quasi-random assignment
- Instrumental variables: A variable that affects A but not B except through A
- Difference-in-differences: Compare before/after changes between treated and untreated groups
- Bradford Hill criteria: A checklist for weighing observational evidence, including strength, consistency, temporality, and biological plausibility
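Of the approaches listed, difference-in-differences is the easiest to show as plain arithmetic. The figures below are hypothetical group averages, chosen only to illustrate the calculation. The estimate is valid only under the usual parallel-trends assumption, namely that the treated group would have changed by the same amount as the untreated group had there been no treatment.

```python
# Hypothetical average outcomes (for example, weekly sales) before and after
# a policy that applied only to the treated group.
treated_before, treated_after = 100.0, 130.0
control_before, control_after = 90.0, 105.0

treated_change = treated_after - treated_before   # change in the treated group
control_change = control_after - control_before   # change that happened anyway

# Difference-in-differences: the treated group's change beyond the background trend.
did_estimate = treated_change - control_change

print(f"treated change: {treated_change}")
print(f"control change: {control_change}")
print(f"difference-in-differences estimate: {did_estimate}")
```

A simple before/after comparison of the treated group alone would credit the policy with the full change of 30, half of which also occurred in the untreated group.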
Practical Implications
Before acting on a correlation:
- Ask: Is there a plausible mechanism? (Why would A cause B?)
- Ask: Could a third variable cause both?
- Ask: Does A precede B in time?
- Ask: Has this been replicated?
- Ask: Is the effect consistent across subgroups?
Use our Pearson Correlation Calculator to measure correlation strength, and always pair it with critical thinking about causation.
Defining Correlation
Correlation measures the strength and direction of a linear relationship between two variables. The Pearson correlation coefficient r ranges from −1 to +1. A value of +1 means perfect positive linear relationship, −1 means perfect negative relationship, and 0 means no linear relationship. Correlation says nothing about why two variables move together — only that they do.
Correlation is symmetric: the correlation between X and Y equals the correlation between Y and X. It is also unitless, allowing comparison across different scales and measurement units.
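As a minimal sketch, the coefficient and the two properties just described can be checked directly. The function below implements the standard formula, r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)² · Σ(y − ȳ)²). The height and weight figures are made up for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: heights in centimetres and weights in kilograms.
heights_cm = [150, 160, 165, 172, 180, 191]
weights_kg = [52, 58, 63, 70, 79, 88]

r_xy = pearson_r(heights_cm, weights_kg)
r_yx = pearson_r(weights_kg, heights_cm)       # symmetry: swap the arguments
heights_in = [h / 2.54 for h in heights_cm]    # same heights, different unit
r_inches = pearson_r(heights_in, weights_kg)   # unitless: r should not change

print(f"r(height, weight)       = {r_xy:.4f}")
print(f"r(weight, height)       = {r_yx:.4f}")
print(f"r(height in in, weight) = {r_inches:.4f}")
```

All three printed values agree up to floating-point rounding, since swapping the variables or rescaling one of them leaves the formula unchanged.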
Defining Causation
Causation means that changing one variable directly produces a change in another. Establishing causation requires more than observing that two variables tend to move together. The gold standard for establishing causation in science is the randomised controlled trial (RCT), where participants are randomly assigned to treatment and control groups so that confounding factors, measured or not, are balanced between the groups on average.
Classic Examples of Spurious Correlations
The internet has made spurious correlations famous. Tyler Vigen's database contains hundreds of absurd examples: US per capita cheese consumption correlates almost perfectly (r = 0.947) with the number of people who died tangled in their bedsheets. Nicolas Cage movie releases correlate with swimming pool drownings. Ice cream sales correlate with violent crime rates.
These are not causal — they share a common cause (summer/season), are coincidental time series, or result from data mining thousands of variable pairs until some correlate by chance.
The Four Explanations for Correlation
When you observe a correlation between A and B, there are four main explanations to consider, and more than one can apply at once:
- A causes B: Smoking causes lung cancer
- B causes A (reverse causation): You assume social isolation causes depression, but depression may instead cause social isolation
- A third variable C causes both A and B (confounding): Poverty causes both poor diet and poor health outcomes
- Chance: With enough variables tested, some will correlate by coincidence
Establishing Causation: Bradford Hill Criteria
In epidemiology, the Bradford Hill criteria provide a framework for judging whether a correlation is likely causal: strength of association, consistency across studies, specificity (the cause leads specifically to the effect), temporality (cause precedes effect), biological gradient (dose-response relationship), plausibility (biological mechanism exists), coherence with known facts, experimental evidence, and analogy with similar known causal relationships.
No single criterion is decisive, but satisfying more criteria strengthens the causal case. Smoking and lung cancer satisfied nearly all criteria, overcoming initial resistance to the causal claim.
Confounding Variables in Practice
Confounding is one of the most common reasons correlations are misinterpreted as causal. A confounder is a variable that is associated with both the exposure and the outcome, and is not itself a consequence of the exposure, so it distorts the apparent relationship between them.
Classic example: countries with more televisions per household have lower infant mortality rates. Does television save babies? No — both are driven by wealth. Wealthier countries have both better healthcare (reducing infant mortality) and more televisions. Television is correlated with mortality but does not cause the reduction.
Statistical Methods for Causal Inference
When randomised experiments are impossible (due to cost, ethics, or logistics), statistical methods attempt to estimate causal effects from observational data. These include regression adjustment (controlling for measured confounders), propensity score matching (creating balanced comparison groups), instrumental variables (using a variable that affects exposure but not outcome directly), difference-in-differences (comparing changes over time between groups), and regression discontinuity design (exploiting arbitrary thresholds).
These methods rest on assumptions that cannot be fully verified from data alone, which is why causal claims from observational studies should always be interpreted cautiously.
Worked Example: Dissecting a Spurious Correlation
A data analyst at a city council notices a strong positive correlation (r = 0.78) between the number of ice cream shops in a neighbourhood and the crime rate. Should the council ban ice cream shops to reduce crime?
Clearly not. Both variables plausibly share common causes: population density and temperature. Denser, warmer neighbourhoods have more ice cream shops and more opportunities for crime. This is classic confounding. To test it, control for population density and average summer temperature in a multiple regression. If the ice cream coefficient drops to near zero, the confounders explain the correlation. If the coefficient remains strong even after controlling for plausible confounders, the relationship deserves more investigation, but that still does not establish causation.
The formal test: regress crime on ice cream shops, population density, and temperature. In this example the ice cream coefficient becomes −0.02 (p = 0.78, not significant), so the original correlation is accounted for by the measured confounders. Statistical adjustment of this kind is the main tool an observational study has for separating correlation from causation, though unmeasured confounders always remain a concern.
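The adjustment logic can be sketched on simulated data. This is not the council's dataset: the numbers are generated so that a single confounder (labelled density here, standing in for density and temperature together) drives both ice cream shops and crime, with no direct link between the two. Instead of a full multiple regression, the sketch uses the partial correlation, which for one control variable answers the same question: how strongly are the two variables related once the confounder's linear effect is removed from both?

```python
import math
import random

random.seed(1)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Simulated neighbourhoods: density causes both variables (assumed structure).
n = 5000
density = [random.gauss(0, 1) for _ in range(n)]
shops = [2 * d + random.gauss(0, 1) for d in density]   # no dependence on crime
crime = [3 * d + random.gauss(0, 1) for d in density]   # no dependence on shops

raw_r = pearson_r(shops, crime)
adjusted_r = partial_r(shops, crime, density)

print(f"raw correlation (shops, crime):        {raw_r:.3f}")
print(f"partial correlation given the density: {adjusted_r:.3f}")
```

The raw correlation is strong while the adjusted one is close to zero, mirroring the collapse of the ice cream coefficient in the worked example. The adjustment only works for confounders that were actually measured.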
Natural Experiments: Finding Causation in Observational Data
Sometimes nature provides quasi-random assignment that mimics a randomised experiment. These "natural experiments" allow causal inference from observational data. Two classic examples:
- Education and earnings: studies have exploited compulsory schooling laws and school entry cutoffs. Children born just before and just after a cutoff end up with different amounts of schooling for reasons unrelated to ability, which is as-if random assignment.
- Military service and lifetime earnings: studies have used draft lottery numbers, which were randomly assigned.
These instrumental variable approaches extract causal information from observational variation that approximates randomness, providing some of the strongest observational evidence for causation outside formal RCTs.
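A lottery-style instrument can be illustrated with simulated data. Everything below is invented: the effect size, the unobserved trait u, and the way service depends on the lottery are assumptions chosen so the true answer is known. The estimator shown is the simple Wald ratio: the lottery's effect on the outcome divided by its effect on the treatment.

```python
import random

random.seed(7)
TRUE_EFFECT = -1.5  # assumed causal effect of service on earnings (arbitrary units)

rows = []
for _ in range(50000):
    z = random.randint(0, 1)   # lottery draw: random, so unrelated to u
    u = random.gauss(0, 1)     # unobserved trait affecting service and earnings
    # Service is more likely for those drafted (z = 1) and for those with low u.
    d = 1 if (0.8 * z - 0.5 * u + random.gauss(0, 1)) > 0.4 else 0
    y = 10 + TRUE_EFFECT * d + 2.0 * u + random.gauss(0, 1)
    rows.append((z, d, y))


def mean_where(index, value, column):
    """Mean of one column over the rows where rows[index] equals value."""
    selected = [r[column] for r in rows if r[index] == value]
    return sum(selected) / len(selected)


# Naive comparison by service status: confounded by u.
naive = mean_where(1, 1, 2) - mean_where(1, 0, 2)

# Wald / instrumental-variable estimate using the lottery.
reduced_form = mean_where(0, 1, 2) - mean_where(0, 0, 2)   # effect of z on y
first_stage = mean_where(0, 1, 1) - mean_where(0, 0, 1)    # effect of z on d
wald_iv = reduced_form / first_stage

print(f"true effect:    {TRUE_EFFECT:.2f}")
print(f"naive estimate: {naive:.2f}")
print(f"IV estimate:    {wald_iv:.2f}")
```

The naive comparison is biased because people who serve differ in u from those who do not. The lottery-based ratio recovers a value near the assumed effect, provided the lottery influences earnings only through service.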
Deep Dive: Correlation Vs Causation — Theory, Assumptions, and Best Practices
This section takes a closer look at correlation versus causation: the mathematical foundations, assumption checks, complete interpretation and effect size reporting, common mistakes, and situations where a correlation test is the wrong tool.
Mathematical Foundation
Every statistical procedure rests on a mathematical model of how data is generated. A significance test for a correlation assumes specific data-generating conditions (for Pearson's r: independent observations, a linear relationship, and approximately bivariate normal data). When these hold, the test delivers the stated Type I error rate and its nominal power. Understanding these foundations helps you know when results are trustworthy and when to seek alternatives.
Assumptions and Diagnostics
Before interpreting any result, verify all assumptions are satisfied. Common assumption violations and their remedies:
- Non-normality: For small samples, use non-parametric alternatives or bootstrap methods. For large samples, the Central Limit Theorem typically provides robustness.
- Outliers: Identify using IQR fence or modified z-scores. Investigate each outlier — correct data errors, but do not delete genuine extreme observations without disclosure.
- Independence violations: Clustered or longitudinal data requires mixed models or GEE rather than standard methods assuming independence.
Interpreting Your Results Completely
A complete interpretation always includes: (1) the test statistic value, (2) degrees of freedom, (3) exact p-value, (4) confidence interval for the parameter of interest, (5) effect size with interpretation, and (6) a plain-language conclusion. Never report just a p-value — it communicates only one dimension of a multi-dimensional result.
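For a correlation, most of these elements can be produced from just r and the sample size. The sketch below uses the Fisher z-transformation, a standard large-sample approach, to build a confidence interval and an approximate two-sided p-value. The p-value is a normal approximation, not the exact t-based value a statistics package would report. The inputs r = 0.78 and n = 40 are hypothetical.

```python
import math
from statistics import NormalDist


def correlation_report(r, n, confidence=0.95):
    """Summarise a Pearson correlation using the Fisher z-transformation."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n < 4:
        raise ValueError("need at least 4 observations")
    z = math.atanh(r)                 # Fisher z-transform of r
    se = 1 / math.sqrt(n - 3)         # approximate standard error of z
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    ci = (math.tanh(z - z_crit * se), math.tanh(z + z_crit * se))
    # Two-sided p-value for H0: rho = 0, normal approximation.
    p_approx = 2 * (1 - NormalDist().cdf(abs(z) / se))
    return {
        "r": r,
        "df": n - 2,                  # degrees of freedom for the t-test of r
        "ci": ci,
        "p_approx": p_approx,
        "r_squared": r * r,           # share of variance explained (effect size)
    }


report = correlation_report(r=0.78, n=40)
low, high = report["ci"]
print(f"r({report['df']}) = {report['r']:.2f}, "
      f"95% CI [{low:.2f}, {high:.2f}], r^2 = {report['r_squared']:.2f}, "
      f"approximate p = {report['p_approx']:.2g}")
```

The interval is asymmetric around r because the transformation stretches values near ±1. None of these numbers says anything about whether the relationship is causal.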
Effect Size and Practical Significance
Statistical significance tells you that an effect is detectable; effect size tells you whether it matters. For every test, compute and report the appropriate effect size measure alongside the p-value. Use field-specific benchmarks (not just Cohen's generic small/medium/large) to evaluate practical significance.
Common Errors and How to Avoid Them
- Multiple testing without correction: Apply Bonferroni, Holm, or FDR corrections whenever running more than one test on the same dataset.
- Confusing statistical and practical significance: Always ask "is this large enough to matter?" not just "is this detectable?"
- p-hacking: Pre-register hypotheses, analysis plans, and significance thresholds before seeing data.
- Overlooking assumptions: Verify independence, normality (or large n), and homogeneity of variance before applying parametric tests.
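The first item in the list above is straightforward to apply in code. Here is a minimal sketch of the Bonferroni and Holm procedures on a hypothetical set of five p-values. Both control the family-wise error rate at α, and Holm's step-down version rejects at least as many hypotheses as Bonferroni.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 for each test whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down procedure; returns reject flags in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value is compared with alpha / (m - k + 1).
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break   # once one test fails, all larger p-values fail too
    return reject


# Hypothetical p-values from five tests on the same dataset.
p_values = [0.001, 0.012, 0.020, 0.040, 0.300]

bonferroni_result = bonferroni(p_values)
holm_result = holm(p_values)

print("uncorrected:", [p <= 0.05 for p in p_values])
print("Bonferroni: ", bonferroni_result)
print("Holm:       ", holm_result)
```

Without correction four of the five tests look significant. Bonferroni keeps one and Holm keeps two.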
When This Test Is Not Appropriate
Every test has boundaries of appropriate application. Understand when to use non-parametric alternatives, when to switch to more complex models, and when the research question requires a different analytic framework entirely. Using the wrong test produces incorrect Type I error rates and power — even if the computation is done correctly.
Reporting in Academic and Professional Contexts
Follow APA 7th edition reporting format for academic publications: report the test statistic with its symbol (t, F, χ², z, or r), degrees of freedom in parentheses, exact p-value to two or three decimal places, and confidence intervals. Example: "A one-sample t-test indicated that study time significantly exceeded the 10-hour benchmark, t(23) = 2.84, p = .009, d = 0.58, 95% CI [10.5, 13.4]."