Correlation vs Causation — Key Differences & Real Examples

"Correlation does not imply causation" is one of the most important principles in statistics. Yet it is violated constantly in news articles, social media, and even published research. Understanding this distinction can save you from costly mistakes in business decisions, policy-making, and scientific conclusions.

What is Correlation?

Two variables are correlated if they tend to change together: when one goes up, the other tends to go up (positive correlation) or down (negative correlation). The strength of a linear relationship is measured by the Pearson correlation coefficient r ∈ [−1, +1].

What is Causation?

A causes B if changing A directly produces a change in B — not just that they happen to vary together. The direction matters (A → B), the timing matters (A must precede B), and alternative explanations must be ruled out.

Famous Spurious Correlations

These real correlations illustrate why correlation alone proves nothing:

Ice cream sales and drowning rates (r ≈ 0.90) — both increase in summer. Hot weather causes both. Ice cream does not cause drowning.
Nicolas Cage movies and pool drownings — both rose and fell together in the 1990s–2000s. Pure coincidence.
Shoe size and reading ability in children — both increase with age. Age is the confound. Bigger feet do not cause better reading.
Per capita cheese consumption and deaths from bedsheet tangling — statistically correlated over 10 years. No causal mechanism exists.

Why Variables Can Be Correlated Without Causation

1. Confounding Variable (Common Cause)

A third variable C causes both A and B. Example: Physical fitness level (C) causes both lower resting heart rate (A) and longer lifespan (B). Heart rate and lifespan are correlated, but one does not cause the other.

2. Reverse Causation

You think A causes B, but actually B causes A. Example: Depression and social isolation are correlated. Does isolation cause depression, or does depression cause isolation? Often both.

3. Coincidental Correlation

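Pure chance, especially in small datasets or when you search through many pairs. If you run 100 independent tests on unrelated variables, you expect about 5 spuriously significant correlations at α = 0.05 by chance alone. With 100 variables there are 4,950 possible pairs, so the expected number of false positives is closer to 250.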
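
The chance mechanism can be checked with a small simulation. The sketch below is illustrative only: the function name count_chance_hits, the sample size of 50 observations and the Fisher z approximation used for the significance cut-off are assumptions, not part of the article. It generates 100 columns of pure noise and counts how many of the 4,950 distinct pairs pass a two-sided test at α = 0.05.

```python
import numpy as np

def count_chance_hits(n_vars=100, n_obs=50, seed=0):
    """Count pairs of unrelated variables that look 'significant' at alpha = 0.05."""
    rng = np.random.default_rng(seed)
    data = rng.standard_normal((n_obs, n_vars))   # independent noise columns
    r = np.corrcoef(data, rowvar=False)           # n_vars x n_vars correlation matrix
    upper = np.triu_indices(n_vars, k=1)          # take each pair exactly once
    # Fisher z approximation: atanh(r) * sqrt(n - 3) is roughly standard normal
    z = np.arctanh(r[upper]) * np.sqrt(n_obs - 3)
    hits = int(np.sum(np.abs(z) > 1.96))
    return hits, len(z)

hits, n_pairs = count_chance_hits()
print(f"{hits} of {n_pairs} pairs look significant by chance")
```

Roughly 5% of the pairs should come out "significant" even though no variable is related to any other, which is why a correlation found by scanning many pairs needs replication before it is trusted.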

How to Establish Causation

The gold standard is a randomised controlled experiment (RCT):

Randomly assign participants to treatment (A) or control (no A)
Randomisation balances all confounders, known and unknown, on average
Any systematic difference in outcomes beyond chance variation can then be attributed to the treatment
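
A simulation can make the value of random assignment concrete. This is a minimal sketch on synthetic data: the hidden confounder health, the effect size of 2.0 and the logistic self-selection rule are all assumptions chosen for illustration. It compares a simple difference in means when people choose their own treatment with the same comparison under coin-flip assignment.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000
true_effect = 2.0                          # assumed causal effect of the treatment

health = rng.standard_normal(n)            # hidden confounder

# Observational world: healthier people are more likely to take the treatment.
chose = rng.random(n) < 1 / (1 + np.exp(-2 * health))
# Experimental world: a coin flip decides, independently of health.
assigned = rng.random(n) < 0.5

def outcome(treated):
    """Outcome driven by health, the treatment effect, and noise."""
    return 3.0 * health + true_effect * treated + rng.standard_normal(n)

def diff_in_means(y, treated):
    return y[treated].mean() - y[~treated].mean()

naive = diff_in_means(outcome(chose), chose)
rct = diff_in_means(outcome(assigned), assigned)
print(f"observational estimate: {naive:.2f}")
print(f"randomised estimate:    {rct:.2f}")
```

Under self-selection the estimate absorbs the effect of health and overshoots the assumed effect, while under random assignment it lands close to 2.0, up to sampling noise.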

When experiments are impossible (ethics, cost, scale), researchers use:

Natural experiments: Policy changes, natural disasters that create quasi-random assignment
Instrumental variables: A variable that affects A but not B except through A
Difference-in-differences: Compare before/after changes between treated and untreated groups
Bradford Hill criteria: Strength, consistency, temporality and biological plausibility, among others
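
The difference-in-differences idea from the list above reduces to simple arithmetic on four group means. The numbers below are hypothetical and the function name diff_in_diff is an assumption. The sketch subtracts the control group's change from the treated group's change, which removes any trend shared by both groups.

```python
# Hypothetical mean outcomes before and after a policy change.
means = {
    ("treated", "before"): 10.0,
    ("treated", "after"): 16.0,
    ("control", "before"): 8.0,
    ("control", "after"): 11.0,
}

def diff_in_diff(m):
    """Change in the treated group minus change in the control group."""
    change_treated = m[("treated", "after")] - m[("treated", "before")]
    change_control = m[("control", "after")] - m[("control", "before")]
    return change_treated - change_control

did = diff_in_diff(means)
print(did)  # 3.0
```

The estimate is only causal if the two groups would have followed parallel trends without the policy, an assumption the data cannot fully confirm.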

Practical Implications

Before acting on a correlation:

Ask: Is there a plausible mechanism? (Why would A cause B?)
Ask: Could a third variable cause both?
Ask: Does A precede B in time?
Ask: Has this been replicated?
Ask: Is the effect consistent across subgroups?

Use our Pearson Correlation Calculator to measure correlation strength, and always pair it with critical thinking about causation.

Defining Correlation

Correlation measures the strength and direction of a linear relationship between two variables. The Pearson correlation coefficient r ranges from −1 to +1. A value of +1 means a perfect positive linear relationship, −1 means a perfect negative linear relationship, and 0 means no linear relationship. Correlation says nothing about why two variables move together, only that they do.

Correlation is symmetric: the correlation between X and Y equals the correlation between Y and X. It is also unitless, allowing comparison across different scales and measurement units.
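
Both properties can be verified numerically. The following is a minimal sketch with hypothetical height and weight data; the helper pearson_r is written out from the textbook definition (covariance divided by the product of the standard deviations) rather than taken from any particular library.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson r: covariance scaled by both standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))

# Hypothetical data: heights in cm and weights in kg.
height_cm = [150, 160, 165, 172, 180, 191]
weight_kg = [52, 58, 63, 70, 77, 90]

r_xy = pearson_r(height_cm, weight_kg)
r_yx = pearson_r(weight_kg, height_cm)                     # swap the arguments
r_in = pearson_r(np.array(height_cm) / 2.54, weight_kg)    # convert cm to inches
print(round(r_xy, 3), round(r_yx, 3), round(r_in, 3))
```

All three values agree: swapping the variables or changing the unit of measurement leaves r unchanged.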

Defining Causation

Causation means that changing one variable directly produces a change in another. Establishing causation requires more than observing that two variables tend to move together. The gold standard for establishing causation in science is the randomised controlled trial (RCT), where participants are randomly assigned to treatment and control groups so that confounding factors, measured or not, are balanced across the groups on average.

Classic Examples of Spurious Correlations

The internet has made spurious correlations famous. Tyler Vigen's database contains hundreds of absurd examples: US per capita cheese consumption correlates almost perfectly (r = 0.947) with the number of people who died tangled in their bedsheets. Nicolas Cage movie releases correlate with swimming pool drownings. Ice cream sales correlate with violent crime rates.

These are not causal — they share a common cause (summer/season), are coincidental time series, or result from data mining thousands of variable pairs until some correlate by chance.

The Four Explanations for Correlation

When you observe a correlation between A and B, there are four main explanations, and more than one can apply at once:

A causes B: Smoking causes lung cancer
B causes A (reverse causation): Social isolation may look like a cause of depression when depression is in fact causing the isolation
A third variable C causes both A and B (confounding): Poverty causes both poor diet and poor health outcomes
Chance: With enough variables tested, some will correlate by coincidence

Establishing Causation: Bradford Hill Criteria

In epidemiology, the Bradford Hill criteria provide a framework for judging whether a correlation is likely causal: strength of association, consistency across studies, specificity (the cause leads specifically to the effect), temporality (cause precedes effect), biological gradient (dose-response relationship), plausibility (biological mechanism exists), coherence with known facts, experimental evidence, and analogy with similar known causal relationships.

No single criterion is decisive, but satisfying more criteria strengthens the causal case. Smoking and lung cancer satisfied nearly all criteria, overcoming initial resistance to the causal claim.

Confounding Variables in Practice

Confounding is the most common reason correlations are misinterpreted as causal. A confounder is a variable that is associated with both the exposure and the outcome, distorting the apparent relationship between them.

Classic example: countries with more televisions per household have lower infant mortality rates. Does television save babies? No — both are driven by wealth. Wealthier countries have both better healthcare (reducing infant mortality) and more televisions. Television is correlated with mortality but does not cause the reduction.

Statistical Methods for Causal Inference

When randomised experiments are impossible (due to cost, ethics, or logistics), statistical methods attempt to estimate causal effects from observational data. These include regression adjustment (controlling for measured confounders), propensity score matching (creating balanced comparison groups), instrumental variables (using a variable that affects exposure but not outcome directly), difference-in-differences (comparing changes over time between groups), and regression discontinuity design (exploiting arbitrary thresholds).

These methods rest on assumptions that cannot be fully verified from data alone, which is why causal claims from observational studies should always be interpreted cautiously.

Worked Example: Dissecting a Spurious Correlation

A data analyst at a city council notices a strong positive correlation (r = 0.78) between the number of ice cream shops in a neighbourhood and the crime rate. Should the council ban ice cream shops to reduce crime?

Clearly not. Both variables share common causes: population density and temperature. Denser, warmer neighbourhoods have more ice cream shops and more opportunities for crime. This is a classic confounding example. To test this, control for population density and average summer temperature in a multiple regression. If the ice cream coefficient drops to near zero, the confounders explain the correlation. If the coefficient remains strong even after controlling for plausible confounders, the relationship deserves more investigation, but it still does not establish causation.

The formal test: regress crime on ice cream shops, population density, and temperature. The ice cream coefficient becomes −0.02 (p = 0.78, not significant), so the correlation was entirely explained by the measured confounders. This statistical adjustment is the observational study's main tool for separating correlation from causation, though unmeasured confounders always remain a concern.
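
The adjustment step can be reproduced on synthetic data. This is an illustrative sketch, not the council's analysis: the data-generating coefficients, the sample of 2,000 hypothetical neighbourhoods and the helper ols are assumptions, and the simulated numbers will not match the figures quoted above. Shops are given no direct effect on crime, so any raw association comes from the two confounders.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2_000                                   # hypothetical neighbourhoods

density = rng.standard_normal(n)            # standardised population density
temperature = rng.standard_normal(n)        # standardised summer temperature

# Both variables depend on the confounders; shops have NO direct effect on crime.
shops = 1.0 * density + 0.5 * temperature + rng.standard_normal(n)
crime = 1.5 * density + 0.8 * temperature + rng.standard_normal(n)

def ols(y, *predictors):
    """Least-squares coefficients; the first entry is the intercept."""
    X = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

raw_r = np.corrcoef(shops, crime)[0, 1]
unadjusted = ols(crime, shops)[1]                        # crime ~ shops
adjusted = ols(crime, shops, density, temperature)[1]    # crime ~ shops + confounders
print(f"r = {raw_r:.2f}, unadjusted slope = {unadjusted:.2f}, adjusted slope = {adjusted:.2f}")
```

The raw correlation and the unadjusted slope are clearly positive, while the slope on shops shrinks towards zero once density and temperature enter the model. The adjustment only works here because both confounders are measured.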

Natural Experiments: Finding Causation in Observational Data

Sometimes nature provides quasi-random assignment that mimics a randomised experiment. These "natural experiments" allow causal inference from observational data.

Two classic examples: studies of the effect of education on earnings have exploited compulsory schooling laws, because school entry cutoffs mean that children born only weeks apart can end up with different amounts of schooling for reasons unrelated to ability (as-if random assignment). Studies of the effect of military service on lifetime earnings have used draft lottery numbers, which were randomly assigned.

These instrumental variable approaches extract causal information from observational variation that approximates randomness, providing some of the strongest evidence for causation available outside formal RCTs.
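
The logic of an instrument such as a lottery can be sketched with simulated data. Everything here is an assumption made for illustration: the variable names, the unobserved confounder ability, the true effect of 1.0 and the use of the simple Wald ratio as the instrumental variable estimator. The instrument shifts the exposure but has no direct path to the outcome.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 50_000
true_effect = 1.0                           # assumed effect of one unit of exposure

ability = rng.standard_normal(n)            # unobserved confounder
lottery = rng.random(n) < 0.5               # as-if random instrument

# Exposure depends on the instrument and on the confounder.
exposure = 1.0 * lottery + 0.8 * ability + rng.standard_normal(n)
# Outcome depends on exposure and the confounder, never on the lottery directly.
outcome = true_effect * exposure + 2.0 * ability + rng.standard_normal(n)

# Naive regression slope of outcome on exposure (biased by ability).
naive = np.cov(exposure, outcome)[0, 1] / np.var(exposure, ddof=1)

# Wald estimate: the lottery's effect on the outcome
# divided by the lottery's effect on the exposure.
iv = (outcome[lottery].mean() - outcome[~lottery].mean()) / (
    exposure[lottery].mean() - exposure[~lottery].mean()
)
print(f"naive = {naive:.2f}, IV = {iv:.2f}")
```

The naive slope overstates the effect because ability raises both exposure and outcome, while the ratio based on the lottery stays close to the assumed value of 1.0. The estimate is only as good as the claim that the instrument affects the outcome through the exposure alone.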
