The normal distribution, also called the Gaussian distribution or bell curve, is arguably the most widely used probability distribution in statistics. It approximates many quantities in nature, science, and social data, so understanding it thoroughly is essential for anyone working with data.
What is the Normal Distribution?
The normal distribution is a continuous probability distribution that is perfectly symmetric around its mean. It is completely described by just two parameters: the mean (μ) and the standard deviation (σ).
Notation: X ~ N(μ, σ²) means "X follows a normal distribution with mean μ and variance σ²".
The standard normal distribution is N(0, 1) — mean = 0, SD = 1. Any normal distribution can be converted to it.
Key Properties of the Normal Distribution
- Symmetric: The left and right halves are mirror images. Mean = Median = Mode.
- Bell-shaped: The distribution peaks at the mean and decreases towards the tails.
- Asymptotic tails: The tails approach but never touch zero — extreme values are theoretically possible but increasingly unlikely.
- Defined by μ and σ: Different values of μ shift the curve left/right. Different values of σ make it wider or narrower.
- Total area = 1: The total area under the curve equals 1 (100% probability).
The Empirical Rule (68-95-99.7 Rule)
For any normal distribution:
| Range | Percentage of data | Real example (IQ: μ=100, σ=15) |
| --- | --- | --- |
| μ ± 1σ | 68.27% | IQ between 85 and 115 |
| μ ± 2σ | 95.45% | IQ between 70 and 130 |
| μ ± 3σ | 99.73% | IQ between 55 and 145 |
Only 0.27% of data falls more than 3 standard deviations from the mean. This is why |z| > 3 is used to identify extreme outliers.
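The percentages in the table can be reproduced numerically. The sketch below uses Python's standard-library `statistics.NormalDist` (available from Python 3.8); the helper name `within_k_sigma` is illustrative and not part of any library.

```python
from statistics import NormalDist

# Standard normal distribution N(0, 1)
Z = NormalDist(mu=0.0, sigma=1.0)

def within_k_sigma(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return Z.cdf(k) - Z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sigma: {within_k_sigma(k):.2%}")

# Share of values more than 3 standard deviations from the mean
beyond_3_sigma = 1 - within_k_sigma(3)
```

Because standardising removes μ and σ, the same three numbers apply to any normal distribution, which is why the rule can be stated once for all of them.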
The Standard Normal Distribution
The standard normal distribution N(0, 1) is the normal distribution with mean = 0 and SD = 1. Probabilities for any normal distribution can be computed by converting to the standard normal using the z-score formula:
z = (x − μ) / σ
Once converted to a z-score, you look up the probability in a standard normal table or use a calculator.
Example: Heights of adult men follow N(175, 7²), that is, mean 175 cm and standard deviation 7 cm. What proportion are taller than 185 cm?
z = (185 − 175) / 7 ≈ 1.43. P(Z > 1.43) = 1 − Φ(1.43) = 1 − 0.9236 = 0.0764, so about 7.6% of men are taller than 185 cm.
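The height example can be checked without a table. This is a minimal sketch using the standard-library `statistics.NormalDist`, reading the 7 in the example as the standard deviation in cm (the value the z-score calculation divides by).

```python
from statistics import NormalDist

# Heights of adult men: mean 175 cm, standard deviation 7 cm
heights = NormalDist(mu=175, sigma=7)

z = (185 - 175) / 7              # z-score of 185 cm
p_taller = 1 - heights.cdf(185)  # upper-tail probability P(X > 185)

print(f"z = {z:.2f}, P(X > 185) = {p_taller:.4f}")
```

The unrounded z-score gives a tail probability of roughly 0.0766, slightly different from the table lookup at z = 1.43; both round to about 7.6 to 7.7%.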
Why is the Normal Distribution So Common?
The Central Limit Theorem (CLT) explains why the normal distribution appears so often. The CLT states that the sum (or mean) of many independent random variables with finite variance tends towards a normal distribution, regardless of the original distribution, provided the sample size is large enough. A common rule of thumb is n ≥ 30, although strongly skewed distributions may need more.
Examples of variables that are approximately normally distributed:
- Human height and weight
- IQ scores (designed to follow N(100, 225))
- Measurement errors in instruments
- Blood pressure readings in a population
- Annual rainfall in a region
- Scores on standardised tests
When is Data NOT Normally Distributed?
Not all data is normal. These distributions are typically non-normal:
- Income and wealth: Right-skewed — a few very high values pull the mean up
- Response times: Right-skewed — can be very long but cannot be negative
- Number of events: Often Poisson distributed
- Proportions and probabilities: Bounded between 0 and 1 — use beta distribution
- Survival times: Often exponential or Weibull distributed
Before using statistical tests that assume normality (t-test, ANOVA, regression), check whether your data (or, for regression, the residuals) are approximately normal.
Testing for Normality
Several methods check whether your data follows a normal distribution:
- Visual: Histogram (should be bell-shaped), Q-Q plot (points should follow a straight line)
- Skewness and kurtosis: Normal data has skewness ≈ 0 and excess kurtosis ≈ 0
- Formal tests: Shapiro-Wilk (best for small samples), Jarque-Bera, Kolmogorov-Smirnov
Why the Normal Distribution is Everywhere
The normal distribution's ubiquity in nature is explained by the Central Limit Theorem (CLT), one of the most profound results in probability theory. The CLT states that the sum (or average) of a large number of independent random variables tends toward a normal distribution, regardless of the shape of the original distribution. This is why biological measurements, measurement errors, and many social phenomena approximate normality — they result from many small independent influences accumulating.
Height is a classic example. Human height is influenced by hundreds of genes, each with a small additive effect, plus environmental factors like nutrition. No single gene dominates. The CLT predicts — and data confirms — that adult heights follow a nearly perfect bell curve.
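The CLT can be illustrated with a small simulation. The sketch below is an illustration under assumed settings (exponential draws, sample size 50, a fixed seed), not a proof: it compares the skewness of raw exponential draws with the skewness of sample means.

```python
import random
import statistics

def skewness(xs):
    """Sample skewness: mean of cubed standardised values (population form)."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum(((x - m) / s) ** 3 for x in xs) / len(xs)

rng = random.Random(42)  # fixed seed so the run is repeatable

# Raw exponential draws are strongly right-skewed (theoretical skewness 2).
raw = [rng.expovariate(1.0) for _ in range(20_000)]

# Means of n = 50 draws: theoretical skewness is 2 / sqrt(50), about 0.28.
n = 50
means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(5_000)
]

print(f"skewness of raw draws:    {skewness(raw):.2f}")
print(f"skewness of sample means: {skewness(means):.2f}")
```

Increasing `n` should push the skewness of the means closer to zero, which is the "many small independent influences" argument in numerical form.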
The Mathematical Definition
The probability density function (PDF) of the normal distribution is:
f(x) = (1 / (σ√(2π))) × e^(−(x−μ)² / (2σ²))
This formula looks complex but contains elegant logic. The term (x−μ)²/(2σ²) is half the squared z-score, so it measures how far x lies from the mean in standard-deviation units. The negative exponent means the density decreases as you move away from the mean. The constant 1/(σ√(2π)) normalises the function so that the total probability equals 1.
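To make the formula concrete, here is a minimal sketch that codes the density term by term and cross-checks it against the standard library. The function name `normal_pdf` and the IQ-style parameters are chosen for illustration.

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density written term by term from the formula."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))   # normalising constant
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)   # minus half the squared z-score
    return coeff * math.exp(exponent)

mu, sigma = 100.0, 15.0

# Cross-check against the standard library at a few points.
for x in (70.0, 100.0, 115.0):
    assert math.isclose(normal_pdf(x, mu, sigma), NormalDist(mu, sigma).pdf(x))

# Crude Riemann sum over mu ± 8 sigma: the area should be very close to 1.
step = 0.01
area = sum(
    normal_pdf(mu - 8 * sigma + i * step, mu, sigma) * step
    for i in range(int(16 * sigma / step))
)
print(f"area under the curve: {area:.6f}")
```

The numerical area check is a quick way to see what the normalising constant is for: without it, the curve would keep its shape but would not integrate to 1.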
Properties of the Normal Distribution
The normal distribution has several key properties that make it mathematically tractable:
- Symmetry: The distribution is perfectly symmetric around its mean. Mean = Median = Mode.
- Defined by two parameters: Completely described by μ (location) and σ (spread)
- Asymptotic tails: Tails approach but never reach zero
- Bell-shaped: Single peak (unimodal) at the mean
- Stability: Linear combinations of independent (or jointly normal) variables are also normal
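The stability property can be sketched with `statistics.NormalDist`, which supports adding two instances (treated as independent) and scaling or shifting by constants. The distributions X and Y below are hypothetical.

```python
from statistics import NormalDist

X = NormalDist(mu=100, sigma=15)
Y = NormalDist(mu=50, sigma=8)

# Sum of two independent normals: means add, variances add.
total = X + Y

# Linear transform of one normal: mean becomes 2*100 + 10, SD becomes 2*15.
rescaled = 2 * X + 10

print(total.mean, total.stdev)
print(rescaled.mean, rescaled.stdev)
```

Note that it is the variances that add, not the standard deviations: here 15² + 8² = 289, so the sum has a standard deviation of 17 rather than 23.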
The Standard Normal Distribution and Z-Scores
The standard normal distribution has μ = 0 and σ = 1. Any normal distribution can be converted to standard normal by computing z-scores: z = (x − μ) / σ. This standardisation allows use of a single table (the z-table) for all normal distributions, regardless of their original mean and standard deviation.
Probability calculations then reduce to finding areas under the standard normal curve. For instance, P(X ≤ x) = P(Z ≤ z) = Φ(z), where Φ is the standard normal CDF. These values are tabulated and built into every statistical software package.
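Software typically evaluates Φ through the error function, using the identity Φ(z) = (1 + erf(z/√2)) / 2. A minimal sketch with Python's `math.erf` follows; the helper names `phi` and `normal_cdf` are illustrative.

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) for X ~ N(mu, sigma^2): standardise, then apply phi."""
    z = (x - mu) / sigma
    return phi(z)

print(f"phi(0)    = {phi(0.0):.4f}")
print(f"phi(1.96) = {phi(1.96):.4f}")
print(f"P(IQ <= 130) = {normal_cdf(130, 100, 15):.4f}")
```

This mirrors the two-step procedure in the text: standardise first, then read off one universal function.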
Testing for Normality
Before applying tests that assume normality (t-tests, ANOVA, linear regression), analysts typically check whether data is approximately normal. Common approaches include:
- Histogram inspection: Does the histogram look roughly bell-shaped?
- Q-Q plot: Points should fall approximately on a straight diagonal line
- Shapiro-Wilk test: Formal test; p > 0.05 means normality is not rejected, which is not proof that the data are normal
- Jarque-Bera test: Tests whether skewness and kurtosis match normal distribution
- Kolmogorov-Smirnov test: Compares empirical and theoretical CDFs
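As one concrete instance of these checks, the Jarque-Bera statistic can be computed by hand from skewness and excess kurtosis. The sketch below uses only the standard library and simulated data with a fixed seed; the function name `jarque_bera` is our own, and the p-value relies on the large-sample chi-square approximation with 2 degrees of freedom, so it should be treated as rough for small samples. Libraries such as SciPy provide ready-made versions of these tests.

```python
import math
import random
import statistics

def jarque_bera(xs):
    """Return (skewness, excess kurtosis, JB statistic, approximate p-value)."""
    n = len(xs)
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)              # population SD, as in the JB formula
    z = [(x - m) / s for x in xs]
    skew = sum(v ** 3 for v in z) / n
    ex_kurt = sum(v ** 4 for v in z) / n - 3.0
    jb = n / 6.0 * (skew ** 2 + ex_kurt ** 2 / 4.0)
    # For a chi-square variable with 2 df, the upper-tail probability is exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return skew, ex_kurt, jb, p_value

rng = random.Random(0)
normal_sample = [rng.gauss(0, 1) for _ in range(2000)]
skewed_sample = [rng.expovariate(1.0) for _ in range(2000)]

for name, sample in (("normal", normal_sample), ("skewed", skewed_sample)):
    skew, ex_kurt, jb, p = jarque_bera(sample)
    print(f"{name}: skew={skew:.2f}, excess kurtosis={ex_kurt:.2f}, JB={jb:.1f}, p={p:.3g}")
```

For the simulated normal sample, skewness and excess kurtosis should both sit near zero; for the exponential sample the statistic is very large and normality is clearly rejected.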
When the Normal Approximation Fails
Not all data is normal, and applying normal-based methods to non-normal data can produce misleading results. Common departures from normality include:
- Right skew: Income, house prices, survival times — use log transformation or non-parametric methods
- Left skew: Age at retirement, exam scores in easy tests
- Heavy tails (fat tails): Financial returns, extreme weather events — use t-distribution or extreme value distributions
- Bimodal: Data from two distinct subpopulations mixed together
Real-World Applications
The normal distribution underpins much of modern statistical practice. In quality control, control charts assume normally distributed measurements to set 3-sigma limits. In psychometrics, IQ tests are scaled so scores follow N(100, 15²). In finance, the Black-Scholes option pricing model assumes log-normally distributed stock prices. In epidemiology, many biomarkers and drug responses are modelled as normally distributed within populations.
Worked Probability Calculations: Step by Step
IQ scores follow N(100, 15²). Answer these questions:
Q1: What proportion of people have IQ above 130? z = (130−100)/15 = 2.0. P(Z > 2.0) = 1 − Φ(2.0) = 1 − 0.9772 = 0.0228. About 2.28% of people have IQ above 130 — roughly 1 in 44.
Q2: What IQ is at the 90th percentile? Find z where Φ(z) = 0.90. z = 1.282. IQ = 100 + 1.282×15 = 100 + 19.23 = 119.23. The 90th percentile IQ is approximately 119.
Q3: What proportion falls between 85 and 115? z₁ = (85−100)/15 = −1.0, z₂ = (115−100)/15 = 1.0. P(−1 < Z < 1) = Φ(1) − Φ(−1) = 0.8413 − 0.1587 = 0.6826. About 68.26% (68.27% without rounding the table values), confirming the empirical rule.
Q4: What is the probability that two randomly and independently selected people both have IQ above 130? P(both) = 0.0228² ≈ 0.00052, or roughly 1 in 1,900 pairs. Very rare indeed.
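The four answers can be reproduced in a few lines. This is a minimal sketch using the standard-library `statistics.NormalDist`; Q4 assumes the two people are selected independently.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

p_above_130 = 1 - iq.cdf(130)            # Q1: upper tail beyond 130
iq_90th = iq.inv_cdf(0.90)               # Q2: 90th percentile
p_85_to_115 = iq.cdf(115) - iq.cdf(85)   # Q3: area between 85 and 115
p_both_above_130 = p_above_130 ** 2      # Q4: both above 130 (independence assumed)

print(f"Q1: {p_above_130:.4f}")
print(f"Q2: {iq_90th:.2f}")
print(f"Q3: {p_85_to_115:.4f}")
print(f"Q4: {p_both_above_130:.5f}")
```

Small differences from the hand calculations come from rounding in the z-table; the software values use unrounded z-scores.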
The Log-Normal Connection
If X follows a log-normal distribution, then ln(X) is normally distributed. This is important because many real-world quantities are approximately log-normally distributed: income, city sizes, stock prices, biological measurements like cell sizes, and durations like hospital stays. Log transformation often converts right-skewed data into approximately normal data, enabling normal-based statistical methods. The geometric mean of log-normal data equals e^μ (where μ is the mean of the log-transformed values), which is also the median. It is always less than the arithmetic mean, e^(μ + σ²/2), whenever σ > 0, which is an important consideration when averaging naturally skewed quantities.
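The gap between the geometric and arithmetic mean can be seen in a short simulation. The sketch below draws from a log-normal distribution with hypothetical parameters (μ = 3.0 and σ = 0.8 on the log scale) and a fixed seed; for comparison, the theoretical arithmetic mean of a log-normal variable is e^(μ + σ²/2).

```python
import math
import random
import statistics

rng = random.Random(1)
mu, sigma = 3.0, 0.8   # hypothetical mean and SD of ln(X)

xs = [rng.lognormvariate(mu, sigma) for _ in range(50_000)]

# Geometric mean: exponentiate the mean of the logs.
logs = [math.log(x) for x in xs]
geo_mean = math.exp(statistics.fmean(logs))

# Arithmetic mean of the raw, right-skewed values.
arith_mean = statistics.fmean(xs)

print(f"geometric mean:  {geo_mean:.2f} (theory: {math.exp(mu):.2f})")
print(f"arithmetic mean: {arith_mean:.2f} (theory: {math.exp(mu + sigma ** 2 / 2):.2f})")
```

The geometric mean tracks the centre of the bulk of the data, while the arithmetic mean is pulled upward by the long right tail.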
Deep Dive: Normal Distribution Explained — Theory, Assumptions, and Best Practices
This section takes a deeper look at statistical procedures built on the normal distribution, covering their mathematical foundation, assumption checking, effect size reporting, common mistakes, and reporting conventions that go beyond introductory coverage.
Mathematical Foundation
Every statistical procedure rests on a mathematical model of how data is generated. Procedures based on the normal distribution assume specific data-generating conditions that, when satisfied, guarantee the stated Type I error rate and power. Understanding these foundations helps you know when results are trustworthy and when to seek alternatives.
Assumptions and Diagnostics
Before interpreting any result, verify all assumptions are satisfied. Common assumption violations and their remedies:
- Non-normality: For small samples, use non-parametric alternatives or bootstrap methods. For large samples, the Central Limit Theorem typically provides robustness.
- Outliers: Identify using IQR fence or modified z-scores. Investigate each outlier — correct data errors, but do not delete genuine extreme observations without disclosure.
- Independence violations: Clustered or longitudinal data requires mixed models or GEE rather than standard methods assuming independence.
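The two outlier rules mentioned in the list can be sketched as follows. The measurements are made up for illustration, the fence multiplier 1.5 and the modified z-score cutoff 3.5 are conventional choices rather than fixed rules, and the helper names are our own.

```python
import statistics

def iqr_fences(xs, k=1.5):
    """Lower and upper fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(xs, n=4)   # quartiles, default "exclusive" method
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def modified_z_scores(xs):
    """Modified z-scores based on the median and the median absolute deviation (MAD)."""
    med = statistics.median(xs)
    mad = statistics.median([abs(x - med) for x in xs])
    if mad == 0:
        raise ValueError("MAD is zero; modified z-scores are undefined")
    return [0.6745 * (x - med) / mad for x in xs]

# Hypothetical measurements with one suspicious value
data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 15.0]

low, high = iqr_fences(data)
iqr_outliers = [x for x in data if x < low or x > high]
mz_outliers = [x for x, z in zip(data, modified_z_scores(data)) if abs(z) > 3.5]

print(iqr_outliers, mz_outliers)
```

Both rules use the median and quartiles rather than the mean and standard deviation, so a single extreme value cannot mask itself by inflating the spread estimate.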
Interpreting Your Results Completely
A complete interpretation always includes: (1) the test statistic value, (2) degrees of freedom, (3) exact p-value, (4) confidence interval for the parameter of interest, (5) effect size with interpretation, and (6) a plain-language conclusion. Never report just a p-value — it communicates only one dimension of a multi-dimensional result.
Effect Size and Practical Significance
Statistical significance tells you that an effect is detectable; effect size tells you whether it matters. For every test, compute and report the appropriate effect size measure alongside the p-value. Use field-specific benchmarks (not just Cohen's generic small/medium/large) to evaluate practical significance.
Common Errors and How to Avoid Them
- Multiple testing without correction: Apply Bonferroni, Holm, or FDR corrections whenever running more than one test on the same dataset.
- Confusing statistical and practical significance: Always ask "is this large enough to matter?" not just "is this detectable?"
- p-hacking: Pre-register hypotheses, analysis plans, and significance thresholds before seeing data.
- Overlooking assumptions: Verify independence, normality (or large n), and homogeneity of variance before applying parametric tests.
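For the multiple-testing point, here is a minimal sketch of the Bonferroni and Holm procedures. The p-values are hypothetical and the function names are our own.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def holm(p_values, alpha=0.05):
    """Holm step-down: compare the sorted p-values to alpha/m, alpha/(m-1), ..."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first failure; all larger p-values are retained
    return reject

# Hypothetical p-values from four tests on the same dataset
p_values = [0.010, 0.040, 0.020, 0.005]

print("Bonferroni:", bonferroni(p_values))
print("Holm:      ", holm(p_values))
```

With these inputs Bonferroni rejects two hypotheses and Holm rejects all four. Holm controls the same family-wise error rate while never rejecting fewer hypotheses than Bonferroni.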
When Normal-Based Tests Are Not Appropriate
Every test has boundaries of appropriate application. Understand when to use non-parametric alternatives, when to switch to more complex models, and when the research question requires a different analytic framework entirely. Using the wrong test produces incorrect Type I error rates and power — even if the computation is done correctly.
Reporting in Academic and Professional Contexts
Follow APA 7th edition reporting format for academic publications: report the test statistic with its symbol (t, F, χ², z), degrees of freedom in parentheses, exact p-value to two or three decimal places, and confidence intervals. Example: "A one-sample t-test indicated that study time significantly exceeded the 10-hour benchmark, t(23) = 2.84, p = .009, d = 0.58, 95% CI [10.7, 13.2]."