A probability distribution describes how the values of a random variable are distributed. Choosing the right distribution for your data is crucial — using the wrong one leads to incorrect probabilities and invalid statistical tests.
Discrete vs Continuous Distributions
Discrete distributions: The variable takes countable values (0, 1, 2, ...). Example: number of heads in 10 flips.
Continuous distributions: The variable can take any value in an interval. Example: height, weight, temperature.
Key Discrete Distributions
Binomial Distribution B(n, p)
Counts the number of successes in n independent trials, each with probability p. P(X=k) = C(n,k) × pᵏ × (1−p)ⁿ⁻ᵏ. Mean = np, Variance = np(1−p).
Use when: Fixed n trials, binary outcome (success/failure), constant p, independent trials.
Examples: Number of heads in 20 coin flips, number of defective items in a batch of 50, number of patients who respond to treatment in a clinical trial of 100.
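Here is a minimal Python sketch of the binomial PMF above, using only the standard library (`math.comb`). For the 20-flip example, it checks that the probabilities sum to 1 and that the mean and variance match np and np(1−p). The function name `binomial_pmf` is our own and is not a library API.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p), using C(n,k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 20, 0.5  # heads in 20 fair coin flips
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# The probabilities over all outcomes 0..n should sum to 1.
total = sum(pmf)
# Mean and variance computed from the PMF should match np and np(1-p).
mean = sum(k * q for k, q in enumerate(pmf))
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
print(f"P(X=10) = {pmf[10]:.4f}, mean = {mean:.2f}, variance = {var:.2f}")
```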
Poisson Distribution Poisson(λ)
Counts events occurring in a fixed interval when events happen at a constant average rate λ. P(X=k) = λᵏ × e^(−λ) / k!. Mean = Variance = λ.
Use when: Counting rare events in time/space, events are independent, constant average rate.
Examples: Number of calls to a call centre per hour, number of accidents per month at a junction, number of typos per page.
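This sketch applies the same standard-library approach to the Poisson PMF. The rate λ = 3 typos per page is an assumed value for illustration. The PMF is cut off at k = 59 because the remaining tail is negligible, so the computed mean and variance should both come out at λ.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam): lam^k * e^(-lam) / k!."""
    return lam**k * exp(-lam) / factorial(k)

lam = 3.0  # assumed average of 3 typos per page
pmf = [poisson_pmf(k, lam) for k in range(60)]  # tail beyond k = 59 is negligible for lam = 3
mean = sum(k * q for k, q in enumerate(pmf))
var = sum((k - mean) ** 2 * q for k, q in enumerate(pmf))
print(f"P(X=0) = {pmf[0]:.4f}, mean = {mean:.4f}, variance = {var:.4f}")
```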
Hypergeometric Distribution
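Like the binomial, but for sampling WITHOUT replacement from a finite population, so the success probability changes from draw to draw. A common rule of thumb is to use it instead of the binomial when the sample is more than about 5% of the population.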
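This sketch shows why the sampling fraction matters. It compares the hypergeometric PMF, C(K, k) × C(N−K, n−k) / C(N, n), with a binomial that uses p = K/N. The batch of 50 items containing 5 defectives is a hypothetical example. The gap between the two models should be small when the sample is 4% of the batch and noticeably larger when it is 20%.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): k successes in n draws without replacement from N items, K of them successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

N, K = 50, 5  # hypothetical batch: 50 items, 5 defective
gaps = {}
for n in (2, 10):  # sample is 4% vs 20% of the batch
    h = hypergeom_pmf(0, N, K, n)
    b = binomial_pmf(0, n, K / N)
    gaps[n] = abs(h - b)
    print(f"n={n}: P(no defectives) hypergeometric={h:.4f}, binomial={b:.4f}")
```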
Key Continuous Distributions
Normal Distribution N(μ, σ²)
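The most important distribution in statistics. Bell-shaped, symmetric, and defined by its mean and variance. It is the limiting distribution in the Central Limit Theorem (sums and means of many independent observations are approximately normal) and underpins most parametric tests.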
Examples: Height, IQ scores, measurement errors, exam scores.
T-Distribution t(df)
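Like the normal, but with heavier tails. Used for inference about a mean when σ is unknown and must be estimated from the sample. The difference from the normal matters most when the sample is small. As df → ∞, t → normal.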
Use for: T-tests, confidence intervals when σ is unknown.
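You can check the heavier tails directly from the t density. This sketch evaluates the standard Student's t PDF at x = 3 for several df values and compares each with the standard normal. It uses `math.lgamma` rather than `math.gamma` because the gamma function overflows a float for large df.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t with df degrees of freedom (log-gamma avoids overflow for large df)."""
    log_c = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)
    return exp(log_c - (df + 1) / 2 * log(1 + x * x / df))

def normal_pdf(x: float) -> float:
    return exp(-x * x / 2) / sqrt(2 * pi)

# Density at x = 3: the t puts more weight in the tails and converges to the normal as df grows.
for df in (2, 5, 30, 1000):
    print(f"df={df:>4}: t pdf(3) = {t_pdf(3, df):.5f}   normal pdf(3) = {normal_pdf(3):.5f}")
```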
Chi-Square Distribution χ²(df)
The sum of df independent squared standard normal variables. Always non-negative and right-skewed. Used in goodness-of-fit tests, tests of independence, and tests on a single variance.
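A simulation can illustrate the "sum of squared standard normals" definition. This sketch assumes df = 4 and a fixed seed. The sample mean and variance of the draws should land close to the theoretical values df and 2·df, and no draw can be negative.

```python
import random
import statistics

def chi_square_draw(df: int, rng: random.Random) -> float:
    """One draw from chi-square(df), built as a sum of df squared independent N(0, 1) values."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))

rng = random.Random(42)  # fixed seed so the run is reproducible
df = 4
draws = [chi_square_draw(df, rng) for _ in range(20_000)]

# Theory: mean = df, variance = 2 * df, and every draw is non-negative.
sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)
print(f"mean = {sample_mean:.3f} (theory {df})")
print(f"variance = {sample_var:.3f} (theory {2 * df})")
print(f"min = {min(draws):.4f}")
```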
F-Distribution F(df₁, df₂)
The ratio of two independent chi-square variables, each divided by its degrees of freedom: (χ²₁/df₁) / (χ²₂/df₂). Used in ANOVA, regression F-tests, and comparing two variances.
Exponential Distribution Exp(λ)
Time between events in a Poisson process. Memoryless: P(T > t+s | T > s) = P(T > t). Mean = 1/λ.
Examples: Time between customer arrivals, time until machine failure, radioactive decay times.
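The memoryless property follows from the survival function P(T > t) = e^(−λt). This sketch assumes a rate of λ = 0.5. It computes the conditional probability as a ratio of survival probabilities and compares it with the unconditional one. Both should equal e^(−1).

```python
from math import exp

def exp_survival(t: float, lam: float) -> float:
    """P(T > t) for T ~ Exp(lam)."""
    return exp(-lam * t)

lam = 0.5  # assumed rate: one event every 2 time units on average
s, t = 3.0, 2.0

# Conditional probability P(T > t+s | T > s) = P(T > t+s) / P(T > s)
conditional = exp_survival(t + s, lam) / exp_survival(s, lam)
unconditional = exp_survival(t, lam)
print(f"P(T > {t + s} | T > {s}) = {conditional:.4f}")
print(f"P(T > {t}) = {unconditional:.4f}")
```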
Uniform Distribution U(a, b)
All values in [a, b] equally likely. Mean = (a+b)/2. Used in simulation and as a reference distribution.
Choosing the Right Distribution
| Data Type | Question | Distribution |
| --- | --- | --- |
| Count of successes in n trials | How many heads in 20 flips? | Binomial |
| Count of rare events | Calls per hour? | Poisson |
| Continuous, symmetric | Heights, test scores? | Normal |
| Time until event | When does machine fail? | Exponential |
| Testing means (σ unknown) | T-test p-value? | T-distribution |
| Testing variances/ANOVA | F-test p-value? | F-distribution |
| Testing categorical data | Chi-square p-value? | Chi-square |
What is a Probability Distribution?
A probability distribution specifies the probability of each possible outcome for a random variable. For discrete variables, the probability mass function (PMF) gives P(X = x) for each specific value. For continuous variables, the probability density function (PDF) describes relative likelihood, with probabilities computed as areas under the curve. The cumulative distribution function (CDF) gives P(X ≤ x) for any x.
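Here is a small sketch of the PMF/CDF relationship for a fair six-sided die, an illustrative example that does not appear in the text. For a discrete variable the CDF is simply the running sum of the PMF. `fractions.Fraction` keeps the probabilities exact.

```python
from fractions import Fraction
from itertools import accumulate

# Fair six-sided die: the PMF gives P(X = x) for each face.
faces = range(1, 7)
pmf = {x: Fraction(1, 6) for x in faces}

# CDF: P(X <= x) is the running sum of the PMF.
cdf = dict(zip(faces, accumulate(pmf[x] for x in faces)))

print(pmf[3])  # P(X = 3)
print(cdf[3])  # P(X <= 3)
print(cdf[6])  # total probability
```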
Discrete vs Continuous Distributions
Discrete distributions model countable outcomes: number of defects per item, number of customers per hour, number of heads in coin flips. Continuous distributions model measurements that can take any value in a range: height, weight, temperature, time. The key difference is that discrete distributions assign positive probability to specific values, while continuous distributions assign zero probability to any single point (only intervals have positive probability).
The Binomial Distribution
The binomial distribution models the number of successes in n independent Bernoulli trials (binary outcomes) with constant success probability p. Mean = np, Variance = np(1−p). Applications include: quality control (number of defective items), medical trials (number of patients who recover), election polling (number supporting a candidate). As n increases with moderate p, the binomial approaches the normal distribution.
The Poisson Distribution
The Poisson distribution models the number of rare events occurring in a fixed interval of time or space, given a constant average rate λ. Mean = Variance = λ. It arises as the limit of the binomial when n is large and p is small (np = λ). Applications: call centre arrivals per hour, insurance claims per day, mutations per DNA strand, web server requests per second.
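You can check the binomial-to-Poisson limit numerically. This sketch holds np = λ = 4 fixed (an arbitrary choice) and shows the gap between the two PMFs at k = 2 shrinking as n grows.

```python
from math import comb, exp, factorial

def binomial_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k: int, lam: float) -> float:
    return lam**k * exp(-lam) / factorial(k)

lam, k = 4.0, 2
# Hold n*p = lam fixed while n grows: the binomial PMF approaches the Poisson PMF.
gaps = {}
for n in (10, 100, 1000, 10000):
    gaps[n] = abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam))
    print(f"n={n:>5}: |binomial - Poisson| at k={k} = {gaps[n]:.6f}")
```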
The Normal Distribution
The normal (Gaussian) distribution is the most important continuous distribution, in large part because of the Central Limit Theorem. It is symmetric, bell-shaped, and completely characterised by its mean μ and standard deviation σ. Approximately 68%, 95%, and 99.7% of values from a normal distribution lie within 1, 2, and 3 standard deviations of the mean. It is used as an approximation for many real-world phenomena and underlies most classical statistical inference.
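The 68/95/99.7 figures come from the normal CDF. For any normal variable, P(|X − μ| < kσ) = erf(k/√2), so a few lines of standard-library Python reproduce them (printed to four decimals):

```python
from math import erf, sqrt

def prob_within(k: float) -> float:
    """P(|X - mu| < k*sigma) for a normal X; equals erf(k / sqrt(2))."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.4f}")
```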
The Exponential Distribution
The exponential distribution models the time between events in a Poisson process, that is, the waiting time until the next event when events occur at a constant rate λ. Mean = 1/λ, Variance = 1/λ². Its key property is memorylessness: if you have already waited t minutes, the probability of waiting at least another s minutes is the same as the probability of waiting at least s minutes from the start. Applications: service times, component lifetimes, time between network packets.
Choosing the Right Distribution
Selecting an appropriate distribution is a critical modelling decision. Consider: is the variable discrete or continuous? What are its natural boundaries (0 to 1? non-negative? unbounded)? Is it symmetric or skewed? What generating process could produce it? Model selection criteria like AIC (Akaike Information Criterion) and likelihood ratio tests help choose between competing distributions for a given dataset.
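This sketch illustrates an AIC-based comparison. It simulates right-skewed data (exponential, seeded, purely illustrative) and fits an exponential and a normal model by maximum likelihood, using their closed-form estimates. It then computes AIC = 2k − 2 ln L for each model. The model with the lower AIC is preferred, which here should be the exponential.

```python
import math
import random

rng = random.Random(1)  # seeded, simulated right-skewed data (illustrative only)
data = [rng.expovariate(1.0) for _ in range(500)]
n = len(data)
mean = sum(data) / n

# Exponential: MLE rate = 1 / mean; log-likelihood = n*ln(rate) - rate*sum(x)
rate = 1 / mean
loglik_exp = n * math.log(rate) - rate * sum(data)

# Normal: MLEs are the sample mean and the (biased, divide-by-n) variance
var_mle = sum((x - mean) ** 2 for x in data) / n
loglik_norm = -0.5 * n * (math.log(2 * math.pi * var_mle) + 1)

# AIC = 2k - 2 ln L, where k counts estimated parameters; lower is better
aic_exp = 2 * 1 - 2 * loglik_exp
aic_norm = 2 * 2 - 2 * loglik_norm
print(f"AIC exponential = {aic_exp:.1f}, AIC normal = {aic_norm:.1f}")
```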
The Beta and Gamma Distributions
Beyond the common distributions, the Beta distribution models probabilities and proportions, that is, values bounded between 0 and 1. It is widely used in Bayesian statistics as a prior for probability parameters, and in A/B testing to model conversion rates. The Gamma distribution generalises the exponential distribution. With integer shape k, it models the waiting time until the k-th event in a Poisson process, and k = 1 gives the exponential. It is used for insurance claim amounts and rainfall amounts, and as a conjugate prior in Bayesian analysis (for example, for a Poisson rate). Understanding these distributions expands your modelling toolkit considerably beyond the basic normal and binomial.
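This sketch shows a common A/B-testing use of the Beta distribution. It assumes a uniform Beta(1, 1) prior and uses hypothetical conversion counts. Because the Beta prior is conjugate to the binomial likelihood, each posterior is Beta(1 + conversions, 1 + non-conversions). The code then uses `random.betavariate` to draw from these posteriors and estimate the probability that variant B converts better.

```python
import random

# Hypothetical A/B test counts (illustrative, not from real data)
visitors_a, conversions_a = 1000, 48
visitors_b, conversions_b = 1000, 63

# With a uniform Beta(1, 1) prior, the posterior for a conversion rate is
# Beta(1 + conversions, 1 + non-conversions) (Beta-binomial conjugacy).
post_a = (1 + conversions_a, 1 + visitors_a - conversions_a)
post_b = (1 + conversions_b, 1 + visitors_b - conversions_b)

def beta_mean(alpha: float, beta: float) -> float:
    return alpha / (alpha + beta)

# Monte Carlo estimate of P(rate_B > rate_A) from posterior draws.
rng = random.Random(0)
draws = 50_000
wins = sum(rng.betavariate(*post_b) > rng.betavariate(*post_a) for _ in range(draws))
prob_b_better = wins / draws

print(f"posterior mean A = {beta_mean(*post_a):.4f}, B = {beta_mean(*post_b):.4f}")
print(f"P(rate_B > rate_A) is about {prob_b_better:.3f}")
```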
Worked Example: Choosing the Right Distribution
A call centre manager wants to model two different things. First: how many calls arrive per hour? Over a long period, calls arrive at an average rate of 15 per hour, independently of each other. This is a Poisson process, so use the Poisson distribution with λ = 15. P(exactly 20 calls in an hour) = e⁻¹⁵ × 15²⁰/20! ≈ 0.042. P(more than 25 calls) = 1 − Σᵢ₌₀²⁵ P(X = i) ≈ 0.006.
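This short sketch recomputes both call-centre probabilities directly from the Poisson PMF with λ = 15. It sums P(X = i) for i = 0 to 25 to get the upper tail.

```python
from math import exp, factorial

def poisson_pmf(k: int, lam: float) -> float:
    return lam**k * exp(-lam) / factorial(k)

lam = 15  # average calls per hour, from the example
p_exactly_20 = poisson_pmf(20, lam)
p_more_than_25 = 1 - sum(poisson_pmf(i, lam) for i in range(26))  # 1 - P(X <= 25)

print(f"P(X = 20) = {p_exactly_20:.4f}")
print(f"P(X > 25) = {p_more_than_25:.4f}")
```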
Second: how long does each call last? Call duration is always positive and right-skewed (most calls are short, a few are very long). If we are also willing to assume durations are memoryless, the exponential distribution fits well. With a mean duration of 4 minutes (λ = 0.25 per minute), P(call lasts more than 8 minutes) = e^(−0.25 × 8) = e^(−2) ≈ 0.135. About 13.5% of calls exceed 8 minutes, which is useful for staffing decisions.
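This standard-library sketch does the same calculation for call durations. It also computes a second quantity that is not part of the original example: the duration exceeded by only 10% of calls, found by solving e^(−λt) = 0.1 to get t = ln(10)/λ. It is an extra illustration of how the survival function answers staffing questions.

```python
from math import exp, log

mean_minutes = 4.0
lam = 1 / mean_minutes  # rate per minute: 0.25

def p_longer_than(t: float) -> float:
    """P(T > t) = e^(-lam * t) for an exponentially distributed call duration."""
    return exp(-lam * t)

p_over_8 = p_longer_than(8)
# Duration exceeded by only 10% of calls: solve e^(-lam * t) = 0.1 for t.
t_top_10pct = log(10) / lam

print(f"P(call > 8 min) = {p_over_8:.3f}")
print(f"10% of calls last longer than {t_top_10pct:.1f} min")
```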
The Normal Approximation to the Binomial
When n is large and p is not too close to 0 or 1, the binomial can be approximated by a normal distribution: X ~ N(np, np(1−p)). For n = 500 and p = 0.4: mean = 200, variance = 120, SD ≈ 10.95. Using the continuity correction (adding 0.5 to 190), P(X ≤ 190) ≈ P(Z ≤ (190.5 − 200)/10.95) = P(Z ≤ −0.867) ≈ 0.193. A common rule of thumb is that the approximation is adequate when np ≥ 5 and n(1−p) ≥ 5. For small samples or extreme p, use the exact binomial. This approximation is the foundation of the proportion z-tests used widely in polling and quality control.
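This sketch compares the continuity-corrected normal approximation with the exact binomial CDF for n = 500 and p = 0.4, using only the standard library. It writes the normal CDF in terms of `math.erf`.

```python
from math import comb, erf, sqrt

def normal_cdf(z: float) -> float:
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p, x = 500, 0.4, 190
mu = n * p
sd = sqrt(n * p * (1 - p))

# Exact binomial P(X <= 190): sum the PMF over 0..190 (Python ints keep comb exact).
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

# Normal approximation with the continuity correction: use x + 0.5.
approx = normal_cdf((x + 0.5 - mu) / sd)

print(f"exact  P(X <= {x}) = {exact:.4f}")
print(f"normal P(X <= {x}) = {approx:.4f}")
```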
Distribution Fitting in Practice
Given real data, how do you choose the right distribution? The process involves: plotting a histogram and noting the shape (symmetric? skewed? bounded?), computing skewness and kurtosis and matching to known distributions, using Q-Q plots against candidate distributions, and applying formal goodness-of-fit tests (Kolmogorov-Smirnov, Anderson-Darling). Software packages like R (fitdistrplus), Python (scipy.stats), and Minitab automate this process, but understanding the underlying logic helps you evaluate whether automated suggestions make subject-matter sense.
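Here is a stripped-down version of the fit-then-test workflow, assuming simulated exponential data with a fixed seed. It fits the exponential rate by maximum likelihood and computes the Kolmogorov-Smirnov statistic D by hand. One caveat: standard KS p-values assume the candidate distribution was fully specified in advance. When its parameters are estimated from the same data, as here, those p-values are too conservative and need adjusting, for example with a parametric bootstrap.

```python
import math
import random

rng = random.Random(7)
data = sorted(rng.expovariate(0.5) for _ in range(1000))  # simulated waiting times (illustrative)
n = len(data)

# Step 1: fit a candidate exponential by maximum likelihood (rate = 1 / sample mean).
rate = n / sum(data)

def exp_cdf(x: float) -> float:
    return 1 - math.exp(-rate * x)

# Step 2: Kolmogorov-Smirnov statistic D = max |empirical CDF - fitted CDF|,
# checked just before and at each sorted observation.
d_stat = max(
    max((i + 1) / n - exp_cdf(x), exp_cdf(x) - i / n)
    for i, x in enumerate(data)
)
print(f"fitted rate = {rate:.3f}, KS D = {d_stat:.4f}")
```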
Deep Dive: Probability Distributions Guide — Theory, Assumptions, and Best Practices
This section looks at how the distributions above are used in statistical testing. It covers the underlying mathematical assumptions, how to check them, effect size reporting, common mistakes, and how to report results.
Mathematical Foundation
Every statistical procedure rests on a mathematical model of how the data are generated. Each distribution in this guide carries such assumptions, for example independent trials with a constant p for the binomial, or a constant event rate for the Poisson. When those assumptions hold, tests built on these distributions achieve their stated Type I error rate and power. Understanding these foundations tells you when results are trustworthy and when to seek alternatives.
Assumptions and Diagnostics
Before interpreting any result, verify all assumptions are satisfied. Common assumption violations and their remedies:
- Non-normality: For small samples, use non-parametric alternatives or bootstrap methods. For large samples, the Central Limit Theorem typically provides robustness.
- Outliers: Identify using IQR fence or modified z-scores. Investigate each outlier — correct data errors, but do not delete genuine extreme observations without disclosure.
- Independence violations: Clustered or longitudinal data requires mixed models or GEE rather than standard methods assuming independence.
Interpreting Your Results Completely
A complete interpretation always includes: (1) the test statistic value, (2) degrees of freedom, (3) exact p-value, (4) confidence interval for the parameter of interest, (5) effect size with interpretation, and (6) a plain-language conclusion. Never report just a p-value — it communicates only one dimension of a multi-dimensional result.
Effect Size and Practical Significance
Statistical significance tells you that an effect is detectable; effect size tells you whether it matters. For every test, compute and report the appropriate effect size measure alongside the p-value. Use field-specific benchmarks (not just Cohen's generic small/medium/large) to evaluate practical significance.
Common Errors and How to Avoid Them
- Multiple testing without correction: Apply Bonferroni, Holm, or FDR corrections whenever running more than one test on the same dataset.
- Confusing statistical and practical significance: Always ask "is this large enough to matter?" not just "is this detectable?"
- p-hacking: Pre-register hypotheses, analysis plans, and significance thresholds before seeing data.
- Overlooking assumptions: Verify independence, normality (or large n), and homogeneity of variance before applying parametric tests.
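Of the corrections named above, the Holm procedure is simple to implement. Below is a minimal sketch of Holm step-down adjusted p-values; compare each adjusted value with α. The four raw p-values are hypothetical.

```python
def holm_adjust(p_values: list[float]) -> list[float]:
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank 0 = smallest p-value
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)  # adjusted values never decrease
        adjusted[i] = running_max
    return adjusted

raw = [0.010, 0.040, 0.030, 0.005]  # hypothetical p-values from four tests
adj = holm_adjust(raw)
print([round(a, 3) for a in adj])
```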
When This Test Is Not Appropriate
Every test has a limited range of appropriate application. Know when to use non-parametric alternatives, when to switch to more complex models, and when the research question needs a different analytic framework entirely. Applying a test whose assumptions are violated distorts its Type I error rate and power, even when the arithmetic is done correctly.
Reporting in Academic and Professional Contexts
Follow APA 7th edition reporting format for academic publications: report the test statistic with its symbol (t, F, χ², z), degrees of freedom in parentheses, exact p-value to two or three decimal places, and confidence intervals. Example: "A one-sample t-test indicated that study time significantly exceeded the 10-hour benchmark, t(23) = 2.84, p = .009, d = 0.58, 95% CI [10.7, 13.2]."
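For a one-sample t-test, you can recover Cohen's d from the reported statistic as d = t/√n, with n = df + 1. This quick sketch checks that the d in the example is consistent with t(23) = 2.84.

```python
from math import sqrt

def cohens_d_one_sample(t: float, df: int) -> float:
    """Cohen's d for a one-sample t-test: d = t / sqrt(n), with n = df + 1."""
    n = df + 1
    return t / sqrt(n)

d = cohens_d_one_sample(2.84, 23)
print(f"d = {d:.2f}")
```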