One of the most common questions in research design: "How many participants do I need?" Too few and you miss real effects (low power). Too many and you waste resources. This guide shows you exactly how to calculate the right sample size for different study types.
Why Sample Size Matters
- Too small: Low statistical power (high Type II error rate, β). Real effects go undetected and the study is inconclusive.
- Too large: Wasted time, money, and participant burden. Even trivially small, meaningless effects become statistically significant.
- Just right: 80–90% power to detect your expected effect size at α = 0.05.
Key Inputs for Sample Size Calculation
- Confidence level (1−α): Usually 95% (α = 0.05) or 99% (α = 0.01)
- Desired power (1−β): Usually 80% or 90% (β = 0.20 or 0.10)
- Effect size: How big a difference you expect or consider meaningful
- Population variability (σ): From prior studies or a pilot study
- Margin of error (E): For estimation studies — how precise do you need to be?
Formula 1: Estimating a Mean
n = (z × σ / E)²
Example: Estimate mean daily study time for students (σ = 1.5 hours from pilot study). Want 95% CI (z = 1.96) with margin of error E = 0.3 hours.
n = (1.96 × 1.5 / 0.3)² = (9.8)² = 96.04, so round up to 97 students
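The calculation can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the function name `n_for_mean` is illustrative, and it assumes σ is known and that results are always rounded up to the next whole participant (so 96.04 becomes 97).

```python
import math
from statistics import NormalDist


def n_for_mean(sigma, margin, confidence=0.95):
    """Sample size to estimate a mean within +/- margin, assuming known sigma."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up: a fractional participant still requires a whole one.
    return math.ceil((z * sigma / margin) ** 2)


# Study-time example: sigma = 1.5 hours, margin of error = 0.3 hours.
print(n_for_mean(sigma=1.5, margin=0.3))  # 97
```

Tightening the margin to 0.15 hours with the same σ roughly quadruples the requirement, which is a quick way to see the cost of extra precision.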
Formula 2: Estimating a Proportion
n = z² × p(1−p) / E²
Most conservative: Use p = 0.5 when unknown (maximises required n).
Example: Survey to estimate proportion of people who support a policy. 95% CI, E = 0.05, p unknown.
n = (1.96)² × 0.5 × 0.5 / (0.05)² = 3.8416 × 0.25 / 0.0025 = 384.16, so round up to 385 respondents
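A short Python sketch of the same formula follows. The function name `n_for_proportion` is hypothetical; the default p = 0.5 reflects the conservative choice described above, and the result is rounded up to a whole respondent.

```python
import math
from statistics import NormalDist


def n_for_proportion(margin, p=0.5, confidence=0.95):
    """Sample size to estimate a proportion within +/- margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # p * (1 - p) peaks at p = 0.5, so the default gives the largest n.
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


print(n_for_proportion(margin=0.05))          # 385
print(n_for_proportion(margin=0.05, p=0.2))   # smaller, because p(1 - p) is smaller
```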
Formula 3: Two-Sample T-Test (Power Analysis)
When comparing two group means with desired power 1−β:
n = 2(z_α/2 + z_β)² × σ² / δ²
Where δ = expected difference between means. This gives the required n per group.
Example: You expect a drug to reduce blood pressure by δ = 8 mmHg (σ = 12 mmHg). α = 0.05, power = 0.80 (z_α/2 = 1.96, z_β = 0.84).
n = 2(1.96 + 0.84)² × 144 / 64 = 2 × 7.84 × 2.25 = 35.28, so round up to 36 per group (72 total)
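The power calculation can be sketched in Python as below. The function name `n_per_group` is illustrative, and the sketch assumes a two-sided test, equal group sizes, a common σ, and the normal approximation used in the formula (a t-based calculation would give a slightly larger n).

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Per-group n for comparing two means (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


# Blood pressure example: delta = 8 mmHg, sigma = 12 mmHg.
n = n_per_group(delta=8, sigma=12)
print(n, 2 * n)  # 36 72
```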
Practical Adjustments
- Expected dropout: Divide by retention rate. If 20% dropout expected: n_adjusted = n / 0.80
- Finite population correction: If sample is >5% of population N: n_adjusted = n × N/(N + n − 1)
- Design effect (DEFF): For cluster sampling, multiply n by DEFF (typically 1.5–2.5)
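The three adjustments above can be written as small helper functions. This is a sketch with hypothetical function names; the input values (n = 97, n = 385, N = 2,000, DEFF = 1.5) are chosen only for illustration.

```python
import math


def adjust_for_dropout(n, dropout_rate):
    """Inflate n so that the expected number of completers is still n."""
    return math.ceil(n / (1 - dropout_rate))


def apply_fpc(n, population_size):
    """Finite population correction: n * N / (N + n - 1)."""
    return math.ceil(n * population_size / (population_size + n - 1))


def apply_design_effect(n, deff):
    """Inflate n for cluster sampling by the design effect."""
    return math.ceil(n * deff)


print(adjust_for_dropout(97, 0.20))    # 122
print(apply_fpc(385, 2000))            # 323
print(apply_design_effect(385, 1.5))   # 578
```

Dropout and design effect push n up, while the finite population correction pulls it down, so it helps to apply them explicitly rather than folding them into a single fudge factor.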
Common Sample Size Rules of Thumb
| Study Type | Minimum Recommended n |
| Survey / opinion poll | ≥ 385 (95% CI, ±5%) |
| T-test comparison | ≥ 30 per group |
| Linear regression | ≥ 10–20 observations per predictor |
| Chi-square test | ≥ 5 expected per cell |
| Pilot study | ≥ 12–30 per group |
Use our free Sample Size Calculator to get exact required n for any combination of confidence level, margin of error, and power.
Why Sample Size Matters
Sample size fundamentally determines the precision and statistical power of your study. Too small a sample produces imprecise estimates, fails to detect real effects, and wastes participants' time and effort on inconclusive research. Too large a sample wastes resources and may detect trivially small effects that have no practical importance. Power analysis determines the minimum sample size needed to achieve desired precision and power before collecting data.
Key Components of Sample Size Calculation
Four quantities are interrelated in sample size calculations. Specify three and calculate the fourth:
- Effect size: The minimum difference you want to detect. Based on practical importance, not statistical convention.
- Significance level α: Acceptable Type I error rate (typically 0.05)
- Power (1−β): Probability of detecting the effect if it exists (typically 0.80 or 0.90)
- Sample size n: The unknown to solve for
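To illustrate "specify three and calculate the fourth", the sketch below solves for power or for the detectable effect when n is already fixed. It assumes a two-sided, two-sample comparison with equal group sizes and a standardized effect size d (difference in means divided by σ), and uses the normal approximation; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def approx_power(n_per_group, d, alpha=0.05):
    """Approximate power given n per group, effect size d, and alpha."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Ignores the tiny chance of rejecting in the wrong direction.
    return NormalDist().cdf(d * math.sqrt(n_per_group / 2) - z_alpha)


def detectable_effect(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardized effect detectable with the given n, alpha, power."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) * math.sqrt(2 / n_per_group)


print(round(approx_power(63, d=0.5), 2))   # 0.8
print(round(detectable_effect(63), 2))     # 0.5
```

This direction is useful when the sample size is dictated by budget or availability and the question becomes what the study can realistically detect.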
Sample Size for Estimating a Mean
To achieve margin of error E with confidence level 1−α and known σ: n = (z_α/2 × σ/E)². For 95% confidence (z = 1.96), σ = 10, E = 2: n = (1.96 × 10/2)² = (9.8)² = 96.04, so n = 97. When σ is unknown, estimate from pilot data or literature. Note that n grows with the square of z but inversely with the square of E — halving the margin of error requires quadrupling the sample size.
Sample Size for Comparing Two Means
For a two-sample t-test detecting effect size d = |μ₁−μ₂|/σ with power 1−β at significance α: n per group ≈ 2(z_α/2 + z_β)²/d². For d=0.5 (medium effect), α=0.05, power=0.80: z₀.₀₅/₂ = 1.96, z₀.₂₀ = 0.84. n ≈ 2(1.96+0.84)²/0.25 ≈ 63 per group.
Sample Size for Proportions
To detect a difference between proportions p₁ and p₂: n per group ≈ (z_α/2√(2p̄(1−p̄)) + z_β√(p₁(1−p₁)+p₂(1−p₂)))²/(p₁−p₂)², where p̄=(p₁+p₂)/2. For estimating a single proportion with margin of error E: n = z²p(1−p)/E². Use p=0.5 if unknown (gives the maximum, most conservative estimate): n = 1.96²×0.25/0.05² = 384.16, rounded up to 385, for a 5% margin.
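The two-proportion formula is easier to check in code than by hand. The sketch below implements it as written, assuming a two-sided test and equal group sizes; the function name and the example values p₁ = 0.30 and p₂ = 0.20 are hypothetical and not taken from any study.

```python
import math
from statistics import NormalDist


def n_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Per-group n to detect a difference between two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    # First term: variability under the null (pooled proportion).
    null_term = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    # Second term: variability under the alternative (separate proportions).
    alt_term = z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    return math.ceil((null_term + alt_term) ** 2 / (p1 - p2) ** 2)


print(n_two_proportions(0.30, 0.20))  # 294
```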
Finite Population Correction
The standard formulas assume an infinite population. When sampling more than 5-10% of a finite population, apply the finite population correction: n_adjusted = n/(1 + n/N), where N is the population size. This correction reduces the required sample size when the population is small — if you survey 100 out of 200 people, you are capturing half the population and need fewer additional observations to achieve the same precision.
Practical Considerations Beyond the Formula
The calculated n is the minimum needed for statistical goals. Practical design requires adjusting for expected non-response (multiply n by 1/(1 − non-response rate)), dropouts in longitudinal studies, and stratification (which may require a minimum n per stratum). Cluster sampling often requires much larger samples than simple random sampling (the design effect). Budget constraints and ethical limits on participant burden also factor into final decisions.
Worked Example: Clinical Trial Sample Size
A clinical trial compares a new antidepressant to placebo using the Hamilton Depression Rating Scale (HDRS). The minimum clinically meaningful difference is 3 points (δ = 3). Based on prior studies, σ = 8 points. The team wants 90% power and α = 0.05 (two-tailed).
n per group = 2 × (z_α/2 + z_β)² × σ² / δ² = 2 × (1.96 + 1.282)² × 64 / 9 = 2 × 10.51 × 64 / 9 = 2 × 74.7 ≈ 149.5, so round up to 150 per group.
Expecting 15% dropout, enrol 150/0.85 = 176.5, rounded up to 177 per group (354 total). This is a substantial investment, but the calculation shows exactly why: detecting a 3-point difference against background noise of σ = 8 requires large samples. If the team reduces target power to 80%, n drops to 112 per group (132 with dropout adjustment), saving 90 participants in total but accepting a higher risk of missing a real effect.
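The trade-off between 80% and 90% power can be tabulated with a short script. This is a sketch under the same assumptions as the worked example (two-sided α = 0.05, equal groups, normal approximation, 15% dropout); the function name `trial_plan` is hypothetical.

```python
import math
from statistics import NormalDist


def trial_plan(delta, sigma, power, alpha=0.05, dropout=0.15):
    """Return (analysable n per group, enrolled n per group)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)
    enrolled = math.ceil(n / (1 - dropout))  # inflate for expected dropout
    return n, enrolled


# HDRS example: delta = 3 points, sigma = 8 points.
for power in (0.80, 0.90):
    n, enrolled = trial_plan(delta=3, sigma=8, power=power)
    print(f"power={power:.2f}: {n} per group, enrol {enrolled} per group, "
          f"{2 * enrolled} total")
```

Running the loop over a wider grid of δ values is a simple way to show a funding panel how sensitive the required enrolment is to the assumed clinically meaningful difference.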
Sample Size for Surveys: A Practical Guide
A marketing team wants to estimate the proportion of customers who would pay for a premium subscription. They want a margin of error of ±4% with 95% confidence. Using p = 0.5 (maximum variance, most conservative): n = 1.96² × 0.5 × 0.5 / 0.04² = 3.8416 × 0.25 / 0.0016 = 600.25, so n = 601.
If they have prior data suggesting p ≈ 0.30: n = 1.96² × 0.30 × 0.70 / 0.04² = 3.8416 × 0.21 / 0.0016 = 504.21, so n = 505. They can save 96 surveys by using prior knowledge. For a finite population of N = 2,000 customers, starting from the conservative n = 601: adjusted n = 601/(1 + 601/2000) = 601/1.3005 = 462.13, so n = 463. The finite population correction saves 138 surveys relative to 601, which matters when recruiting customers is expensive.
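The survey scenarios can be compared side by side with one function. This is a sketch that rounds every result up to a whole respondent and applies the correction n/(1 + n/N) only when a population size is supplied; the name `survey_n` and its parameters are illustrative.

```python
import math
from statistics import NormalDist


def survey_n(margin, p=0.5, confidence=0.95, population=None):
    """Respondents needed to estimate a proportion within +/- margin."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n = math.ceil(z ** 2 * p * (1 - p) / margin ** 2)
    if population is not None:
        # Finite population correction applied to the rounded n.
        n = math.ceil(n / (1 + n / population))
    return n


print(survey_n(0.04))                    # 601 (conservative p = 0.5)
print(survey_n(0.04, p=0.30))            # 505 (prior estimate of p)
print(survey_n(0.04, population=2000))   # 463 (N = 2,000 customers)
```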
Consequences of Ignoring Sample Size Planning
A 2022 systematic review found that over 40% of published randomised trials in top medical journals were underpowered (less than 80% power). These studies frequently report null results when a real effect exists, leading to incorrect conclusions that treatments are ineffective. Conversely, some industry-funded trials use enormous samples that detect statistically significant but clinically meaningless effects, then advocate for adoption of expensive treatments. Transparent pre-registration of power analyses on platforms like ClinicalTrials.gov and OSF prevents these abuses and improves research credibility.
Deep Dive: Sample Size Determination — Theory, Assumptions, and Best Practices
This section takes a closer look at sample size determination, covering the mathematical theory, assumption checking, effect size reporting, common mistakes, and reporting conventions that go beyond introductory coverage.
Mathematical Foundation
Every statistical procedure rests on a mathematical model of how the data are generated. Sample size formulas assume specific data-generating conditions that, when satisfied, deliver the stated Type I error rate and power. Understanding these foundations helps you know when results are trustworthy and when to seek alternatives.
Assumptions and Diagnostics
Before interpreting any result, verify all assumptions are satisfied. Common assumption violations and their remedies:
- Non-normality: For small samples, use non-parametric alternatives or bootstrap methods. For large samples, the Central Limit Theorem typically provides robustness.
- Outliers: Identify using IQR fence or modified z-scores. Investigate each outlier — correct data errors, but do not delete genuine extreme observations without disclosure.
- Independence violations: Clustered or longitudinal data requires mixed models or GEE rather than standard methods assuming independence.
Interpreting Your Results Completely
A complete interpretation always includes: (1) the test statistic value, (2) degrees of freedom, (3) exact p-value, (4) confidence interval for the parameter of interest, (5) effect size with interpretation, and (6) a plain-language conclusion. Never report just a p-value — it communicates only one dimension of a multi-dimensional result.
Effect Size and Practical Significance
Statistical significance tells you that an effect is detectable; effect size tells you whether it matters. For every test, compute and report the appropriate effect size measure alongside the p-value. Use field-specific benchmarks (not just Cohen's generic small/medium/large) to evaluate practical significance.
Common Errors and How to Avoid Them
- Multiple testing without correction: Apply Bonferroni, Holm, or FDR corrections whenever running more than one test on the same dataset.
- Confusing statistical and practical significance: Always ask "is this large enough to matter?" not just "is this detectable?"
- p-hacking: Pre-register hypotheses, analysis plans, and significance thresholds before seeing data.
- Overlooking assumptions: Verify independence, normality (or large n), and homogeneity of variance before applying parametric tests.
When This Test Is Not Appropriate
Every test has boundaries of appropriate application. Understand when to use non-parametric alternatives, when to switch to more complex models, and when the research question requires a different analytic framework entirely. Using the wrong test produces incorrect Type I error rates and power — even if the computation is done correctly.
Reporting in Academic and Professional Contexts
Follow APA 7th edition reporting format for academic publications: report the test statistic with its symbol (t, F, χ², z), degrees of freedom in parentheses, exact p-value to two or three decimal places, and confidence intervals. Example: "A one-sample t-test indicated that study time significantly exceeded the 10-hour benchmark, t(23) = 2.84, p = .009, d = 0.58, 95% CI [10.7, 13.2]."