Chi-Square Test Explained

The chi-square (χ²) test is the go-to statistical test for categorical data. It answers questions like: "Is this die fair?", "Is there a relationship between gender and voting preference?", "Does the distribution of blood types match genetic predictions?" This guide covers both major types of chi-square tests.

What is the Chi-Square Statistic?

χ² = Σ (O − E)² / E

Where O = observed frequency and E = expected frequency. The larger χ², the greater the discrepancy between what you observed and what you expected under H₀.
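As a minimal sketch, the formula above translates almost directly into Python. The function name `chi_square_statistic` and the coin-toss counts are invented here purely for illustration.

```python
def chi_square_statistic(observed, expected):
    """Return the sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 40 coin tosses, 18 heads and 22 tails,
# compared with the 20/20 split expected from a fair coin.
stat = chi_square_statistic([18, 22], [20, 20])
print(round(stat, 3))  # prints 0.4
```

A small value like this means the observed counts sit close to the expected ones; whether it is "small enough" depends on the degrees of freedom.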

Type 1: Goodness of Fit Test

Purpose: Test whether observed frequencies for one categorical variable match a hypothesised distribution.

H₀: The observed frequencies follow the specified distribution.

df = k − 1 (where k = number of categories)

Example: A genetics experiment crosses two plants. Mendel's law predicts offspring ratios of 9:3:3:1 for four phenotypes. You observe 315, 108, 101, 32 offspring. Does this match the 9:3:3:1 prediction?

Expected counts: 556 total × 9/16 = 312.75, 556 × 3/16 = 104.25 (for each of the two middle phenotypes), and 556 × 1/16 = 34.75. χ² = 0.47 with df = 3, giving p ≈ 0.93. Fail to reject H₀: the data are consistent with Mendel's predictions.
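The Mendel example can be reproduced with the standard library alone. This is a sketch under stated assumptions: the helper `chi2_sf` is a hypothetical name, and it computes the upper-tail probability for whole-number degrees of freedom using the closed forms for df = 1 and df = 2 plus the usual step-by-two recurrence, rather than calling a statistics package.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    # Closed-form starting points: df = 2 for even df, df = 1 for odd df.
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    # Each pass adds one term and raises the degrees of freedom by 2.
    while a < df / 2.0 - 1e-9:
        q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


observed = [315, 108, 101, 32]
ratio = [9, 3, 3, 1]

n = sum(observed)
expected = [n * r / sum(ratio) for r in ratio]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf(chi2, df)

print(expected)  # [312.75, 104.25, 104.25, 34.75]
print(round(chi2, 2), df, round(p_value, 2))
```

In practice a library routine would normally replace `chi2_sf`; it is written out here only so the example has no dependencies.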

Type 2: Test of Independence

Purpose: Test whether two categorical variables are related (independent or associated) using a contingency table.

H₀: The two variables are independent (no association).

df = (rows − 1) × (columns − 1)

Expected frequencies: E = (row total × column total) / grand total

Example: Survey 200 people on preferred exercise type (gym/running/cycling) by gender. Is exercise preference independent of gender?

Assumptions and Conditions

Random sample from the population
Independent observations — each person counted only once
Expected frequencies ≥ 5 in each cell — the most important condition. If any E < 5, consider combining categories or using Fisher's Exact Test.
Minimum total n ≥ 20

Interpreting the Results

p < 0.05: Reject H₀. Significant deviation from expected (goodness of fit) or significant association (independence).
p ≥ 0.05: Fail to reject H₀. No significant evidence of deviation or association.

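After a significant result, always calculate effect size: Cramér's V = √(χ²/(n×(min(r,c)−1))), where n is the total sample size and r and c are the numbers of rows and columns. As rough benchmarks, V = 0.10 is small, 0.30 medium, 0.50 large.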

Chi-Square vs Fisher's Exact Test

For 2×2 tables with small expected frequencies (any E < 5), use Fisher's Exact Test instead of chi-square. Fisher's test computes an exact p-value without the large-sample approximation that chi-square requires.

The Two Types of Chi-Square Tests

There are two fundamentally different chi-square tests that share the same distribution. The chi-square goodness-of-fit test compares an observed frequency distribution to an expected distribution (based on theory or a known population). The chi-square test of independence tests whether two categorical variables are associated in a contingency table. Both use the same test statistic formula but answer different questions.

The Chi-Square Test Statistic

For both tests: χ² = Σ[(Observed − Expected)² / Expected], summed over all cells. This statistic is always non-negative and follows the chi-square distribution under H₀. The degrees of freedom determine which chi-square distribution to use. Large χ² values indicate large discrepancies between observed and expected counts, providing evidence against H₀.

Chi-Square Test of Independence: Full Example

A researcher investigates whether beverage preference (coffee/tea) is related to work performance rating (excellent/satisfactory/poor). They survey 200 employees and arrange the results in a 2×3 contingency table. Expected frequencies are calculated as E = (Row Total × Column Total) / Grand Total, and the χ² statistic is summed over all 6 cells.

Degrees of freedom = (rows−1)×(columns−1) = (2−1)×(3−1) = 2. At α=0.05, critical χ² = 5.991. If χ² > 5.991, reject the null hypothesis of independence and conclude that beverage preference is associated with performance rating.

Assumptions and the Expected Frequency Rule

For the chi-square approximation to be valid: all cells should have expected frequency ≥ 1, and no more than 20% of cells should have expected frequency < 5. When these conditions are violated, use Fisher's Exact Test (for 2×2 tables) or combine categories. Observed frequencies must be counts, not proportions or percentages.
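The expected-frequency rule is easy to check mechanically before running the test. The sketch below assumes the expected counts are already available as a list of rows; the function name `chi_square_conditions_met` and both example tables are hypothetical.

```python
def chi_square_conditions_met(expected):
    """Check the rule of thumb: every E >= 1 and at most 20% of cells with E < 5."""
    cells = [e for row in expected for e in row]
    all_at_least_one = all(e >= 1 for e in cells)
    share_below_five = sum(e < 5 for e in cells) / len(cells)
    return all_at_least_one and share_below_five <= 0.20


# Hypothetical 2x3 table: one of six expected counts is below 5 (about 17%).
print(chi_square_conditions_met([[10, 6, 4], [15, 9, 6]]))      # prints True
# Hypothetical 2x2 table where every expected count is below 5.
print(chi_square_conditions_met([[4.27, 3.73], [3.73, 3.27]]))  # prints False
```

When the check fails, the options named above apply: combine categories, or use Fisher's Exact Test for a 2×2 table.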

Goodness-of-Fit Test Example

A genetics researcher expects offspring in a 9:3:3:1 ratio based on Mendel's laws. They observe 315, 108, 101, 32 offspring (total n=556). Expected: 312.75, 104.25, 104.25, 34.75. χ² = (315−312.75)²/312.75 + ... = 0.47. df = 4−1 = 3. Critical χ² = 7.815. Since 0.47 < 7.815, fail to reject H₀ — data is consistent with Mendelian ratios.

Measures of Association for Contingency Tables

A significant chi-square tells you association exists but not its strength. Effect size measures include:

Phi (φ): For 2×2 tables, φ = √(χ²/n). Small: 0.1, Medium: 0.3, Large: 0.5
Cramér's V: For larger tables, V = √(χ²/(n×min(r−1,c−1))). Same benchmarks as phi.
Odds Ratio: For 2×2 tables, quantifies the odds of one outcome relative to another
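The three measures listed above can be sketched as small Python helpers. The function names and the input values are illustrative assumptions, and the odds ratio uses the standard cross-product form (a×d)/(b×c) for a table laid out as [[a, b], [c, d]], which is not spelled out in the text above.

```python
import math


def phi_coefficient(chi2, n):
    """Phi for a 2x2 table: sqrt(chi2 / n)."""
    return math.sqrt(chi2 / n)


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V: sqrt(chi2 / (n * min(r - 1, c - 1)))."""
    return math.sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


# Illustrative inputs only, not taken from a real study.
print(round(phi_coefficient(chi2=4.0, n=100), 2))                  # prints 0.2
print(round(cramers_v(chi2=18.0, n=200, n_rows=3, n_cols=4), 3))   # prints 0.212
print(odds_ratio(30, 10, 20, 40))                                  # prints 6.0
```

For a 2×2 table, min(r−1, c−1) = 1, so Cramér's V reduces to phi.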

Limitations of Chi-Square Tests

Chi-square tests have important limitations. They only detect whether association exists, not its direction or strength (beyond effect size measures). They require adequate sample sizes. They cannot be used for continuous data without categorisation (which loses information). They assume independent observations — don't use for matched or paired data (use McNemar's test instead).

McNemar's Test for Paired Categorical Data

When you have paired or matched categorical data — for example, the same subjects rated before and after treatment — the standard chi-square test is inappropriate because observations are not independent. McNemar's test is designed specifically for this situation, examining only the discordant pairs (cases where classification changed). It is widely used in medical research to compare diagnostic test results or treatment responses in matched designs. The test statistic follows a chi-square distribution with 1 degree of freedom, but only the off-diagonal cells of the 2×2 table contribute to the test.
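As a rough sketch of how only the discordant pairs enter the calculation, the block below uses the standard uncorrected McNemar statistic, (b − c)²/(b + c), which is not written out in the paragraph above. The function name and the counts are hypothetical, and with very few discordant pairs an exact binomial version is often preferred.

```python
import math


def mcnemar_test(b, c):
    """McNemar's chi-square (no continuity correction) from the discordant counts.

    b = pairs that moved from category 1 to category 2,
    c = pairs that moved from category 2 to category 1.
    """
    if b + c == 0:
        raise ValueError("no discordant pairs")
    chi2 = (b - c) ** 2 / (b + c)
    # With 1 degree of freedom the upper tail has a closed form via erfc.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical before/after study: 15 subjects improved, 5 got worse.
chi2, p_value = mcnemar_test(15, 5)
print(round(chi2, 2), round(p_value, 3))
```

The concordant pairs (subjects whose classification did not change) never appear in the function, which mirrors the point that only the off-diagonal cells contribute.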

Complete Step-by-Step Example: Independence Test

A hospital surveys 400 patients about their satisfaction (Satisfied/Neutral/Dissatisfied) across three wards (Surgery/Medicine/Paediatrics). Observed counts:

Ward	Satisfied	Neutral	Dissatisfied	Total
Surgery	80	40	30	150
Medicine	70	50	30	150
Paediatrics	60	20	20	100
Total	210	110	80	400

Expected for Surgery/Satisfied = (150 × 210)/400 = 78.75. Calculate the expected count for all 9 cells in the same way, then sum (O−E)²/E over the cells: χ² ≈ 6.00. df = (3−1)(3−1) = 4. Critical χ²(4, 0.05) = 9.488. Since 6.00 < 9.488 (p ≈ 0.20), fail to reject H₀: there is no significant association between ward and satisfaction level at α=0.05.
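The whole calculation for the ward table can be recomputed in a few lines of plain Python. This is a sketch rather than a packaged routine; the p-value line relies on the closed form of the chi-square upper tail for df = 4 and would need replacing for other table sizes. Run as written, it gives χ² of about 6.00 and p of about 0.20.

```python
import math

observed = [
    [80, 40, 30],  # Surgery
    [70, 50, 30],  # Medicine
    [60, 20, 20],  # Paediatrics
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E = (row total * column total) / grand total for every cell.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Valid only for df = 4: the upper tail is exp(-x/2) * (1 + x/2).
p_value = math.exp(-chi2 / 2) * (1 + chi2 / 2)

print(expected[0])  # [78.75, 41.25, 30.0]
print(round(chi2, 2), df, round(p_value, 2))
```

Most of the statistic comes from the Neutral column in Medicine and Paediatrics; the Dissatisfied column contributes nothing because its observed and expected counts coincide.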

Goodness-of-Fit: Testing a Genetic Hypothesis

Mendel's model predicts pea plant colours in a 3:1 ratio (dominant:recessive). Observed from 1000 plants: 740 dominant, 260 recessive. Expected: 750 and 250. χ² = (740−750)²/750 + (260−250)²/250 = 0.133 + 0.400 = 0.533. df = 1, critical χ²(0.05) = 3.841. Since 0.533 < 3.841, p ≈ 0.47. The observed ratio is consistent with the 3:1 prediction. Note that failing to reject H₀ shows the data are compatible with the genetic model; it does not prove the model correct.

When Chi-Square Fails: Fisher's Exact as the Solution

A rare disease researcher has only 15 patients, comparing two treatments. Observed: Treatment A: 6 improved, 2 not; Treatment B: 2 improved, 5 not. All four expected counts are below 5 (the smallest, for "B/Not improved", is 7 × 7/15 ≈ 3.27), violating the chi-square assumption. Fisher's Exact Test starts from the probability of the observed table, (8!7!8!7!)/(15! × 6! × 2! × 2! × 5!) ≈ 0.091, and adds the probabilities of all tables with the same margins that are equally or less likely, giving a two-sided p ≈ 0.13. At α=0.05, fail to reject H₀: the difference is not statistically significant with these small samples. The lesson: always check expected cell counts before applying chi-square, and switch to Fisher's Exact when cells are too small.
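For a table this small, the exact calculation can be enumerated directly. The sketch below assumes the common two-sided convention of summing every table (with the same margins) that is no more probable than the observed one; the function name `fisher_exact_2x2` is hypothetical. For the treatment data it returns a point probability of about 0.091 and a two-sided p of about 0.132.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Point probability and two-sided p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    total = comb(n, col1)

    def weight(k):
        # Number of ways to place k in the top-left cell with fixed margins.
        return comb(row1, k) * comb(row2, col1 - k)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    observed_weight = weight(a)
    # Integer weights make the "no more likely than observed" comparison exact.
    tail = sum(weight(k) for k in range(lo, hi + 1) if weight(k) <= observed_weight)
    return observed_weight / total, tail / total


# Treatment A: 6 improved, 2 not; Treatment B: 2 improved, 5 not.
point_prob, p_value = fisher_exact_2x2(6, 2, 2, 5)
print(round(point_prob, 3), round(p_value, 3))
```

Working with integer counts of tables rather than floating-point probabilities avoids tie-breaking problems when two tables are equally likely.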
