An outlier is a data point that differs significantly from other observations. Outliers can be errors, rare genuine events, or the most interesting data points in your dataset. How you handle them dramatically affects your analysis — so detection must come before the decision to remove.

Why Outliers Matter

Method 1: IQR Fence Method (Tukey's Method)

The most widely used and intuitive outlier detection method:

Lower fence = Q1 − 1.5 × IQR
Upper fence = Q3 + 1.5 × IQR

Any value below the lower fence or above the upper fence is a potential outlier. Replacing 1.5 with 3 (Q1 − 3 × IQR and Q3 + 3 × IQR) gives the fences for extreme outliers.

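Example: Data: 10, 12, 14, 15, 17, 18, 19, 20, 65. The median is 17; taking the median of the four values on each side of it gives Q1 = 13 and Q3 = 19.5, so IQR = 6.5. Lower fence = 13 − 9.75 = 3.25. Upper fence = 19.5 + 9.75 = 29.25. Value 65 > 29.25 → outlier detected. Quartile conventions differ slightly between tools, so software may report slightly different fences, but 65 is flagged under any of them.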
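The fence calculation can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names iqr_fences and iqr_outliers are made up for this example, and it assumes the "exclusive" quartile convention from Python's statistics module. Other quartile conventions can shift the fences slightly on small samples.

```python
from statistics import quantiles


def iqr_fences(values, k=1.5):
    """Return (lower_fence, upper_fence) for Tukey's rule with multiplier k."""
    # Assumption: "exclusive" quartiles (the statistics-module default).
    # Other conventions may give slightly different Q1 and Q3.
    q1, _, q3 = quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def iqr_outliers(values, k=1.5):
    """Return the values that fall outside the fences."""
    lower, upper = iqr_fences(values, k)
    return [x for x in values if x < lower or x > upper]


data = [10, 12, 14, 15, 17, 18, 19, 20, 65]
print(iqr_fences(data))         # (3.25, 29.25)
print(iqr_outliers(data))       # [65]
print(iqr_outliers(data, k=3))  # [65], still outside the extreme fences
```

Passing k=3 switches from the ordinary fences to the extreme-outlier fences without changing anything else.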

Advantages: Robust (the quartiles are barely moved by extreme values, unlike the mean and SD). Needs no normality assumption, so it also works for moderately skewed data. Standard in box plots.

Method 2: Z-Score Method

z = (x − x̄) / s

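Here x̄ is the sample mean and s is the sample standard deviation. Values with |z| > 2 are potential outliers, although about 5% of normally distributed data exceed this threshold by chance alone. Values with |z| > 3 are extreme outliers (they occur only about 0.3% of the time in normal data).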

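Limitations: The z-score method uses the mean and SD, which are themselves inflated by the very outliers you are trying to detect. In small samples the problem is mathematical: the largest |z| any value can reach is (n − 1)/√n when s is the sample SD, so with n ≤ 10 no value can ever exceed |z| = 3. The method is better suited to large, approximately normal datasets.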
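To see the small-sample limitation concretely, here is a short sketch that computes z-scores for a nine-value sample containing an obvious outlier (65). The helper names z_scores and max_possible_z are illustrative only, and the sample SD (n − 1 denominator) is assumed.

```python
from math import sqrt
from statistics import mean, stdev


def z_scores(values):
    """Return the z-score of every value, using the sample SD (n - 1)."""
    m = mean(values)
    s = stdev(values)
    return [(x - m) / s for x in values]


def max_possible_z(n):
    """Largest |z| any single value can reach in a sample of size n."""
    return (n - 1) / sqrt(n)


data = [10, 12, 14, 15, 17, 18, 19, 20, 65]
z = z_scores(data)
print(round(z[-1], 2))                      # 2.62, below the |z| > 3 cutoff
print(round(max_possible_z(len(data)), 2))  # 2.67, the ceiling for n = 9
```

The value 65 pulls the mean and SD toward itself, so its z-score stays under 3 even though it is far from the rest of the data. It is still caught by the looser |z| > 2 rule.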

Method 3: Modified Z-Score (Iglewicz-Hoaglin)

M = 0.6745 × (xᵢ − median) / MAD

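Where MAD = Median Absolute Deviation, the median of the absolute deviations |xᵢ − median|. More robust than the standard z-score because it uses the median and MAD in place of the mean and SD. The constant 0.6745 rescales MAD so that M is comparable to an ordinary z-score for normal data. Values with |M| > 3.5 are potential outliers.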
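A minimal sketch of the modified z-score follows, using only the Python standard library. The function names are hypothetical, and the choice to raise an error when MAD is zero (which happens when more than half the values are identical) is an assumption of this example rather than part of the method's definition.

```python
from statistics import median


def modified_z_scores(values):
    """Return the Iglewicz-Hoaglin modified z-score of every value."""
    med = median(values)
    mad = median([abs(x - med) for x in values])
    if mad == 0:
        # Assumption for this sketch: refuse to divide by zero.
        raise ValueError("MAD is zero; modified z-score is undefined")
    return [0.6745 * (x - med) / mad for x in values]


def modified_z_outliers(values, threshold=3.5):
    """Return the values whose |M| exceeds the threshold."""
    scores = modified_z_scores(values)
    return [x for x, m in zip(values, scores) if abs(m) > threshold]


data = [10, 12, 14, 15, 17, 18, 19, 20, 65]
print(round(modified_z_scores(data)[-1], 2))  # 10.79
print(modified_z_outliers(data))              # [65]
```

For this sample the median is 17 and MAD is 3, so the value 65 scores 0.6745 × 48 / 3 ≈ 10.79, far above the 3.5 cutoff, whereas its ordinary z-score stays below 3.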

Method 4: Grubbs' Test

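A formal statistical test for whether the most extreme value in a dataset is a statistical outlier. The test statistic is G = max|xᵢ − x̄| / s, which is compared against a critical value derived from the t-distribution. Tests one outlier at a time. Assumes approximately normal data. Available in many statistical software packages.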
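The sketch below computes the Grubbs statistic and converts a t-distribution critical value into the matching Grubbs critical value for the two-sided test. It is an illustration under stated assumptions: the function names are hypothetical, and the t critical value (upper α/(2n) point with n − 2 degrees of freedom) is not computed here and must come from a table or a statistics library.

```python
from math import sqrt
from statistics import mean, stdev


def grubbs_statistic(values):
    """Return (G, suspect): the Grubbs statistic and the most extreme value."""
    m = mean(values)
    s = stdev(values)  # sample SD (n - 1)
    suspect = max(values, key=lambda x: abs(x - m))
    return abs(suspect - m) / s, suspect


def grubbs_critical(n, t_crit):
    """Convert a t critical value into the Grubbs critical value.

    t_crit is assumed to be the upper alpha/(2n) critical value of the
    t-distribution with n - 2 degrees of freedom (two-sided test).
    """
    return (n - 1) / sqrt(n) * sqrt(t_crit**2 / (n - 2 + t_crit**2))


data = [10, 12, 14, 15, 17, 18, 19, 20, 65]
g, suspect = grubbs_statistic(data)
print(suspect, round(g, 2))  # 65 2.62
```

The suspect value is declared an outlier when G exceeds grubbs_critical(n, t_crit). To test for a second outlier, remove the first and repeat the test on the remaining data.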

Should You Remove Outliers?

This is the most important and most mishandled question. The answer depends entirely on WHY the outlier exists:

Remove when:

Keep when:

Never:

Use our free Outlier Detection Calculator to identify outliers using both IQR and Z-score methods simultaneously.