Linear regression is one of the most widely used tools in data analysis. It models the relationship between variables and enables predictions. This guide covers the basic equation, interpretation of the coefficients, least squares fitting, R², assumptions, and practical use.

What is Linear Regression?

Linear regression fits the best straight line through your data, describing how the dependent variable (Y) changes as the independent variable (X) changes. The line is called the regression line or line of best fit.

ŷ = a + bx

Where: ŷ = predicted Y value, a = intercept (Y when X=0), b = slope (change in Y per unit change in X).

Interpreting the Slope and Intercept

Slope (b): For every 1-unit increase in X, the predicted Y changes by b units. For example, b = 2.5 means the predicted Y increases by 2.5 for each 1-unit increase in X. A negative b means Y decreases as X increases (a negative relationship).

Intercept (a): The predicted value of Y when X = 0. It is sometimes not meaningful, for example when X = 0 lies far outside the range of the data (e.g. predicted height when age = 0).
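To make the interpretation concrete, here is a minimal Python sketch. The coefficients a = 10 and b = 2.5 are hypothetical values chosen only for illustration, and the helper name predict is likewise an assumption, not part of any library.

```python
def predict(x, a, b):
    """Return the predicted value y-hat = a + b*x."""
    return a + b * x


# Hypothetical coefficients, chosen only for illustration.
a, b = 10.0, 2.5

y_at_4 = predict(4, a, b)           # 10 + 2.5 * 4
y_at_5 = predict(5, a, b)           # 10 + 2.5 * 5
slope_effect = y_at_5 - y_at_4      # change in predicted Y for a 1-unit step in X
intercept_value = predict(0, a, b)  # predicted Y when X = 0

print(slope_effect)     # equals b
print(intercept_value)  # equals a
```

The difference between the two predictions equals the slope, and the prediction at X = 0 equals the intercept, whichever starting X value is used.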

How the Line is Found — Least Squares

The regression line minimises the sum of squared residuals: SSresid = Σ(yᵢ − ŷᵢ)². Each residual (eᵢ = yᵢ − ŷᵢ) is the signed vertical distance from a data point to the line. The line that minimises this sum is the least squares line: no other straight line gives a smaller total squared error on the same data.
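For a single predictor, the least squares slope and intercept have a standard closed form: b = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and a = ȳ − b·x̄. The sketch below applies those formulas in plain Python. The data values and the function names fit_line and sum_squared_residuals are made up for illustration.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Sum of squared deviations of X, and of cross-deviations of X and Y.
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all X values are identical; the slope is undefined")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


def sum_squared_residuals(xs, ys, a, b):
    """Return SSresid = sum of (y_i - y-hat_i)^2 for the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)  # roughly a = 2.2, b = 0.6 for this data
best_ssr = sum_squared_residuals(xs, ys, a, b)

# Nudging either coefficient away from the fitted value increases SSresid.
worse_ssr = sum_squared_residuals(xs, ys, a + 0.1, b)
print(a, b, best_ssr, worse_ssr)
```

Comparing best_ssr with worse_ssr shows the minimising property directly: any other straight line has a larger sum of squared residuals on this data.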

Understanding R² (Coefficient of Determination)

R² tells you how well your regression line fits the data: the proportion (usually quoted as a percentage) of the variation in Y that is explained by X.

R² value | Interpretation | Context
R² = 0.95 | 95% of variance in Y explained by X | Strong fit
R² = 0.60 | 60% explained | Moderate fit
R² = 0.20 | 20% explained | Weak fit
R² = 0.00 | X explains nothing about Y | No linear relationship

What counts as a "good" R² depends on the field. In controlled physical experiments, R² above 0.99 is often expected. In the social sciences, R² = 0.40 can be a strong result because human behaviour is inherently variable.
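R² can be computed as 1 − SSresid / SStotal, where SStotal = Σ(yᵢ − ȳ)² is the total variation of Y around its mean. The following is a minimal sketch under stated assumptions: the data are made up, the function name r_squared is hypothetical, and the coefficients a = 2.2 and b = 0.6 are the least squares values for this particular data set.

```python
def r_squared(ys, predictions):
    """Return R^2 = 1 - SSresid / SStotal."""
    y_mean = sum(ys) / len(ys)
    ss_resid = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_total = sum((y - y_mean) ** 2 for y in ys)
    if ss_total == 0:
        raise ValueError("Y has no variation; R^2 is undefined")
    return 1 - ss_resid / ss_total


# Made-up data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Least squares coefficients for this data set (assumed known here).
a, b = 2.2, 0.6
predictions = [a + b * x for x in xs]

r2 = r_squared(ys, predictions)
print(round(r2, 2))  # about 0.6, i.e. roughly 60% of the variation in Y explained
```

For this data, SSresid is about 2.4 and SStotal is 6, so R² comes out near 0.60, which corresponds to a moderate fit.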

Assumptions of Linear Regression

Making Predictions with Regression

After fitting the line ŷ = a + bx, plug in any X value to get a predicted Y:

Interpolation: Predicting within the range of your data — generally reliable.

Extrapolation: Predicting outside the range of your data — risky. The linear relationship may not hold beyond your data range.
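One simple safeguard is to label each prediction according to whether X falls inside the observed range. The sketch below is only an illustration: the function name predict_with_range_check, the coefficients, and the observed X values are all hypothetical.

```python
def predict_with_range_check(x, a, b, x_min, x_max):
    """Return (y_hat, kind), where kind flags interpolation or extrapolation."""
    y_hat = a + b * x
    kind = "interpolation" if x_min <= x <= x_max else "extrapolation"
    return y_hat, kind


# Hypothetical fitted coefficients and observed X values.
a, b = 10.0, 2.5
observed_xs = [1, 2, 3, 4, 5]
x_min, x_max = min(observed_xs), max(observed_xs)

inside = predict_with_range_check(3, a, b, x_min, x_max)
outside = predict_with_range_check(12, a, b, x_min, x_max)

print(inside)   # X = 3 lies within the observed range
print(outside)  # X = 12 lies beyond it, so treat the result with caution
```

The arithmetic is identical in both cases. The label only records that the second prediction relies on the linear pattern continuing past the data.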

When to Use Linear Regression

When the outcome is binary (yes/no), use logistic regression. When the relationship is curved, use polynomial regression or transform the variables.
