🧮 Regression · Free Online

Free Pearson Correlation

Free Pearson correlation calculator. Compute r, r², t-statistic, and significance test. Interpret correlation strength and direction.

100% Free
📋 Step-by-Step Solutions
📊 Interactive Charts
No Sign-Up Required
📱 Works on Mobile
📅 Last updated: March 2025  ·  ✅ Verified accurate  ·  ⏱ 2 min read
📖 What Is the Pearson Correlation?


Use it to measure the strength and direction of the linear relationship between two continuous variables. Both variables should be approximately normally distributed for the significance test to be valid.

|r| near 1 = strong linear relationship; |r| near 0 = weak or no linear relationship. r measures only LINEAR association, so a strong curved relationship may give r ≈ 0. Always plot a scatter diagram first!

💡 Example

Study hours vs. exam score for 7 students: r = 0.97 → very strong positive correlation. r² = 0.94 → 94% of the variance in exam scores is explained by study hours.
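The arithmetic behind this kind of example is easy to check in code. A quick sketch with scipy; the data here are hypothetical, chosen only to produce a similarly strong positive correlation:

```python
from scipy import stats

# Hypothetical data: study hours and exam scores for 7 students
hours = [1, 2, 3, 4, 5, 6, 7]
scores = [55, 52, 68, 64, 75, 80, 86]

# Pearson r with a two-sided significance test
r, p = stats.pearsonr(hours, scores)
print(f"r = {r:.2f}, r² = {r**2:.2f}, p = {p:.4f}")
```

With only 7 points the p-value comes from a t-test with n − 2 = 5 degrees of freedom, so even strong-looking correlations should be read cautiously at this sample size.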

🚀 How to Use This Calculator
Click "Open Calculator" above to launch the Pearson Correlation Calculator on StatSolve Pro. No account or download needed; it runs instantly in your browser.
Enter your data in the input fields provided, or click Load Sample to try a pre-filled example immediately.
Click Calculate to see full results: the statistic, p-value or probability, and a complete step-by-step solution with formulas.
Read the interpretation — the results page explains what your answer means in plain language, with decision rules and context.
Export or copy your results using the PDF export or Copy button to save or share your calculation.

Ready to Calculate?

Open the free Pearson Correlation Calculator instantly — no login, no download, works in any browser.

▶ Launch Pearson Correlation Free
🎯 Who Uses This Calculator?


No software installation needed. Works in Chrome, Firefox, Safari, and Edge on desktop and mobile.

📚 Also explore: Spearman Rank Correlation Calculator, Covariance Calculator, Linear Regression Calculator, How to Calculate Correlation Coefficient


Pearson Correlation: Advanced Topics and Practical Guidance

The Pearson correlation coefficient is simultaneously one of statistics' most used and most misused tools. Beyond the basic formula and significance test lie important practical issues: the effect of outliers, restriction of range, attenuation by measurement error, partial correlations, and the crucial distinction between correlation and causation.

The Effect of Outliers on Pearson r

A single outlier can dramatically alter Pearson r. An influential outlier that aligns with the trend can inflate r toward 1.0; an outlier that contradicts the trend can shrink r or even reverse its sign. This is why you should always plot your data first. Anscombe's Quartet (four datasets with identical r, means, and variances but completely different scatter plots) is the classic illustration of why numerical summaries alone are insufficient.

Restriction of Range

If you measure the correlation between GRE scores and graduate GPA only among students who were admitted (typically high GRE scorers), you will find a lower correlation than the true population correlation. This attenuation caused by selecting a non-representative range of X is called restriction of range. The correction formula is r_true = r_obs · u / √(1 − r²_obs + r²_obs · u²), where u = σ_X(population) / σ_X(sample) is the ratio of the unrestricted to the restricted standard deviation of X. This correction is essential when validating selection tools on selected samples.
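The correction is simple to implement. A minimal sketch; the function name and the example values are ours, not part of any particular library:

```python
import math

def correct_range_restriction(r_obs, sd_pop, sd_sample):
    """Correction for direct range restriction on X (Thorndike Case 2).

    sd_pop    : SD of X in the unrestricted population
    sd_sample : SD of X in the restricted (selected) sample
    """
    u2 = (sd_pop / sd_sample) ** 2          # variance ratio σ²_pop / σ²_sample
    return r_obs * math.sqrt(u2) / math.sqrt(1 - r_obs**2 + r_obs**2 * u2)

# e.g. observed r = 0.30 in a selected sample whose SD on X is
# two-thirds of the population SD: the corrected r is about 0.43
r_corrected = correct_range_restriction(0.30, sd_pop=1.5, sd_sample=1.0)
print(round(r_corrected, 3))
```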

Attenuation by Measurement Error

When X and Y are measured with error, the observed correlation is attenuated (reduced) relative to the true underlying correlation. The correction for attenuation: r_true = r_obs / √(r_XX × r_YY), where r_XX and r_YY are the reliabilities of X and Y respectively. If both variables are measured with 80% reliability and observed r = 0.40, then r_true = 0.40/√(0.80 × 0.80) = 0.40/0.80 = 0.50. Measurement improvement is sometimes more efficient than increasing sample size.
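The disattenuation formula in code, reproducing the worked example above (the function name is ours):

```python
import math

def disattenuate(r_obs, rel_x, rel_y):
    """Correction for attenuation: r_true = r_obs / sqrt(r_XX * r_YY),
    where rel_x and rel_y are the reliabilities of X and Y."""
    return r_obs / math.sqrt(rel_x * rel_y)

# Observed r = 0.40 with both variables measured at 80% reliability
r_true = disattenuate(0.40, rel_x=0.80, rel_y=0.80)
print(round(r_true, 2))  # 0.5, as in the worked example
```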

Partial and Semi-Partial Correlations

Partial correlation r_XY.Z measures the correlation between X and Y after removing the linear influence of a third variable Z from both. This is used to test whether an observed correlation is "genuine" or fully explained by a confounding variable. Semi-partial (part) correlation r_X(Y.Z) removes Z from Y only — it represents the unique contribution of X to Y beyond what Z already explains, and its square gives the unique variance in Y explained by X.
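The first-order partial correlation can be computed directly from the three pairwise correlations. A minimal sketch; the function name and example values are ours:

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation r_XY.Z:
    correlation of X and Y with the linear effect of Z removed from both."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# If X-Y, X-Z, and Y-Z all correlate 0.5, controlling for Z
# weakens the X-Y correlation to 1/3
r_partial = partial_corr(0.5, 0.5, 0.5)
print(round(r_partial, 4))
```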

Non-Linear Relationships: When Pearson r Fails

Pearson r measures only linear association. For data where Y = X² over a symmetric range of X (a perfect quadratic relationship), Pearson r is essentially zero. Always examine scatter plots for non-linear patterns. For non-linear but monotonic relationships, use Spearman's rₛ; for general non-linear associations, consider the maximal information coefficient (MIC) or distance correlation.
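A short demonstration of this failure mode, assuming a symmetric range of X:

```python
import numpy as np

# Perfect quadratic relationship over a symmetric range of X
x = np.arange(-5, 6, dtype=float)   # -5, -4, ..., 5
y = x ** 2                          # Y is completely determined by X

r = np.corrcoef(x, y)[0, 1]
print(f"r = {r:.4f}")  # near zero despite a perfect functional relationship
```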

Confidence Interval for ρ: Fisher's Z Transformation

The sampling distribution of r is skewed (especially for |ρ| near 1), so confidence intervals for ρ require Fisher's Z transformation:

Z = 0.5 × ln[(1+r)/(1−r)] = arctanh(r)

Z is approximately normally distributed with SE = 1/√(n−3). Compute the 95% CI for Z: Z ± 1.96/√(n−3), then transform back using r = tanh(Z). For example, r = 0.70, n = 50: Z = arctanh(0.70) = 0.867, SE = 1/√47 = 0.146. 95% CI for Z: (0.867 − 0.286, 0.867 + 0.286) = (0.581, 1.153). Back-transforming: (tanh(0.581), tanh(1.153)) = (0.524, 0.819). The 95% CI for ρ is (0.52, 0.82).
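The whole procedure fits in a few lines. A sketch (the function name is ours) that reproduces the worked example:

```python
import math

def pearson_ci(r, n, z_crit=1.96):
    """95% CI for rho via Fisher's Z transformation (requires n > 3)."""
    z = math.atanh(r)                # Fisher's Z = arctanh(r)
    se = 1 / math.sqrt(n - 3)        # standard error of Z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

lo, hi = pearson_ci(0.70, 50)
print(f"95% CI for rho: ({lo:.2f}, {hi:.2f})")  # (0.52, 0.82), as above
```

Note that the interval is asymmetric around r = 0.70, which is exactly why the transformation is needed.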

Point-Biserial Correlation

When one variable is continuous and the other is binary (coded 0/1), the Pearson correlation is called the point-biserial correlation r_pb. It is computed exactly as Pearson r, just with one variable binary. r_pb is directly related to the independent-samples t-test: t = r_pb × √(n−2)/√(1−r²_pb). It can also serve as an effect size measure for a two-sample t-test.
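The equivalence can be verified numerically. A sketch with hypothetical data:

```python
import math
from scipy import stats

# Binary group indicator and a continuous outcome (hypothetical data)
group = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y     = [3, 4, 5, 4, 3, 6, 7, 6, 8, 7]

# Point-biserial r is just Pearson r with the binary variable
r_pb, _ = stats.pearsonr(group, y)
t_from_r = r_pb * math.sqrt(len(y) - 2) / math.sqrt(1 - r_pb**2)

# The same t emerges from a pooled-variance two-sample t-test
t_stat, _ = stats.ttest_ind(y[5:], y[:5])
print(round(t_from_r, 4), round(t_stat, 4))  # identical
```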

How Many Data Points Do You Need?

Statistical power for detecting a Pearson correlation depends on the true ρ, sample size n, and significance level α:

True ρ             n for 80% power (α = 0.05)    n for 90% power
0.10 (small)       782                           1046
0.30 (medium)      84                            112
0.50 (large)       28                            37
0.70 (very large)  12                            16
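These sample sizes can be approximated with the Fisher Z transformation discussed later on this page. A sketch (the function name is ours) that agrees with exact power tables to within one or two observations:

```python
import math
from scipy import stats

def n_for_correlation(rho, power=0.80, alpha=0.05):
    """Approximate n to detect rho != 0 (two-sided test),
    using the Fisher-Z normal approximation:
    n = ((z_alpha/2 + z_beta) / arctanh(rho))^2 + 3."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)
    return math.ceil(((z_alpha + z_beta) / math.atanh(rho)) ** 2 + 3)

n = n_for_correlation(0.30)   # about 85 for a medium correlation
print(n)
```

For exact values, R's pwr.r.test (pwr package) is the standard tool.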

Worked Examples: Pearson Correlation Calculator Step by Step

Practice is essential for mastering statistical methods. The following worked examples cover a range of scenarios — from simple textbook cases to realistic research situations — building your confidence and intuition through active application of the concepts above.

Example 1: Basic Application

Consider a standard scenario for the Pearson Correlation Calculator. Begin by identifying the research question and null hypothesis, then select appropriate parameters, check all assumptions, compute the test statistic, determine the p-value, and state conclusions in the context of the problem.

Example 2: Applied Research Scenario

In applied research, data rarely arrives perfectly formatted. You may encounter missing values, measurement error, borderline assumption violations, and multiple candidate analytical approaches. Working through realistic examples builds the judgment needed to navigate these situations correctly.

Example 3: Interpreting Computer Output

Statistical software (R, Python, SPSS, Stata, SAS) produces rich output including test statistics, p-values, confidence intervals, and diagnostic information. Learning to read and critically evaluate this output — identifying what is essential, what is supplementary, and what might indicate problems — is a critical skill for any data analyst.

Key Formulas Summary

For quick reference, here are the essential formulas, the conditions under which they are valid, and the R and Python commands used to compute them. Having these organized and accessible accelerates your workflow and reduces the risk of applying the wrong formula in a high-pressure situation.
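The core formulas are r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²) for the coefficient itself and t = r√(n − 2)/√(1 − r²) with df = n − 2 for the significance test, valid when both variables are continuous and approximately bivariate normal. A minimal sketch of the corresponding Python calls, with the usual R equivalents noted in comments (the data here are hypothetical):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Pearson r with a two-sided significance test
#   R equivalent: cor.test(x, y, method = "pearson")
r, p = stats.pearsonr(x, y)

# Correlation matrix only (no test)
#   R equivalent: cor(cbind(x, y))
R = np.corrcoef(x, y)

# r²: proportion of variance explained
r_squared = r ** 2
print(f"r = {r:.3f}, r² = {r_squared:.3f}, p = {p:.4f}")
```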

Practice Problems with Solutions

The best way to solidify your understanding is to work through problems yourself before checking the solution. Start with simpler cases to build confidence, then tackle more complex scenarios that require judgment about assumptions, multiple testing, and effect size interpretation. Our free online calculator handles the computation — focus your energy on the setup, interpretation, and critical evaluation of results.

Connection to Other Statistical Concepts

Statistical methods do not exist in isolation. This procedure connects to hypothesis testing principles, the sampling distribution theory established by the Central Limit Theorem, effect size measures, confidence interval construction, and the broader framework of statistical inference. Understanding these connections makes you a more versatile and insightful analyst.

Frequently Confused Concepts

Certain pairs of concepts are persistently confused even by experienced practitioners. Clearing up these confusions transforms your statistical reasoning.

Statistical Significance vs. Clinical/Practical Significance

A result can be statistically significant (p < 0.05) but clinically trivial (effect size near zero with enormous sample size), or clinically important but not statistically significant (large effect size in an underpowered small study). Always assess both dimensions. The confidence interval is the key tool: it shows both whether the result is significant (excludes the null value) and the magnitude of the effect (the range of plausible values).

One-Tailed vs. Two-Tailed Tests

A one-tailed test is justified only when the research hypothesis specifies the direction of the effect before data collection. If you specify a one-tailed test after seeing the data direction (to halve a borderline p-value), this is p-hacking and produces inflated false positive rates. When in doubt, use a two-tailed test — it is the more conservative and generally accepted default.

The P-Value Is Not the Probability H₀ Is True

The p-value = P(data this extreme | H₀ is true). It is NOT P(H₀ true | this data). Computing the latter requires Bayes' theorem with a prior on H₀. With a high prior probability that H₀ is true (common in exploratory research), even p = 0.001 may correspond to only modest posterior probability that H₁ is true. This is one reason many statisticians advocate for Bayesian methods or effect size reporting over binary significance testing.

Statistical Reasoning: Building Intuition Through Examples

Statistical mastery comes from seeing the same concepts applied across many different contexts. The following worked examples and case studies reinforce the core principles while showing their breadth of application across medicine, social science, business, engineering, and natural science.

Case Study 1: Healthcare Research Application

A clinical researcher wants to evaluate whether a new physical therapy protocol reduces recovery time after knee surgery. The study design, data collection, statistical analysis, and interpretation each require careful thought. The researcher must choose appropriate sample sizes, select the right statistical test, verify all assumptions, compute the test statistic and p-value, report the effect size with confidence interval, and interpret the result in terms patients and clinicians can understand. Each step builds on a solid understanding of statistical theory.

Case Study 2: Business Analytics Application

An e-commerce company wants to know if customers who see a new product recommendation algorithm spend more money per session. They have access to data from 50,000 user sessions split evenly between the old and new algorithms. The statistical question is clear, but practical considerations — multiple testing across different metrics, confounding by device type and geography, and the distinction between statistical and business significance — require careful navigation. Understanding the underlying statistical framework guides every analytical decision.

Case Study 3: Educational Assessment

A school district implements a new math curriculum and wants to evaluate its effectiveness using standardized test scores. Before-after comparisons, control group selection, and the inevitable regression-to-the-mean effect must all be addressed. Measuring whether changes are genuine improvements or statistical artifacts requires the full toolkit: descriptive statistics, assumption checking, appropriate tests for the design, effect size calculation, and honest acknowledgment of limitations.

Understanding Output from Statistical Software

When you run this analysis in R, Python, SPSS, or Stata, the software produces detailed output with more numbers than you need for any single analysis. Knowing which numbers are essential (test statistic, df, p-value, CI, effect size) vs. diagnostic vs. supplementary is a critical skill. Our calculator extracts the key results and presents them in a clear, interpretable format — but understanding what each number means, where it comes from, and what would make it change is what separates a statistician from a button-pusher.

Integrating Multiple Analyses

Real research rarely involves a single statistical test in isolation. Typically, a full analysis includes: (1) data quality checks and outlier investigation, (2) descriptive statistics for all key variables, (3) visualization of distributions and relationships, (4) assumption verification for planned inferential tests, (5) primary inferential analysis with effect size and CI, (6) sensitivity analyses testing robustness to assumption violations, and (7) subgroup analyses if pre-specified. This holistic approach produces more trustworthy and complete results than any single test alone.

Statistical Software Commands Reference

For those implementing these analyses computationally: R provides comprehensive implementations through base R and packages like stats, car, lme4, and ggplot2 for visualization. Python users rely on scipy.stats, statsmodels, and pingouin for statistical testing. Both languages offer excellent power analysis tools (R: pwr package; Python: statsmodels.stats.power). SPSS and Stata provide menu-driven interfaces alongside powerful command syntax for reproducible analyses. Learning at least one of these tools is essential for any applied statistician or data scientist.

Frequently Asked Questions: Advanced Topics

These questions address subtle points that often confuse even experienced analysts:

Can I use this test with non-normal data?

For large samples (generally n ≥ 30 per group), the Central Limit Theorem ensures that test statistics based on sample means are approximately normally distributed regardless of the population distribution. For small samples with clearly non-normal data, use a non-parametric alternative or bootstrap methods. The key question is not "is my data normal?" but "is the sampling distribution of my test statistic approximately normal?" These are different questions with different answers.
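For Pearson r specifically, a percentile bootstrap is a common distribution-free option: resample (x, y) pairs with replacement, recompute r each time, and take percentiles of the bootstrap distribution. A sketch under assumed data (the sample, seed, and replicate count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical paired sample with a true correlation around 0.6
x = rng.normal(size=40)
y = 0.6 * x + rng.normal(scale=0.8, size=40)

# Percentile bootstrap CI for r: resample PAIRS, never x and y separately
boot_r = []
for _ in range(5000):
    idx = rng.integers(0, len(x), len(x))
    boot_r.append(np.corrcoef(x[idx], y[idx])[0, 1])

lo, hi = np.percentile(boot_r, [2.5, 97.5])
print(f"95% bootstrap CI for r: ({lo:.2f}, {hi:.2f})")
```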

How do I handle missing data?

Missing data is ubiquitous in real research. Complete case analysis (listwise deletion) is the default in most software but can introduce bias if data is not Missing Completely At Random (MCAR). Better approaches: multiple imputation (creates several complete datasets, analyzes each, and pools results using Rubin's rules) and maximum likelihood methods (FIML/EM algorithm). The choice depends on the missing data mechanism and the nature of the analysis. Never delete variables with many missing values without considering the implications.

What is the difference between a one-sided and two-sided test?

A two-sided test rejects H₀ if the test statistic is extreme in either direction. A one-sided test rejects only in the pre-specified direction. The one-sided p-value is half the two-sided p-value for symmetric test statistics. Use a one-sided test only if: (1) the research question is inherently directional, (2) the direction was specified before data collection, and (3) results in the opposite direction would have no practical meaning. Never switch from two-sided to one-sided after seeing which direction the data points — this doubles the effective false positive rate.

How should I report results in a research paper?

Follow APA 7th edition: report the test statistic with its symbol (t, F, χ², z, U), degrees of freedom in parentheses (except for z-tests), the exact p-value to two or three decimal places (write "p = .032", not "p < .05"), the effect size with its confidence interval, and the direction of the effect. Example for a t-test with 50 participants per group: "The experimental group (M = 72.4, SD = 8.1) scored significantly higher than the control group (M = 68.1, SD = 9.3), t(98) = 2.47, p = .015, d = 0.49, 95% CI for the difference [0.84, 7.76]." This one sentence communicates the complete statistical story.

❓ Frequently Asked Questions
What does Pearson r measure?
Pearson r measures the strength and direction of the linear relationship between two continuous variables. It ranges from −1 (perfect negative) to +1 (perfect positive), with 0 meaning no linear relationship.

How strong is my correlation?
As a rough guide: |r| < 0.3 is weak, 0.3–0.5 is moderate, 0.5–0.7 is strong, and > 0.7 is very strong. Context matters; in the social sciences, r = 0.5 can be considered strong. Always plot your data, since a high r doesn't guarantee linearity.

What is R²?
R² (the coefficient of determination) is r squared. It represents the proportion of variance in one variable explained by the other.

Does correlation imply causation?
No. A significant r only shows association, not causation. A third variable (confounder) could cause both, or the relationship could be coincidental.

When should I use Spearman instead of Pearson?
Pearson measures linear correlation between continuous variables assuming normality. Spearman measures monotonic correlation between ranked data and is non-parametric, making it better for ordinal data or non-normal distributions.
🔗 Explore More on StatSolve Pro
📊 Popular Calculators
Descriptive Statistics · T-Test Calculator · Z-Score Calculator · Normal Distribution · Linear Regression · ANOVA Calculator
📝 Guides & Articles
What is Sampling? · Types of Data · Mean vs Median vs Mode · Hypothesis Testing Guide · P-Value Explained · Statistics Glossary