About StatSolve Pro
Our mission, philosophy, and commitment to making statistics accessible to everyone
🎯 Our Mission

StatSolve Pro was created with a single, clear purpose: to make statistical analysis genuinely accessible to every student, researcher, analyst, and curious learner in the world — regardless of their background, institution, or budget. Statistics is the language of data, and in today's world, understanding data is no longer optional. Whether you are a psychology undergraduate running your first t-test, a public health researcher analyzing clinical trial results, a business analyst interpreting A/B test outcomes, or a teacher preparing classroom examples, you deserve tools that are powerful, clear, and honest about how they work.

We believe that showing the answer is not enough. Any calculator can give you a mean or a p-value. What truly matters is understanding why a calculation works, what assumptions underlie it, and how to interpret the result in context. That is why every single tool in StatSolve Pro — all 50 of them — shows the complete, step-by-step mathematical reasoning behind every result. We substitute actual values into formulas so you can follow the logic at every stage, not just read a final number.
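
As a rough illustration of what "substituting actual values into formulas" can look like, here is a minimal sketch for the sample mean. It is not StatSolve Pro's actual source code; the function name `meanWithSteps` and the wording of each step are assumptions made for this example.

```javascript
// Hypothetical helper: returns the sample mean together with the text of
// each step, with the actual data values substituted into the formula.
function meanWithSteps(values) {
  if (values.length === 0) {
    throw new Error("At least one value is required");
  }
  const n = values.length;
  const sum = values.reduce((acc, x) => acc + x, 0);
  const mean = sum / n;
  const steps = [
    "Formula: mean = (x1 + x2 + ... + xn) / n",
    `Substitute: mean = (${values.join(" + ")}) / ${n}`,
    `Simplify: mean = ${sum} / ${n}`,
    `Result: mean = ${mean}`,
  ];
  return { mean, steps };
}

const example = meanWithSteps([4, 8, 15, 16, 23, 42]);
console.log(example.steps.join("\n"));
```

For the data above, the substituted step reads "mean = (4 + 8 + 15 + 16 + 23 + 42) / 6" and the result is 18.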

Our mission extends beyond calculation. StatSolve Pro is designed as a learning companion. The built-in educational sections for each tool explain the core concept, the historical context of the method, when it is appropriate to use it, what assumptions it makes, how to interpret outputs correctly, and what mistakes to avoid. We want users to leave each session not just with an answer, but with a deeper understanding of the statistical method they used.

📖 Why We Built This

The team behind StatSolve Pro came together out of frustration with existing tools. Professional statistical software like SPSS, SAS, or Stata costs hundreds or thousands of dollars per year — prices that put them out of reach for students in developing countries, independent researchers, and small organizations. Open-source alternatives like R and Python are powerful but require significant programming knowledge that many users — especially those in social sciences, health, business, and education — simply do not have.

At the other extreme, simple online calculators give one-line answers with no explanation, no context, and no educational value. They tell you the answer but leave you no better equipped to understand statistics. Students copy the result without grasping the concept, leading to misapplication and misinterpretation in their own work.

StatSolve Pro fills the gap between these extremes. It is free and requires no installation or registration. It runs entirely in your web browser, which means it works on any device — desktop, laptop, tablet, or phone — and it works offline once loaded. Every tool is designed for the user who wants to understand statistics, not just compute it.

🔧 What We Offer

StatSolve Pro currently includes 50 professional-grade statistical calculators organized across six major categories. In Descriptive Statistics, you will find tools for computing complete summary statistics, building frequency distributions, analyzing grouped data, performing five-number summaries and box plot analysis, detecting outliers using both IQR and Z-score methods, computing geometric and harmonic means, weighted averages, moving averages, and rank-percentile analysis. These tools form the foundation of any statistical analysis.
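
The two outlier rules mentioned above can be sketched as follows. This is an illustrative sketch, not the tool's implementation. It assumes quartiles are computed by linear interpolation between sorted values (several other conventions exist and give slightly different fences), the usual 1.5 × IQR rule, and a Z-score threshold of 3 based on the sample standard deviation. All function names are hypothetical.

```javascript
// Quantile by linear interpolation between sorted values (one common convention).
function quantile(sorted, p) {
  const pos = (sorted.length - 1) * p;
  const lower = Math.floor(pos);
  const upper = Math.ceil(pos);
  const weight = pos - lower;
  return sorted[lower] * (1 - weight) + sorted[upper] * weight;
}

// IQR rule: flag values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR.
function iqrOutliers(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const q1 = quantile(sorted, 0.25);
  const q3 = quantile(sorted, 0.75);
  const iqr = q3 - q1;
  const lowerFence = q1 - 1.5 * iqr;
  const upperFence = q3 + 1.5 * iqr;
  const outliers = sorted.filter((x) => x < lowerFence || x > upperFence);
  return { q1, q3, iqr, lowerFence, upperFence, outliers };
}

// Z-score rule: flag values whose |z| exceeds the threshold (default 3),
// using the sample standard deviation (n - 1 in the denominator).
function zScoreOutliers(values, threshold = 3) {
  const n = values.length;
  const mean = values.reduce((acc, x) => acc + x, 0) / n;
  const variance =
    values.reduce((acc, x) => acc + (x - mean) ** 2, 0) / (n - 1);
  const sd = Math.sqrt(variance);
  return values.filter((x) => Math.abs((x - mean) / sd) > threshold);
}

const data = [2, 4, 4, 5, 6, 7, 8, 9, 50];
console.log(iqrOutliers(data));
console.log(zScoreOutliers(data));
```

With this data the IQR rule flags 50 (fences at -2 and 14), while the Z-score rule at a threshold of 3 flags nothing, because the extreme value inflates the standard deviation and its own Z-score is only about 2.64. The two methods can disagree, which is one reason to look at both.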

The Probability category covers basic probability rules (complement, union, intersection, conditional probability), Bayes' Theorem for updating beliefs with new evidence, permutation and combination counting, and expected value with variance for discrete distributions. These tools are essential for understanding risk, decision-making under uncertainty, and the foundations of inferential statistics.
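
To make the idea of "updating beliefs with new evidence" concrete, here is a small sketch of Bayes' Theorem for a yes/no test. The function and parameter names are assumptions for this example, and the numbers are invented purely for illustration.

```javascript
// Bayes' Theorem for a binary test:
// P(H | +) = P(+ | H) * P(H) / [ P(+ | H) * P(H) + P(+ | not H) * P(not H) ]
function bayesPosterior(prior, sensitivity, falsePositiveRate) {
  for (const p of [prior, sensitivity, falsePositiveRate]) {
    if (!(p >= 0 && p <= 1)) {
      throw new Error("Probabilities must lie between 0 and 1");
    }
  }
  const evidence = sensitivity * prior + falsePositiveRate * (1 - prior);
  if (evidence === 0) {
    throw new Error("A positive result has probability zero under these inputs");
  }
  return (sensitivity * prior) / evidence;
}

// Illustrative numbers: 1% prevalence, 95% sensitivity, 5% false positive rate.
const posterior = bayesPosterior(0.01, 0.95, 0.05);
console.log(posterior.toFixed(3));
```

With these made-up inputs the posterior is about 0.161: even after a positive result, the hypothesis remains unlikely because the prior was so low.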

In Distributions, we provide calculators for the binomial, Poisson, normal, geometric, exponential, uniform, and hypergeometric distributions, together with Z-score and Chebyshev's inequality tools. Each distribution calculator computes exact probabilities, cumulative probabilities, quantiles, and distribution parameters, and displays an interactive chart of the distribution shape.
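
As one example of what "exact probabilities, cumulative probabilities, and distribution parameters" mean in practice, the sketch below covers the binomial case. It is an assumption-laden illustration with hypothetical function names, not the calculator's own code, and it is only intended for moderate values of n.

```javascript
// Binomial coefficient C(n, k) via the multiplicative formula.
function binomialCoefficient(n, k) {
  if (k < 0 || k > n) return 0;
  const m = Math.min(k, n - k);
  let result = 1;
  for (let i = 1; i <= m; i++) {
    result = (result * (n - m + i)) / i;
  }
  return result;
}

// Exact probability P(X = k) for X ~ Binomial(n, p).
function binomialPmf(n, k, p) {
  return binomialCoefficient(n, k) * p ** k * (1 - p) ** (n - k);
}

// Cumulative probability P(X <= k), summing the exact probabilities.
function binomialCdf(n, k, p) {
  let total = 0;
  for (let i = 0; i <= Math.min(k, n); i++) {
    total += binomialPmf(n, i, p);
  }
  return total;
}

// Distribution parameters: mean = n*p, variance = n*p*(1 - p).
function binomialSummary(n, p) {
  return { mean: n * p, variance: n * p * (1 - p) };
}

console.log(binomialPmf(10, 3, 0.5));
console.log(binomialCdf(10, 3, 0.5));
console.log(binomialSummary(10, 0.5));
```

For n = 10 and p = 0.5, P(X = 3) is 120/1024, the mean is 5, and the variance is 2.5.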

The Hypothesis Testing section is one of the most comprehensive free offerings available online. It includes one-sample Z-test, one-sample t-test, two-sample Welch's t-test, paired t-test, one-proportion Z-test, two-proportion Z-test, chi-square goodness-of-fit test, 2×2 contingency table analysis with chi-square, odds ratio, and relative risk, plus confidence interval estimation and sample size calculation.
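
For readers curious about what sits behind one of these tests, here is a hedged sketch of the two-sample Welch's t-test. It computes only the test statistic and the Welch–Satterthwaite degrees of freedom; turning those into a p-value requires the t distribution's cumulative function, which is left out here. The function names are hypothetical.

```javascript
// Sample size, mean, and sample variance (n - 1 in the denominator).
function sampleStats(values) {
  const n = values.length;
  if (n < 2) {
    throw new Error("Each sample needs at least two values");
  }
  const mean = values.reduce((acc, x) => acc + x, 0) / n;
  const variance =
    values.reduce((acc, x) => acc + (x - mean) ** 2, 0) / (n - 1);
  return { n, mean, variance };
}

// Welch's t statistic and Welch–Satterthwaite degrees of freedom.
// Unlike the pooled t-test, this does not assume equal variances.
function welchTTest(sampleA, sampleB) {
  const a = sampleStats(sampleA);
  const b = sampleStats(sampleB);
  const termA = a.variance / a.n; // squared standard error of mean A
  const termB = b.variance / b.n; // squared standard error of mean B
  const t = (a.mean - b.mean) / Math.sqrt(termA + termB);
  const df =
    (termA + termB) ** 2 /
    (termA ** 2 / (a.n - 1) + termB ** 2 / (b.n - 1));
  return { t, df, meanDifference: a.mean - b.mean };
}

console.log(welchTTest([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]));
```

For these two small samples the statistic is about -1.90 with roughly 5.88 degrees of freedom; the non-integer degrees of freedom are a normal feature of Welch's method.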

Regression and Correlation tools include simple linear regression with slope, intercept, R², residuals, and prediction; Pearson correlation with significance testing; Spearman rank correlation; one-way ANOVA with full ANOVA table; and covariance analysis. The Advanced section adds skewness and kurtosis analysis and standard error computation. Every tool includes step-by-step solutions, educational content, interactive charts, and PDF export.
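
The quantities listed for simple linear regression (slope, intercept, R², residuals, and prediction) can be sketched with ordinary least squares as follows. This is an illustrative sketch with hypothetical names, not the application's own implementation.

```javascript
// Ordinary least squares fit of y = intercept + slope * x.
function linearRegression(xs, ys) {
  if (xs.length !== ys.length || xs.length < 2) {
    throw new Error("Need two equal-length arrays with at least two points");
  }
  const n = xs.length;
  const meanX = xs.reduce((acc, x) => acc + x, 0) / n;
  const meanY = ys.reduce((acc, y) => acc + y, 0) / n;

  let sxx = 0; // sum of squared deviations of x
  let sxy = 0; // sum of cross-products of deviations
  let syy = 0; // sum of squared deviations of y
  for (let i = 0; i < n; i++) {
    const dx = xs[i] - meanX;
    const dy = ys[i] - meanY;
    sxx += dx * dx;
    sxy += dx * dy;
    syy += dy * dy;
  }
  if (sxx === 0) {
    throw new Error("All x values are identical; the slope is undefined");
  }

  const slope = sxy / sxx;
  const intercept = meanY - slope * meanX;
  // R² is undefined (NaN) when every y value is identical.
  const rSquared = (sxy * sxy) / (sxx * syy);
  const predict = (x) => intercept + slope * x;
  const residuals = xs.map((x, i) => ys[i] - predict(x));
  return { slope, intercept, rSquared, residuals, predict };
}

const fit = linearRegression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]);
console.log(fit.slope, fit.intercept, fit.rSquared, fit.predict(6));
```

For this small data set the fitted line has a slope of about 0.6 and an intercept of about 2.2, with R² close to 0.6. For simple linear regression, R² equals the square of the Pearson correlation, which is why it can be computed from the same three sums.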

📊 Interactive Charts and Visualizations

Numbers alone rarely communicate the full story that data has to tell. That is why StatSolve Pro integrates interactive Chart.js visualizations directly into key tools. When you calculate a binomial distribution, you see a bar chart of all probabilities with your target value highlighted. The normal distribution tool draws the bell curve with the area of interest shaded. Regression analysis shows a scatter plot with the fitted regression line overlaid. Correlation tools display the data pattern visually so you can verify that a linear model is appropriate. Moving average calculations show both the raw time series and the smoothed trend line together.
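
A bar chart with the target value highlighted, as described above for the binomial tool, could be configured along these lines. This is a sketch that assumes Chart.js version 3 or later; the colours, the function name, and the canvas id `dist-chart` are hypothetical and not taken from StatSolve Pro.

```javascript
// Builds a Chart.js bar chart configuration in which the bar for the
// target value k is drawn in a different colour from the others.
function buildBinomialChartConfig(probabilities, targetK) {
  const labels = probabilities.map((_, k) => String(k));
  const colors = probabilities.map((_, k) =>
    k === targetK ? "rgba(220, 38, 38, 0.9)" : "rgba(37, 99, 235, 0.6)"
  );
  return {
    type: "bar",
    data: {
      labels,
      datasets: [
        { label: "P(X = k)", data: probabilities, backgroundColor: colors },
      ],
    },
    options: {
      plugins: { legend: { display: false } },
      scales: {
        x: { title: { display: true, text: "Number of successes (k)" } },
        y: { beginAtZero: true, title: { display: true, text: "Probability" } },
      },
    },
  };
}

// Binomial(n = 4, p = 0.5) probabilities for k = 0..4, highlighting k = 2.
const config = buildBinomialChartConfig(
  [0.0625, 0.25, 0.375, 0.25, 0.0625],
  2
);

// Only runs in a browser where Chart.js is loaded and the canvas exists.
if (typeof Chart !== "undefined" && typeof document !== "undefined") {
  const canvas = document.getElementById("dist-chart");
  if (canvas) {
    new Chart(canvas, config);
  }
}
```

Keeping the configuration in a plain function separates the statistical values from the rendering step, so the same probabilities can feed both the chart and the step-by-step solution.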

These visualizations are not decorative — they are analytical tools. Research in statistics education consistently shows that combining numeric outputs with visual representations dramatically improves comprehension and retention. Seeing the shape of a distribution, the spread of residuals around a regression line, or the location of outliers in a scatter plot makes abstract concepts concrete and memorable.

👥 Who Uses StatSolve Pro

Our users span a remarkable range of disciplines and contexts. University and college students use StatSolve Pro for statistics courses in psychology, economics, business administration, biology, education, sociology, nursing, and engineering. The step-by-step solutions help them check their manual calculations, understand where they went wrong, and prepare for exams.

Graduate researchers and academics use it as a quick verification tool — confirming that their software outputs are correct, checking calculations during the writing process, and exploring distribution properties during the study design phase. Teachers and professors use it to generate worked examples, demonstrate statistical concepts in class, and create problem sets with verifiable solutions.

Data analysts and business professionals use StatSolve Pro for rapid statistical checks that do not warrant opening a full statistical software environment — verifying a confidence interval, computing the required sample size for an upcoming study, or quickly checking whether a distribution assumption is reasonable for their data.

Self-learners, one of our most valued user groups, use StatSolve Pro as a guided textbook substitute. The combination of calculator, step-by-step solution, formula reference, usage guidelines, and interpretation advice provides everything they need to understand a statistical concept independently, at their own pace, without a formal course.

🔒 Our Commitment to Privacy

StatSolve Pro is built on a foundational privacy principle: your data belongs to you. Every calculation — whether you are entering sensitive medical measurements, proprietary business data, or academic research values — runs entirely within your browser. No data is ever transmitted to any server. We cannot see what you calculate because the calculations happen entirely on your device. There are no user accounts, no login required, and no tracking of individual behavior. We designed the application this way deliberately, and we will not change this approach.

🔭 Looking Forward

StatSolve Pro is an active project. We are continuously working on new tools, improved visualizations, enhanced educational content, and accessibility improvements. Planned additions include two-way ANOVA, multiple linear regression, time series analysis tools, nonparametric tests, and expanded distribution calculators. We are also developing a guided problem-solving mode that walks users through selecting the right statistical test for their data and research question.

If you have suggestions, found a bug, or want to contribute educational content, we would love to hear from you through our contact form. Statistics education is a collaborative endeavor, and every piece of feedback makes StatSolve Pro better for the entire community of learners and researchers who depend on it.

Contact Us
We're here to help — reach out with questions, feedback, bug reports, or collaboration enquiries
📧 Direct Email
statsolveprohelp@gmail.com
We respond within 24–48 business hours · No automated replies
📬 Send Us a Message

Prefer a form? Fill in the fields below and we will get back to you at your provided email address. All fields help us route your message to the right person and respond more effectively.

🗂️ What to Include in Your Message

To help us respond quickly and accurately, here are tips for each enquiry type:

⏱️ Response Times and Our Commitment

We aim to respond to all messages within 24–48 business hours. Bug reports affecting calculation accuracy are treated as high priority and are typically acknowledged within a few hours. Feature requests are reviewed weekly and compiled into our public roadmap. Privacy inquiries are addressed with the highest urgency — we commit to responding within 24 hours regardless of time zone.

We do not use automated responses or chatbots. Every reply you receive comes from a real person on the StatSolve Pro team. We read every message carefully, even if our response time is occasionally longer than we would like during periods of high volume. Your feedback genuinely shapes the development of this tool.

🏫 Educational Partnerships and Classroom Use

StatSolve Pro is used in classrooms and online courses across dozens of countries. If you are an instructor who wants to use StatSolve Pro as a teaching tool, we would love to support you. We can provide:

Academic use inquiries can be directed to statsolveprohelp@gmail.com with "Educational Partnership" in the subject line. We respond to all academic inquiries within 48 hours.

🤝 Collaboration and Content

If you are a statistician, educator, data scientist, or subject-matter expert who would like to contribute to StatSolve Pro — whether by reviewing the accuracy of existing tools, writing educational content for underrepresented methods, or suggesting additions to our formula reference sections — we welcome that conversation. StatSolve Pro is built on the belief that quality statistical education should be a community effort, and we actively seek collaboration with experts who share that vision.

📍 About Our Team

StatSolve Pro is an independent project built and maintained by a small team passionate about statistics education and open-access tools. We are not affiliated with any university, software company, or statistics organisation. Our work is funded through display advertising — the same ads that support much of the free content on the web. If you find StatSolve Pro useful, the best way to support us is simply to keep using it and share it with colleagues and students who might benefit.

We are reachable at statsolveprohelp@gmail.com. We look forward to hearing from you.

Privacy Policy
Last updated: January 2025 · Your privacy is fundamental to how we built StatSolve Pro
1. Introduction and Our Privacy Philosophy

At StatSolve Pro, privacy is not an afterthought — it is a design principle that guided every technical decision we made when building this application. We operate from a straightforward premise: the data you enter into a statistics calculator is your data, and it should never leave your control without your explicit, informed consent.

This Privacy Policy explains in plain language what information we collect (very little), how we use it (minimally), what we do not collect (most things), and why our architecture makes privacy the default rather than the exception. We encourage you to read this document in full. If you have questions about any part of it, please use our Contact page to reach out directly.

This policy applies to all users of StatSolve Pro accessed through any browser or device. By using StatSolve Pro, you acknowledge that you have read and understood this policy.

2. What Data We Do NOT Collect

To be absolutely clear about the scope of our data practices, here is an explicit list of data we do not collect, store, or process on any server:

This is not marketing language — it is technically accurate. Because all calculations run as JavaScript in your browser, there is no mechanism by which this data could reach our servers even if we wanted it to.

3. How StatSolve Pro Works — Client-Side Architecture

Understanding why your data is private requires a brief explanation of how StatSolve Pro is built. StatSolve Pro is a client-side web application. This means the entire application — all 50 calculators, all statistical algorithms, all step-by-step solution generators, all chart renderers — runs as code that executes inside your browser on your device.

When you load StatSolve Pro, your browser downloads the HTML, CSS, and JavaScript files that constitute the application. After that initial download, StatSolve Pro runs entirely locally. When you enter data and click Calculate, the computation happens in your browser's JavaScript engine — the same environment that runs games, animations, and interactive elements on millions of websites. The result appears on your screen without any network request being made to any server.

You can verify this yourself: after loading StatSolve Pro, disconnect your device from the internet and continue using all 50 calculators. Everything will work identically. This is because there is no server involved in the calculations. This architecture was a deliberate choice made to guarantee user privacy by technical design, not just by policy promise.
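
The client-side pattern described in this section can be sketched as follows: input is parsed and summarized by ordinary functions, and the click handler writes the result to the page without issuing any network request. This is a simplified illustration, not the application's source; the element ids `calculate`, `data-input`, and `result` are hypothetical.

```javascript
// Parses a comma- or whitespace-separated string of numbers typed by the user.
function parseNumbers(text) {
  return text
    .split(/[\s,]+/)
    .filter((token) => token !== "")
    .map((token) => {
      const value = Number(token);
      if (!Number.isFinite(value)) {
        throw new Error(`Not a number: ${token}`);
      }
      return value;
    });
}

// Pure computation: no fetch, XMLHttpRequest, or other network call is involved.
function summarize(values) {
  if (values.length === 0) {
    throw new Error("Enter at least one value");
  }
  const n = values.length;
  const mean = values.reduce((acc, x) => acc + x, 0) / n;
  return { n, mean, min: Math.min(...values), max: Math.max(...values) };
}

// Browser wiring, skipped outside a browser. The element ids are hypothetical.
if (typeof document !== "undefined") {
  const button = document.getElementById("calculate");
  if (button) {
    button.addEventListener("click", () => {
      const input = document.getElementById("data-input");
      const output = document.getElementById("result");
      const stats = summarize(parseNumbers(input.value));
      output.textContent = `n = ${stats.n}, mean = ${stats.mean}`;
    });
  }
}
```

Because nothing in this path depends on a server, code structured this way keeps working after the device goes offline, which is the behaviour the verification step above relies on.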

4. Information We May Collect — Contact Form Only

The only situation in which we collect any personal information is when you voluntarily submit our contact form. If you choose to contact us, we collect the name, email address, subject, and message content that you provide. This information is used solely to respond to your inquiry. We do not add contact form submitters to mailing lists, share this information with third parties, or use it for any commercial purpose.

Contact form submissions are retained only as long as necessary to address your inquiry and any reasonable follow-up period. We do not maintain permanent records of contact communications beyond what is operationally necessary.

5. Third-Party Services and CDN Dependencies

StatSolve Pro loads two external resources from third-party content delivery networks. First, font files are loaded from Google Fonts (fonts.googleapis.com) to render the IBM Plex Mono and DM Sans typefaces used in the interface. Second, the Chart.js charting library is loaded from Cloudflare's CDN (cdnjs.cloudflare.com) to power our interactive visualizations.

When your browser requests these resources, the CDN providers' servers receive your IP address and browser user-agent string as part of standard HTTP request processing. This is a technical consequence of how the internet works — any resource loaded from an external server involves a network request. These requests are governed by Google's Privacy Policy and Cloudflare's Privacy Policy respectively, neither of which StatSolve Pro controls.

If you wish to eliminate these external requests entirely, you may download StatSolve Pro as a single HTML file, replace the CDN links with locally hosted copies of the fonts and Chart.js library, and run it from your local filesystem. In that configuration, StatSolve Pro makes no external network requests whatsoever.

We have deliberately kept the application's own external dependencies to a minimum: just two CDN resources. Apart from the third-party display advertising described in our Terms and Conditions, we do not use any analytics services, social media tracking pixels, or third-party data collection tools of any kind.

6. Cookies and Local Storage

StatSolve Pro itself does not set cookies for any purpose: no persistent cookies, session cookies, tracking cookies, or analytical cookies. We do not use browser localStorage or sessionStorage to retain any user data between sessions. Each time you load StatSolve Pro, it starts fresh with no memory of previous sessions. This means you will need to re-enter data each session, but it also means that no record of your calculations is retained on your device or anywhere else. As noted in our Terms and Conditions, third-party advertisers may use cookies in accordance with their own privacy policies.

7. PDF Exports and Data Security

When you use the Export PDF feature, StatSolve Pro prepares a print-ready version of your solution and triggers your browser's native print dialog. This process is handled entirely by your browser and your operating system's PDF generation capabilities. No data is transmitted to any server during this process. The resulting PDF file is saved directly to your device without passing through any StatSolve Pro infrastructure.
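
A print-based export of this kind can be sketched as below: the solution is rendered into a container on the page and the browser's own print dialog is opened, from which the user can choose "Save as PDF". This is a minimal sketch under stated assumptions, not the actual Export PDF code; the function names and the `print-area` id are hypothetical.

```javascript
// Escapes characters that have special meaning in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Builds the print-ready markup: a title followed by numbered steps.
function buildPrintableSolution(title, steps) {
  const items = steps.map((step) => `<li>${escapeHtml(step)}</li>`).join("");
  return `<h1>${escapeHtml(title)}</h1><ol>${items}</ol>`;
}

// Fills the container and then invokes the supplied print function.
// Nothing here sends data anywhere; printing is handled by the browser.
function exportSolution(container, title, steps, print) {
  container.innerHTML = buildPrintableSolution(title, steps);
  print();
}

// In a browser (the "print-area" id is hypothetical):
// exportSolution(
//   document.getElementById("print-area"),
//   "One-sample t-test",
//   ["State the hypotheses", "Compute the test statistic"],
//   () => window.print()
// );
```

Passing the print function in as an argument keeps the markup logic independent of the browser, so it can be checked without opening a print dialog.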

If you are working with sensitive or confidential data — medical measurements, proprietary business figures, personal financial data, or research data under confidentiality agreements — you can use StatSolve Pro with confidence that your data remains on your device throughout the entire process, including PDF generation.

8. Children's Privacy

StatSolve Pro is an educational tool designed for use by individuals of all ages, including minors in educational settings. Because we collect no personal information from regular use of the application, there is no personal data of children to protect in the ordinary course of using the calculators. Parents, guardians, and educational institutions can permit children to use StatSolve Pro with confidence that no personal information is collected from students' calculator usage.

If a minor submits a contact form inquiry, that submission is handled with the same care as any adult inquiry. We do not knowingly retain contact information from children under 13 and will delete any such information if it comes to our attention.

9. Your Rights and Our Obligations

Because StatSolve Pro collects virtually no personal data, exercising data rights under regulations like GDPR, CCPA, or similar frameworks is straightforward. If you have submitted a contact form and wish to request deletion of that communication record, simply contact us and we will process that request promptly. There is no account to delete, no usage profile to erase, and no calculation history to purge — because none of these things exist.

We are committed to complying with applicable data protection laws and to maintaining these minimal-collection practices as the application evolves. Any future changes to our data practices that involve increased data collection will be accompanied by prominent notification and updated terms.

10. Changes to This Policy

We may update this Privacy Policy periodically to reflect changes in our practices, new features, or evolving legal requirements. The "Last updated" date at the top of this page indicates when the most recent revision was made. We encourage you to review this policy periodically. For material changes — changes that meaningfully affect how we collect or use information — we will make reasonable efforts to provide notice through prominent placement on the application interface.

Continued use of StatSolve Pro after a policy update constitutes your acceptance of the revised terms. If you have concerns about any proposed changes, please contact us before they take effect.

11. Contact Information

If you have questions, concerns, or requests relating to this Privacy Policy or StatSolve Pro's data practices, please use our Contact page. We take all privacy inquiries seriously and aim to respond within 48 hours.

Terms and Conditions
Last updated: January 2025 · Please read carefully before using StatSolve Pro
1. Acceptance of Terms

By accessing, browsing, or using StatSolve Pro (available at statsolvepro.com), you acknowledge that you have read, understood, and agree to be bound by these Terms and Conditions in their entirety. These terms constitute a legally binding agreement between you (the "User") and StatSolve Pro (the "Service"). If you do not agree to any part of these terms, you must immediately discontinue your use of the Service.

We reserve the right to update these Terms and Conditions at any time. Updates will be reflected by the "Last updated" date at the top of this page. Your continued use of StatSolve Pro after any update constitutes your acceptance of the revised terms. We encourage you to review this page periodically to stay informed of any changes.

2. Description of Service

StatSolve Pro is a free, browser-based statistical calculator and educational tool providing 50 professional-grade statistical calculators with step-by-step solutions, interactive visualisations, and PDF export functionality. The Service is designed for educational, academic, and informational purposes. All calculations are performed locally within your web browser using JavaScript; no data is transmitted to external servers during normal calculator use.

The Service is provided at no cost and does not require user registration or account creation. Access to the Service may be subject to availability and may be temporarily interrupted for maintenance, updates, or circumstances beyond our reasonable control.

3. Permitted Use

You are granted a non-exclusive, non-transferable, revocable licence to use StatSolve Pro for the following purposes:

This licence does not grant you any right to sublicense, sell, resell, transfer, assign, or otherwise commercially exploit the Service or any content thereof. You may not frame or mirror any content from StatSolve Pro on any other server or device without our prior written consent.

4. Prohibited Activities

You agree not to engage in any of the following activities while using StatSolve Pro:

5. Accuracy, Reliability, and Educational Nature

StatSolve Pro is designed to produce accurate results based on standard statistical formulas and widely accepted numerical methods. However, the Service is provided strictly for educational and informational purposes and is offered "as is" without warranty of any kind, either express or implied, including but not limited to warranties of merchantability, fitness for a particular purpose, or non-infringement.

Statistical calculations can be sensitive to input quality, rounding precision, and the assumptions underlying each test. Users are solely responsible for verifying the suitability and accuracy of any calculation for their specific use case. Do not rely solely on StatSolve Pro results for critical decisions in medical, clinical, legal, financial, or safety-critical contexts without independent verification from a qualified professional or peer-reviewed software package.

The step-by-step solutions provided are intended to illustrate the mathematical procedure behind each calculation. They are not a substitute for professional statistical consultation and may not account for all edge cases or methodological considerations relevant to your specific situation.

6. Intellectual Property Rights

The StatSolve Pro application — including its user interface design, visual elements, educational content, explanatory text, code architecture, and brand identity — is the intellectual property of StatSolve Pro and is protected under applicable copyright laws. All rights are reserved. Statistical formulas and mathematical methods used within the tools are in the public domain; however, the specific implementation, explanation, and presentation of those methods within this application are original works protected by copyright.

You may use PDF exports of your own calculations for personal, academic, or professional purposes, provided that you credit StatSolve Pro as the source. You may not reproduce substantial portions of the application's educational content, step-by-step explanation templates, or interface design without prior written permission.

7. Advertising

StatSolve Pro is supported by display advertising provided by third-party advertising networks, which may include Google AdSense and similar services. These advertisements help fund the continued development and free availability of the Service. By using StatSolve Pro, you acknowledge that advertisements may be displayed during your session. We do not control the specific content of third-party advertisements, and their appearance does not constitute endorsement by StatSolve Pro of any advertised product, service, or organisation. Third-party advertisers may use cookies in accordance with their own privacy policies.

8. Limitation of Liability

To the fullest extent permitted by applicable law, StatSolve Pro, its creators, contributors, and associates shall not be liable for any direct, indirect, incidental, special, consequential, or punitive damages arising from or related to your use of, or inability to use, the Service. This includes but is not limited to damages arising from reliance on any calculation result, errors or omissions in content, service interruptions, or data loss. Some jurisdictions do not allow the exclusion or limitation of liability for consequential or incidental damages, so the above limitation may not apply to you in its entirety.

9. Modifications to the Service

We reserve the right to modify, suspend, or discontinue any aspect of StatSolve Pro at any time, with or without notice. This includes adding or removing tools, changing the user interface, or altering how results are presented. We will not be liable to you or any third party for any modification, suspension, or discontinuation of the Service.

10. Governing Law and Contact

These Terms and Conditions shall be governed by and construed in accordance with applicable laws. Any disputes arising under or in connection with these Terms shall first be addressed through good-faith communication via our contact channels before any formal legal proceedings are initiated. Questions may be directed to statsolveprohelp@gmail.com or via the Contact page.

Disclaimer
Last updated: January 2025 · Important limitations and notices regarding StatSolve Pro
1. Educational and Informational Purpose Only

StatSolve Pro is designed and operated exclusively as an educational and informational tool. Every calculator, step-by-step solution, formula explanation, interpretation guide, and piece of educational content on this website is provided solely for the purpose of helping users understand statistical concepts and verify their own manual calculations. Nothing on this website constitutes professional statistical advice, medical advice, legal advice, financial advice, engineering guidance, or any other form of expert professional consultation.

While we have made every reasonable effort to ensure that the mathematical methods implemented in StatSolve Pro are accurate and consistent with standard statistical practice, users must understand that no automated tool — regardless of its sophistication — can replace the judgement of a trained statistician, data scientist, or domain expert who understands the full context of a real-world analysis. Statistical methods require careful consideration of study design, data quality, assumption validation, and contextual interpretation. StatSolve Pro can assist with the computation, but it cannot replace the expertise required to design a study correctly, choose the most appropriate method for your data, or interpret results responsibly.

2. No Warranty on Calculation Accuracy

StatSolve Pro is provided "as is" without any warranty, express or implied. Although we test our calculators carefully and strive for numerical accuracy consistent with standard textbook and software implementations, we cannot guarantee that all results are free from errors in every possible scenario. Factors that may affect accuracy include, but are not limited to: extreme input values outside the typical range of statistical practice, numerical precision limitations inherent in browser-based JavaScript execution, highly unusual data distributions, or edge cases that differ from standard assumptions.

Users who discover a discrepancy between StatSolve Pro's output and results from verified statistical software (such as R, SPSS, SAS, or Stata) are encouraged to report the issue via our contact form or by email at statsolveprohelp@gmail.com. We take accuracy reports seriously and investigate every one.

For results used in academic publications, clinical trials, legal proceedings, financial decisions, or any high-stakes context, you must independently verify your calculations using peer-reviewed statistical software and, where appropriate, consult a qualified statistician.

3. Not a Substitute for Professional Advice

The results, interpretations, and educational commentary provided by StatSolve Pro should never be used as the sole basis for decisions in the following contexts:

4. Third-Party Advertising Disclaimer

StatSolve Pro displays third-party advertisements to fund the ongoing development and free operation of this Service. These advertisements are served by advertising networks including, but not limited to, Google AdSense. StatSolve Pro does not endorse, recommend, or take responsibility for any product, service, company, or content advertised through these third-party networks. The appearance of an advertisement on StatSolve Pro does not constitute any form of endorsement or affiliation.

Third-party advertisers operate under their own terms of service and privacy policies. StatSolve Pro does not control the content of advertisements displayed to you and cannot be held responsible for any advertised claims, products, or services. If you encounter an advertisement that you believe is misleading, inappropriate, or harmful, we encourage you to report it directly to the advertising network in question. You may also notify us at statsolveprohelp@gmail.com.

5. External Links Disclaimer

StatSolve Pro may contain links to external websites, academic resources, or reference materials for educational purposes. These external links are provided as a convenience to users and do not constitute endorsement by StatSolve Pro of the linked website, its content, its operators, or any products or services offered therein. We have no control over the content of external sites and accept no responsibility for the accuracy, legality, or appropriateness of their content. Accessing any external link is done entirely at your own risk.

6. Limitation of Liability

To the maximum extent permitted by applicable law, StatSolve Pro, its creators, maintainers, and contributors expressly disclaim all liability for any loss, damage, cost, or expense of any nature whatsoever incurred or suffered by you or any third party as a direct or indirect result of: reliance on any statistical result produced by this Service; misinterpretation of any educational content; decisions made on the basis of outputs from this Service; or any technical failure, interruption, or inaccuracy of the Service.

This disclaimer applies regardless of whether such loss or damage was foreseeable and regardless of whether StatSolve Pro has been advised of the possibility of such loss or damage.

7. Changes to This Disclaimer

We may update this Disclaimer from time to time to reflect changes in the Service, applicable laws, or our operational practices. The "Last updated" date at the top of this page indicates the date of the most recent revision. We encourage you to review this Disclaimer periodically. Continued use of StatSolve Pro following any update constitutes your acceptance of the revised Disclaimer.

If you have any questions about this Disclaimer or any other legal document on this website, please contact us at statsolveprohelp@gmail.com.

📝 Free Statistics Articles

Statistics Articles & Guides

22 free, in-depth articles on statistics concepts, methods, and data analysis. From sampling to regression — all explained clearly.

  • Types of Sampling Methods in Statistics [Sampling Methods]: Probability, stratified, cluster, systematic, and convenience sampling, a complete guide with examples. ⏱ 6 min read
  • What is a P-Value? Explained Simply [Statistical Concepts]: Plain English explanation of p-values, p < 0.05, and the 5 most common misconceptions. ⏱ 5 min read
  • Descriptive vs Inferential Statistics [Statistical Concepts]: Key differences, side-by-side comparison, and real-world examples of both types. ⏱ 5 min read
  • Normal Distribution Explained [Distributions]: Bell curve properties, empirical rule, standardisation, and why normality appears everywhere. ⏱ 6 min read
  • Hypothesis Testing — Step-by-Step Guide [Statistical Methods]: Complete 6-step guide with worked example, test selection table, and Type I/II error explanation. ⏱ 7 min read
  • 10 Essential Data Analysis Techniques [Data Analysis]: Descriptive stats, regression, A/B testing, time series, clustering and more, with examples. ⏱ 8 min read
  • What is Standard Deviation? [Statistical Concepts]: Intuitive explanation, formula, step-by-step example, and real-world applications of SD. ⏱ 5 min read
  • Type I and Type II Errors [Statistical Methods]: False positives, false negatives, statistical power, and how to balance both error types. ⏱ 5 min read
  • Correlation vs Causation [Statistical Concepts]: Classic examples, confounding variables, and how to establish causation in research. ⏱ 5 min read
  • ANOVA Explained [Statistical Methods]: When and how to use analysis of variance, the ANOVA table, and post-hoc tests. ⏱ 7 min read
  • Probability Distributions Guide [Distributions]: Normal, binomial, Poisson, t, F, chi-square, exponential, and when to use each distribution. ⏱ 7 min read
  • Linear Regression — Complete Guide [Regression]: Equation, assumptions, R² interpretation, residuals, and when to use regression. ⏱ 7 min read
  • Mean vs Median vs Mode [Descriptive Statistics]: When to use each measure of central tendency and how skewness affects the choice. ⏱ 5 min read
  • Chi-Square Test Explained [Statistical Methods]: Goodness of fit vs independence test, assumptions, effect size (Cramér's V), and interpretation. ⏱ 6 min read
  • How to Determine Sample Size [Research Methods]: Formulas for means, proportions, and two-sample tests with worked examples. ⏱ 6 min read
  • Outlier Detection Methods [Data Analysis]: IQR fence, z-score, modified z-score, and Grubbs' test, plus when to remove outliers. ⏱ 6 min read
  • Confidence Intervals Explained [Statistical Inference]: Correct interpretation, common misconceptions, CI width, and CI vs p-values. ⏱ 5 min read
  • Bayesian vs Frequentist Statistics [Statistical Concepts]: Core philosophical difference, credible vs confidence intervals, when to use each. ⏱ 6 min read
  • Statistical vs Practical Significance [Statistical Concepts]: Why p < 0.05 doesn't mean important. Effect sizes, Cohen's d, and R². ⏱ 5 min read
  • Data Visualization in Statistics [Data Analysis]: Histogram, box plot, scatter plot, bar chart, and which chart to use when. ⏱ 6 min read
  • Stratified vs Cluster Sampling [Sampling Methods]: Key differences, when to use each, precision comparison, and multistage sampling. ⏱ 5 min read
  • T-Test Complete Guide [Statistical Methods]: One-sample, two-sample, and paired t-tests with formulas, examples, and when to use each. ⏱ 7 min read
🔬 Upgraded Data Analysis Tool v2

Full Statistical Analysis in Your Browser

Upload any CSV or choose from 12 built-in datasets. Get descriptive stats, scatter plots, box plots, hypothesis tests, correlation matrix, group analysis, and more — all instantly.

📊 Descriptive Stats 🔍 Scatter & Box Plots 🔗 Correlation Matrix 🧪 Hypothesis Tests 👥 Group Analysis 📐 Percentile Finder 🎯 Outlier Detection 📈 Normality (Q-Q Plot) 12 Sample Datasets 📥 CSV Export
📂 Load Your Data
Upload a CSV, paste data, or explore one of 12 built-in datasets. First row = column headers.
📖 About This Tool Full Feature List

📊 Analysis Features

  • Descriptive Statistics — Mean, median, mode, SD, variance, IQR, skewness, kurtosis, CV, SE, CI
  • Distribution Histograms — Adjustable bin count (8/12/16/20), all numeric columns
  • Scatter Plot — XY analysis with Pearson regression line, r, R², colour grouping
  • Box Plots — Five-number summary visualisation for multiple columns
  • Correlation Matrix — Full Pearson r heatmap + ranked correlation table
  • Hypothesis Testing — One-sample t-test (all directions) + Chi-square goodness of fit
  • Group Analysis — Group by any categorical column, aggregate by mean/median/sum/count
  • Outlier Detection — IQR fence + Z-score method, shows exact values
  • Q-Q Plot — Normality assessment with quantile-quantile plot
  • Percentile Finder — Find value at any percentile, full percentile table
  • Frequency Distribution — Table + bar chart for any column
  • Data View — Preview and filter your data rows

🎲 Built-in Sample Datasets

  • Student Performance — 15 students, 6 subjects, age, study hours
  • Health & BMI — 15 patients, vitals, cholesterol, blood sugar
  • Sales Data — 12 months, revenue, units, marketing spend
  • Survey Results — 15 respondents, Likert scale ratings
  • Iris Flowers — Classic 150-row ML dataset, 4 measurements + species
  • World Countries — GDP, population, area, HDI by region
  • Weather & Climate — Monthly temperature, rainfall, humidity
  • Employee HR — Salary, experience, department, performance
  • E-Commerce — Orders, revenue, items, discount, rating
  • NBA Players — Points, assists, rebounds, salary, age
  • Nutrition Facts — Calories, protein, fat, carbs, fibre
  • Stock Returns — Monthly returns, volatility, beta

🔒 Privacy

All analysis runs 100% in your browser. Your data is never sent to any server. Safe for confidential or sensitive datasets.

📊
Visualization Studio
Load your data, then add charts from the Charts tab. Create stunning dashboards in seconds.
12 Chart Types 12 Sample Datasets 4 Themes CSV Upload Auto Insights
🎓
Student Data
Demo with grades
💰
Sales Data
Revenue & trends
🌍
World Data
GDP & population
50
Statistical Tools
200+
Step-by-Step Solutions
15+
Interactive Charts
100%
Free & Private
Everything You Need to Solve Statistics
Professional-grade tools with educational content — perfect for students, researchers, and analysts.
🔢
Step-by-Step Solutions
Every calculation shows its complete working — formulas substituted, values computed, decisions explained. Learn while you solve.
📈
Interactive Charts
Visualize distributions, scatter plots, histograms, and probability curves. Switch chart types, see data come alive.
📄
PDF Export
Export any solution as a clean, print-ready PDF. Include steps, charts, and formulas. Perfect for homework and reports.
📖
Built-in Tutorials
Every tool includes "What is this?" — formula references, when-to-use guides, worked examples, and interpretation tips.
🔒
100% Private
All calculations run in your browser. Your data never leaves your device. No accounts required.
Load Sample Data
Every tool has sample data built in. One click to load realistic examples and see how the calculator works immediately.
6 Categories, 50 Calculators
From basic descriptive statistics to advanced regression and hypothesis testing.
📊
Descriptive
9 tools
🎲
Probability
4 tools
📉
Distribution
9 tools
🧪
Hypothesis
10 tools
📈
Regression
5 tools
🔬
Advanced
2 tools
Ready to Start?

Solve Your First Problem
in 30 Seconds

No sign-up. No download. No cost. Just open the calculator and start solving.

Complete Guide

Free Statistics Calculator — All 50 Tools Explained

A complete reference for every statistical tool in StatSolve Pro — formulas, when to use each test, and what results mean.

📊 Descriptive Statistics Calculators

Mean Median Mode Calculator
Computes arithmetic mean (x̄ = Σx/n), median, all modes, sample and population variance and standard deviation, coefficient of variation, range, IQR, Q1 and Q3 with full step-by-step formula working.
Standard Deviation Calculator
Calculates both sample standard deviation s = √[Σ(xᵢ−x̄)²/(n−1)] and population standard deviation σ = √[Σ(xᵢ−μ)²/n]. Shows each deviation squared, the sum, and the final result.
Five Number Summary & IQR
Min, Q1, Median, Q3, Max with interquartile range (IQR = Q3−Q1). Includes lower fence (Q1−1.5×IQR) and upper fence (Q3+1.5×IQR) for outlier identification.
Outlier Detection Calculator
Identifies outliers using both the IQR fence method and Z-score method (|z| > 3). Lists all suspected and confirmed outliers with their positions.
Frequency Distribution Builder
Creates complete frequency tables with absolute frequency, relative frequency, cumulative frequency, and cumulative relative frequency for both raw and grouped data.

🎲 Probability Calculators

Basic Probability Calculator
Computes complement P(A') = 1−P(A), union P(A∪B) = P(A)+P(B)−P(A∩B), intersection (independent: P(A)×P(B)), and conditional probability P(A|B) = P(A∩B)/P(B).
Bayes Theorem Calculator
P(H|E) = P(E|H)×P(H) / P(E). Enter prior probability, sensitivity (true positive rate), and specificity (true negative rate). Shows posterior probability with complete Bayes table.
Permutations and Combinations
nPr = n!/(n−r)! and nCr = n!/[r!(n−r)!]. Supports with-replacement and without-replacement scenarios. Shows full factorial expansion.

📉 Distribution Calculators

Normal Distribution Calculator
Compute P(a ≤ X ≤ b), left-tail, right-tail, and two-tail probabilities for any normal distribution N(μ, σ²). Shows standardization z = (x−μ)/σ and shaded area curve.
Binomial Distribution Calculator
P(X=k) = C(n,k)·pᵏ·(1−p)ⁿ⁻ᵏ. Computes exact, cumulative ≤k, and cumulative ≥k probabilities. Mean = np, Variance = npq. Bar chart shows full probability mass function.
Poisson Distribution Calculator
P(X=k) = (λᵏ·e⁻λ)/k! for rare events. Mean = λ, Variance = λ. Cumulative probabilities and full PMF chart. Ideal for call centers, defect rates, accident counts.
Z-Score Calculator
z = (x−μ)/σ. Converts raw scores to z-scores and vice versa. Shows percentile rank, cumulative probability, and position on standard normal curve.

🧪 Hypothesis Testing Calculators

T-Test Calculator (One-Sample)
t = (x̄ − μ₀) / (s/√n). Tests whether sample mean equals a hypothesized population mean. Shows t-statistic, degrees of freedom, p-value, critical value, and reject/fail-to-reject decision at α = 0.05 (or your chosen level).
Two-Sample T-Test Calculator
Welch's t-test for comparing two independent group means when variances may be unequal. Computes pooled or Welch-Satterthwaite degrees of freedom, t-statistic, and p-value.
Paired T-Test Calculator
t = d̄/(s_d/√n) for matched pairs or before/after measurements. Computes the mean difference, its standard deviation, t-statistic, and p-value. More powerful than two-sample t-test for paired data.
Z-Test Calculator
z = (x̄ − μ₀) / (σ/√n) when population standard deviation σ is known. Supports two-tailed, left-tailed, and right-tailed alternatives.
Chi-Square Test Calculator
χ² = Σ(O−E)²/E for goodness of fit. Enter observed and expected frequencies. Shows contribution of each category, degrees of freedom, χ²-statistic, and critical value at chosen α.
ANOVA Calculator (One-Way)
One-way ANOVA with complete table: SSB, SSW, SST, df_between, df_within, MSB, MSW, F-statistic, and p-value. Tests equality of 3+ group means simultaneously.
Confidence Interval Calculator
CI = x̄ ± (z* or t*)·(s/√n). Calculates CI for population mean (using z or t depending on whether σ is known), and for proportions. Supports 90%, 95%, and 99% confidence levels.
Sample Size Calculator
n = (z·σ/E)² for estimating a mean; n = z²·p·(1−p)/E² for a proportion. Computes required sample size for given margin of error, confidence level, and standard deviation or proportion estimate.

📈 Regression & Correlation Calculators

Linear Regression Calculator
ŷ = b₀ + b₁x. Computes slope b₁ = Σ(xᵢ−x̄)(yᵢ−ȳ)/Σ(xᵢ−x̄)², intercept b₀ = ȳ − b₁x̄, correlation r, R², standard error of estimate, and prediction for any X. Scatter plot with regression line.
Pearson Correlation Calculator
r = Σ(xᵢ−x̄)(yᵢ−ȳ) / √[Σ(xᵢ−x̄)²·Σ(yᵢ−ȳ)²]. Computes r, r², t-statistic, and p-value to test H₀: ρ = 0. Interprets correlation strength and direction.
Spearman Rank Correlation
rₛ = 1 − 6Σd²/[n(n²−1)]. Non-parametric alternative to Pearson correlation. Ranks raw data, computes rank differences, and tests significance. Robust to outliers and non-normal data.
Covariance Calculator
Sample Cov(X,Y) = Σ(xᵢ−x̄)(yᵢ−ȳ)/(n−1) and population covariance. Shows direction (positive/negative) and relationship to correlation and variance.

Why Use a Free Statistics Calculator with Step-by-Step Solutions?

Standard statistics software like SPSS, SAS, Minitab, or R gives you the answer — but often not the reasoning. When you're a student learning hypothesis testing, or a researcher double-checking a confidence interval, seeing only the final p-value isn't enough. You need to follow the calculation from first principles to build genuine understanding and catch errors.

StatSolve Pro's free statistics calculator shows every step: how the t-statistic was computed from your data, how the degrees of freedom were derived, how the p-value was obtained from the t-distribution, and exactly what the decision rule says. This means you can verify your work, learn the method, and explain your results — not just copy a number.

All 50 tools are completely free, require no account or subscription, run entirely in your browser (your data never leaves your device), and work on any device including smartphones. The built-in interactive charts — normal distribution curves, binomial probability bars, regression scatter plots, ANOVA comparison charts — make abstract statistical concepts visually intuitive.

Which Statistical Test Should I Use? — Quick Reference

Research Question | Test to Use | Key Requirement
Does my sample mean equal a specific value? | One-Sample T-Test (Z-Test when σ is known) | σ unknown (almost always)
Do two independent groups have the same mean? | Two-Sample T-Test | Independent groups, approx. normal
Do 3+ groups have the same mean? | One-Way ANOVA | Normal, equal variance
Before/after treatment in the same subjects? | Paired T-Test | Matched pairs of measurements
Does a proportion equal a target value? | One-Proportion Z-Test | n·p ≥ 10 and n·(1−p) ≥ 10
Are two categorical variables independent? | Chi-Square Test | Expected frequencies ≥ 5 per cell
Is there a linear relationship between X and Y? | Linear Regression / Pearson r | Continuous variables, linear pattern
How large a sample do I need? | Sample Size Calculator | Specify margin of error and confidence
41 statistical calculators · step-by-step solutions
Descriptive
Probability
Distribution
Hypothesis
Regression
Advanced
Descriptive
📊

Descriptive Statistics

Mean, median, mode, variance, std dev, range, Q1/Q3, IQR, CV

1 / 39
📥 Input Data
📐 Formula
Mean: x̄ = Σx / n
Variance: s² = Σ(x − x̄)² / (n−1)
Std Dev: s = √s²
IQR: IQR = Q3 − Q1
CV: CV = (s / x̄) × 100%
🎯 When to Use
Use descriptive statistics any time you want to summarize a dataset before deeper analysis. It's the first step in every data project — understanding the center, spread, and shape of your data. Descriptive statistics are the bedrock of any data analysis. You use this tool when you have a raw dataset and need to transform it into a meaningful story. It is essential in the "Exploratory Data Analysis" (EDA) phase. Use it to identify the "typical" value (Central Tendency), the "spread" of the data (Dispersion), and the "shape" of the distribution (Skewness and Kurtosis). It is used in business to summarize monthly sales, in healthcare to look at patient vitals, and in manufacturing to monitor quality control.
💡 Example
1. Test scores: 72, 85, 90, 68, 95, 88, 76, 82
→ n=8, Mean=82, Median=83.5, SD≈9.3, Range=27, IQR=15
2. Imagine a real estate agent analyzing the sale prices of 10 homes in a specific neighborhood to help a seller set a listing price. Data (in thousands): $350, $365, $370, $380, $380, $390, $410, $420, $450, $750.
→ Mean: $426.5k, Median: $385k, Standard Deviation: ≈$117.4k
🔍 Interpretation
Mean tells you the typical value. Standard deviation shows how spread out the data is. A high CV (>30%) means the data is very variable relative to its average. Use median instead of mean when data is skewed.
In the second example, the figures reveal a "Right Skew." Notice the Mean ($426.5k) is much higher than the Median ($385k). This is caused by the $750k "outlier." If you told the seller the "average" price is $426k, they might overprice their home and fail to sell. The Median is a better representation of the "typical" home here. The high Standard Deviation tells you there is high volatility in home values in this area, suggesting that home features (lot size, renovations) vary significantly.
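The home-price figures can be checked in a few lines. This is a minimal sketch using Python's standard statistics module, not StatSolve Pro's own code, and the variable names are illustrative.

```python
import statistics

# Home sale prices in thousands of dollars, from the real estate example.
prices = [350, 365, 370, 380, 380, 390, 410, 420, 450, 750]

mean = statistics.mean(prices)      # arithmetic mean, x̄ = Σx / n
median = statistics.median(prices)  # middle value of the sorted data
sd = statistics.stdev(prices)       # sample SD, divides by n − 1
cv = sd / mean * 100                # coefficient of variation in percent

print(f"mean={mean:.1f}k  median={median:.1f}k  sd={sd:.1f}k  cv={cv:.1f}%")
```

The mean sits well above the median because the single $750k sale pulls it upward, which is the right-skew pattern described above.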
Descriptive
📋

Frequency Distribution

Build a complete frequency table with relative & cumulative frequencies

2 / 39
📥 Input Data
📐 Formula
Frequency: f = count of each value
Relative Freq: rf = f / n
Cumulative Freq: CF = Σ f up to that value
Cum. Rel. Freq: CRF = CF / n
🎯 When to Use
Use when you want to see how often each value appears. Essential before drawing bar charts or histograms. Helps identify the mode and the distribution shape quickly. Use this when you have a large amount of data and need to categorize it to see patterns. It is particularly useful for categorical data (e.g., "What color car do people prefer?") or discrete numerical data (e.g., "How many children are in each household?"). It turns a messy list of numbers into a structured table or histogram.
💡 Example
1. Survey responses (1–5): 1,2,2,3,3,3,4,4,5
→ Value 3 appears 3 times → f=3, rf=3/9=33.3%
2. A restaurant owner tracks customer ratings (1 to 5 stars) over a weekend with 100 reviews. 1 Star: 5, 2 Stars: 10, 3 Stars: 15, 4 Stars: 50, 5 Stars: 20.
🔍 Interpretation
Relative frequency shows the proportion of each value. Cumulative frequency answers 'what percent scored at most X?' High relative frequency at one value means a strong mode there.
In the restaurant example, the Relative Frequency shows that 50% of customers gave a 4-star rating. The Cumulative Frequency shows that 30% of customers gave a 3-star rating or lower (5+10+15). This suggests that while the majority are satisfied, nearly a third of the customer base feels the experience is "average" or "poor," providing a clear target for service improvement.
Descriptive
📦

Grouped Data

Mean, median, mode & SD from class interval frequency tables

3 / 39
📥 Class Intervals & Frequencies
Format: lower-upper, frequency — one per line. Example: 10-20, 5
📐 Formula
Midpoint: m = (lower + upper) / 2
Mean: x̄ = Σ(f·m) / n
Median: L + [(n/2 − CF) / f] × h
Mode: L + [d₁/(d₁+d₂)] × h
Variance: s² = Σf(m − x̄)² / (n−1)
🎯 When to Use
Use when raw data is unavailable and only class intervals with frequencies are given. Common in surveys, census data, and published reports. For comparing different groups, the related Five Number Summary is the tool of choice: use it when you want to see the Minimum, 1st Quartile (25th percentile), Median (50th), 3rd Quartile (75th), and Maximum. It is the best way to visualize "Spread" and "Outliers" simultaneously.
💡 Example
1. Heights: 150–160 (5), 160–170 (12), 170–180 (8)
→ Mean ≈ 166.2, Median ≈ 166.3, Modal class: 160–170
2. A logistics company compares the delivery times of two different routes (Route A vs. Route B). Route A: Min: 20m, Q1: 25m, Median: 30m, Q3: 35m, Max: 60m. Route B: Min: 28m, Q1: 29m, Median: 30m, Q3: 31m, Max: 33m.
🔍 Interpretation
All results are estimates because we assume data is evenly spread within each class. The modal class is the interval with highest frequency, not an exact value. Wider classes give less precise estimates.
In the delivery-route example, even though both routes have the same Median (30 mins), Route B is much more "reliable." The Interquartile Range (IQR) for Route B is only 2 minutes (31-29), whereas Route A has an IQR of 10 minutes. Furthermore, Route A has a "Maximum" of 60 minutes, which is a potential outlier. A business would choose Route B because consistency is often more valuable than a lower minimum time.
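The grouped mean and median formulas can be applied directly to the heights table. This is an illustrative sketch only; the function names and the (lower, upper, frequency) tuple layout are assumptions made for the example.

```python
# Class intervals as (lower, upper, frequency), from the heights example.
classes = [(150, 160, 5), (160, 170, 12), (170, 180, 8)]

def grouped_mean(classes):
    """x̄ = Σ(f·m) / n, where m is each class midpoint."""
    n = sum(f for _, _, f in classes)
    return sum(f * (lo + hi) / 2 for lo, hi, f in classes) / n

def grouped_median(classes):
    """L + [(n/2 − CF) / f] × h, evaluated for the median class."""
    n = sum(f for _, _, f in classes)
    cum = 0  # cumulative frequency before the current class
    for lo, hi, f in classes:
        if cum + f >= n / 2:
            return lo + (n / 2 - cum) / f * (hi - lo)
        cum += f

print(grouped_mean(classes))    # 166.2
print(grouped_median(classes))  # 166.25
```

Both values are estimates, since the formulas treat observations as evenly spread within each class.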
Descriptive
📐

Five Number Summary

Min, Q1, Median, Q3, Max, IQR and box plot fence boundaries

4 / 39
📥 Input Data
📐 Formula
Min: smallest value
Q1 (25th): median of lower half
Median (Q2): middle value
Q3 (75th): median of upper half
Max: largest value
IQR: Q3 − Q1
🎯 When to Use
Use to create a box plot and quickly understand the full spread of data. Especially useful for comparing multiple datasets or when outliers may be present.
💡 Example
Data: 3, 7, 8, 10, 12, 14, 18
→ Min=3, Q1=7, Median=10, Q3=14, Max=18, IQR=7
🔍 Interpretation
The box (Q1 to Q3) contains the middle 50% of data. The IQR measures that middle spread. Whiskers extend to Min/Max (or fence limits). A long upper whisker suggests right skew.
Descriptive
🔍

Outlier Detection

Identify outliers using IQR fence method and Z-score method

5 / 39
📥 Input Data
📐 Formula
Lower Fence: Q1 − 1.5 × IQR
Upper Fence: Q3 + 1.5 × IQR
Z-Score: z = (x − x̄) / s
Outlier (Z): |z| > 2 or |z| > 3
🎯 When to Use
Use when you suspect unusual values in your dataset. IQR method is better for skewed data. Z-score method assumes near-normality. Always investigate outliers — they may be errors or real extreme events.
💡 Example
Data: 10, 12, 11, 13, 10, 82
→ Q1=10, Q3=13, IQR=3, Upper fence=17.5 → 82 is an outlier
🔍 Interpretation
An outlier is not automatically wrong — it could be the most important data point. Always ask: data entry error? different population? legitimate extreme case? Remove only if you have a justified reason.
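The IQR fence rule can be sketched as follows. Quartiles are taken as the medians of the lower and upper halves of the sorted data, which is one of several quartile conventions and is assumed here; the function name is hypothetical.

```python
import statistics

def iqr_outliers(data, k=1.5):
    """Return values outside [Q1 − k·IQR, Q3 + k·IQR] and the fences."""
    xs = sorted(data)
    half = len(xs) // 2                 # needs at least 2 values
    q1 = statistics.median(xs[:half])   # median of lower half
    q3 = statistics.median(xs[-half:])  # median of upper half
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x < lower or x > upper], (lower, upper)

outliers, fences = iqr_outliers([10, 12, 11, 13, 10, 82])
print(outliers, fences)  # [82] (5.5, 17.5)
```

Flagging a value this way only marks it for investigation; whether to remove it remains a judgment call, as noted above.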
Descriptive
✖️

Geometric & Harmonic Mean

AM, GM, HM means — verify AM ≥ GM ≥ HM inequality

6 / 39
📥 Input Data (all positive)
📐 Formula
Arithmetic Mean: AM = Σx / n
Geometric Mean: GM = (x₁·x₂·…·xₙ)^(1/n)
Harmonic Mean: HM = n / Σ(1/xᵢ)
Inequality: AM ≥ GM ≥ HM (always, for positive values)
🎯 When to Use
GM: use for growth rates, ratios, percentages (e.g. investment returns, population growth). HM: use when averaging rates or speeds (e.g. average speed over equal distances). Use AM for most everyday data.
💡 Example
Investment returns: +10%, +20%, +30% (factors: 1.1, 1.2, 1.3)
→ GM = (1.1×1.2×1.3)^(1/3) = 1.1972 → avg growth ≈ 19.7%/year
🔍 Interpretation
GM is always less than or equal to AM. The bigger the spread in data, the bigger the gap. GM gives a more accurate picture of compound growth than AM. HM is dominated by small values.
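The three means can be compared on the growth factors from the investment example. This is a small sketch using Python's standard statistics module (geometric_mean requires Python 3.8 or later).

```python
import statistics

# Growth factors for returns of +10%, +20%, +30%.
factors = [1.1, 1.2, 1.3]

am = statistics.mean(factors)            # arithmetic mean
gm = statistics.geometric_mean(factors)  # nth root of the product
hm = statistics.harmonic_mean(factors)   # n / Σ(1/xᵢ)

print(f"AM={am:.4f}  GM={gm:.4f}  HM={hm:.4f}")
print(f"average compound growth ≈ {(gm - 1) * 100:.1f}% per year")
```

The output respects AM ≥ GM ≥ HM, and the geometric mean gives the compound growth rate of roughly 19.7% per year.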
Descriptive
⚖️

Weighted Mean

Weighted average x̄w = Σwx / Σw with weight breakdown

7 / 39
📥 Values & Weights
📐 Formula
Weighted Mean: x̄w = Σ(w·x) / Σw
Equal weights: reduces to simple mean
🎯 When to Use
Use when different values have different importance (weights). Examples: GPA (credits as weights), overall exam score (chapters have different point values), investment portfolio average return.
💡 Example
Exam grades: 80 (wt 0.20), 75 (wt 0.30), 90 (wt 0.50)
→ x̄w = (0.20×80 + 0.30×75 + 0.50×90) / 1.0 = 16+22.5+45 = 83.5
🔍 Interpretation
The weighted mean shifts toward the values with higher weights. If all weights are equal, it equals the simple mean. Weights don't have to sum to 1 — the formula divides by Σw automatically.
Descriptive
📈

Moving Average

Simple Moving Average (SMA) with trend direction analysis

8 / 39
📥 Time Series Data
📐 Formula
SMA: SMAₜ = (xₜ + xₜ₋₁ + ... + xₜ₋ₖ₊₁) / k
Lag: SMA lags behind actual data by about (k−1)/2 periods
🎯 When to Use
Use to smooth out short-term fluctuations and reveal underlying trends in time series data. Common in stock analysis, sales forecasting, weather data, and economic indicators.
💡 Example
Monthly sales: 100, 120, 110, 130, 140
k=3: SMA₃ = (100+120+110)/3=110, (120+110+130)/3=120, (110+130+140)/3=126.7
🔍 Interpretation
Larger k = smoother line, more lag. Smaller k = more responsive, noisier. When actual values cross the SMA line from below, it may signal an upward trend — often used as a buy signal in trading.
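A simple moving average takes only a few lines to compute. The sketch below is illustrative (the function name is an assumption) and reproduces the monthly sales example with k = 3.

```python
def simple_moving_average(series, k):
    """Average each window of k consecutive values, oldest window first."""
    return [sum(series[i - k + 1 : i + 1]) / k
            for i in range(k - 1, len(series))]

sales = [100, 120, 110, 130, 140]
sma = simple_moving_average(sales, 3)
print([round(v, 1) for v in sma])  # [110.0, 120.0, 126.7]
```

A series of n values yields n − k + 1 averages, so a larger k smooths more but also leaves fewer points.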
Descriptive
🏆

Rank & Percentile

Percentile rank of a value or find value at given percentile

9 / 39
📥 Data & Query
📐 Formula
Percentile Rank: PR = [(# below x) + 0.5×(# equal to x)] / n × 100
Value at P: L = (P/100) × n; use the Lth value (or interpolate)
🎯 When to Use
Use to locate a value relative to the rest of the dataset. Standardized tests (SAT, GRE), growth charts, and performance rankings all use percentiles.
💡 Example
Scores: 55, 60, 65, 70, 75, 80, 85, 90, 95, 100
→ Score 90 → 7 below, 1 equal → PR = (7+0.5)/10×100 = 75th percentile
🔍 Interpretation
P75 means you scored better than 75% of the group. The 50th percentile = median. Percentile rank depends on the group — a 90th percentile in one group may be 60th in another.
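The percentile rank formula, counting values below x plus half of the values equal to x, can be sketched like this. The function name is hypothetical and the data is the score list from the example.

```python
def percentile_rank(data, x):
    """PR = [(# below x) + 0.5 × (# equal to x)] / n × 100."""
    below = sum(1 for v in data if v < x)
    equal = sum(1 for v in data if v == x)
    return (below + 0.5 * equal) / len(data) * 100

scores = [55, 60, 65, 70, 75, 80, 85, 90, 95, 100]
print(percentile_rank(scores, 90))  # 75.0 (7 below, 1 equal)
```

Because the rank depends on the whole list, the same score can land at a different percentile in a different group.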
Probability
🎲

Basic Probability

Union, intersection, conditional probability and complement

10 / 39
📥 Event Probabilities
📐 Formula
Complement: P(A') = 1 − P(A)
Addition Rule: P(A∪B) = P(A) + P(B) − P(A∩B)
Multiplication: P(A∩B) = P(A)·P(B|A)
Conditional: P(A|B) = P(A∩B) / P(B)
Independence: P(A∩B) = P(A)·P(B)
🎯 When to Use
Use whenever you need to calculate the likelihood of events. Foundations for statistics, machine learning, risk analysis, and everyday decision-making.
💡 Example
Drawing cards: P(Red)=0.5, P(Face)=0.231, P(Red Face)=0.115
→ P(Red OR Face) = 0.5 + 0.231 − 0.115 = 0.616
🔍 Interpretation
Probabilities always range from 0 to 1. Mutually exclusive events can't happen together — P(A∩B)=0. Independent events don't affect each other. The complement rule is your best friend when 'at least one' appears in the problem.
Probability
🔄

Bayes' Theorem

Posterior probability P(A|B) from prior, likelihood & false positive

11 / 39
📥 Bayes Parameters
📐 Formula
Bayes: P(A|B) = P(B|A)·P(A) / P(B)
Total Prob: P(B) = P(B|A)·P(A) + P(B|A')·P(A')
🎯 When to Use
Use to update a probability when new evidence arrives. Medical diagnosis, spam filters, search & rescue, machine learning classifiers — all use Bayes' theorem. Use Bayes' Theorem when you need to calculate "Conditional Probability"—the likelihood of an event occurring given that another event has already occurred. This is critical in medical testing, legal proceedings (evidence), and machine learning (spam filters). It allows you to update your beliefs as new data comes in.
💡 Example
Disease affects 1% of population. Test: 95% true positive, 8% false positive.
→ Positive test result → only 10.7% chance you actually have the disease!
🔍 Interpretation
The result is often counterintuitive. Even with a very accurate test, a rare disease means most positives are false. This is the base rate fallacy: the prior probability matters enormously. Always consider the prevalence before interpreting a test result. Most people intuitively guess 95%. However, using Bayes' Theorem, P(A|B) = P(B|A)·P(A) / P(B) = (0.95 × 0.01) / (0.95 × 0.01 + 0.08 × 0.99) = 0.0095 / 0.0887, the actual probability is only about 10.7%. Because the disease is so rare (1%), the number of "false positives" from the healthy 99% of the population overwhelms the "true positives" from the sick 1%. This explains why doctors often order a second, different test after a first positive result.
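The disease-testing calculation can be reproduced with a short function. This is a minimal sketch; the function and parameter names are illustrative, and the inputs are the 1% prevalence, 95% true positive rate, and 8% false positive rate from the example.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """P(A|B) = P(B|A)·P(A) / [P(B|A)·P(A) + P(B|A')·P(A')]."""
    evidence = (true_positive_rate * prior
                + false_positive_rate * (1 - prior))  # total probability P(B)
    return true_positive_rate * prior / evidence

p = posterior(prior=0.01, true_positive_rate=0.95, false_positive_rate=0.08)
print(f"P(disease | positive test) = {p:.3f}")  # 0.107
```

Raising the prior to 0.10 with the same test lifts the posterior to roughly 57%, which shows how strongly the base rate drives the result.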
Probability
🔢

Permutation & Combination

P(n,r) = n!/(n-r)! and C(n,r) = n!/[r!(n-r)!] with steps

12 / 39
📥 Parameters
📐 Formula
Permutation: P(n,r) = n! / (n−r)! [ORDER matters]
Combination: C(n,r) = n! / [r!(n−r)!] [order doesn't matter]
Relation: P(n,r) = r! × C(n,r)
🎯 When to Use
Permutation: order matters (passwords, race rankings, seating arrangements). Combination: order doesn't matter (lottery numbers, choosing a team, selecting items).
💡 Example
Choosing 3 officers (Pres/VP/Sec) from 10 people → P(10,3) = 720
Choosing any 3 people from 10 for a committee → C(10,3) = 120
🔍 Interpretation
Every combination corresponds to r! permutations (all orderings of the same group). Combinations are always ≤ permutations. When 'how many ways' is asked — ask yourself: does the order matter?
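Both counts are available in Python's standard library (math.perm and math.comb, Python 3.8 or later). The short sketch below checks the officer and committee examples and the relation P(n,r) = r! × C(n,r).

```python
import math

n, r = 10, 3
officers = math.perm(n, r)    # ordered roles: President, VP, Secretary
committees = math.comb(n, r)  # unordered group of 3

print(officers, committees)                        # 720 120
print(officers == math.factorial(r) * committees)  # True
```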
Probability
🎯

Expected Value

E(X), E(X²), Var(X), SD(X) for discrete probability distributions

13 / 39
📥 Probability Distribution
Enter each outcome on a new line: value, probability
📐 Formula
E(X) = Σ [x · P(x)]
Var(X) = E(X²) − [E(X)]²
SD(X) = √Var(X)
🎯 When to Use
Use for discrete probability distributions. Essential in gambling, insurance, economics, game theory — anywhere you need the 'long-run average outcome' of a random variable.
💡 Example
Lottery: Win $100 with P=0.01, Win $10 with P=0.05, Win $0 with P=0.94
→ E(X) = 100×0.01 + 10×0.05 + 0×0.94 = $1.50
🔍 Interpretation
E(X) is the long-run average — not necessarily a possible outcome itself. A fair game has E(X) = cost to play. If E(X) > cost → favorable. Variance shows how variable the outcomes are around that expected value.
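The expected value, variance, and standard deviation of the lottery example can be computed directly from the formulas. This is an illustrative sketch; the (value, probability) list layout mirrors the tool's input format but is otherwise an assumption.

```python
import math

# (outcome in dollars, probability), from the lottery example.
dist = [(100, 0.01), (10, 0.05), (0, 0.94)]

ex = sum(x * p for x, p in dist)       # E(X)  = Σ x·P(x)
ex2 = sum(x * x * p for x, p in dist)  # E(X²) = Σ x²·P(x)
var = ex2 - ex ** 2                    # Var(X) = E(X²) − [E(X)]²
sd = math.sqrt(var)

print(f"E(X)={ex:.2f}  Var(X)={var:.2f}  SD(X)={sd:.2f}")
```

The standard deviation of about $10.14 is far larger than the $1.50 expected value, reflecting how rarely the large prize is won.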
Distribution
🪙

Binomial Distribution

P(X=k), P(X≤k), P(X≥k), mean np, variance npq

14 / 39
📥 Binomial Parameters
📐 Formula
PMF: P(X=k) = C(n,k) · pᵏ · (1−p)ⁿ⁻ᵏ
Mean: μ = np
Variance: σ² = np(1−p)
Std Dev: σ = √[np(1−p)]
🎯 When to Use
Use when: (1) fixed number of trials n, (2) each trial is success/failure, (3) constant probability p, (4) trials are independent. Examples: flipping coins, product defect rates, survey yes/no responses.
💡 Example
12 free throws, 35% success rate. P(exactly 4 baskets)?
→ P(X=4) = C(12,4)·0.35⁴·0.65⁸ = 495·0.015·0.032 ≈ 0.237
2. A quality control manager knows that 5% of lightbulbs produced in a factory are defective. If they pick 20 bulbs at random, what is the probability that exactly 2 are defective?
→ P(X=2) = C(20,2)·0.05²·0.95¹⁸ ≈ 0.189
🔍 Interpretation
The distribution is right-skewed when p < 0.5, left-skewed when p > 0.5, symmetric when p = 0.5. Larger n makes it more bell-shaped (approaches normal). Mean=np tells you the expected number of successes.
Once a distribution is bell-shaped, the Empirical Rule (68-95-99.7) applies. For example, if a test score of 95 lies 2 standard deviations above the mean (z-score = 2), then 68% of scores fall within 1 SD and 95% fall within 2 SD, so only about 2.5% of students (the top half of the remaining 5%) will score above 95. This interpretation allows organizations to set "cutoff scores" that accurately target the top-performing percentage of a population.
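The binomial probabilities in the examples can be evaluated straight from the PMF. This is a minimal sketch with hypothetical function names, using only the standard library.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) = C(n,k) · p^k · (1−p)^(n−k)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_cdf(k, n, p):
    """P(X ≤ k), summing the PMF from 0 to k."""
    return sum(binom_pmf(i, n, p) for i in range(k + 1))

print(f"{binom_pmf(4, 12, 0.35):.3f}")  # free throws: 0.237
print(f"{binom_pmf(2, 20, 0.05):.3f}")  # defective bulbs: 0.189
print(f"{binom_cdf(2, 20, 0.05):.3f}")  # at most 2 defective bulbs
```

Summing the PMF is fine for small n; for very large n a library routine or the normal approximation is usually preferable.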
Distribution

Poisson Distribution

P(X=k) = λᵏe⁻λ/k! for rare events, cumulative probabilities

15 / 39
📥 Poisson Parameters
📐 Formula
PMF: P(X=k) = (λᵏ · e⁻λ) / k!
Mean: μ = λ
Variance: σ² = λ (same as mean!)
🎯 When to Use
Use for rare events over a fixed interval of time, space, or area where events occur independently. Examples: calls per hour, defects per meter, accidents per month, emails per day.
💡 Example
Average 4.5 customers/hour. P(exactly 3 in next hour)?
→ P(X=3) = (4.5³ · e⁻⁴·⁵) / 3! = (91.125 × 0.0111) / 6 ≈ 0.169
🔍 Interpretation
The Poisson is a limiting case of Binomial when n is large and p is small. Mean = Variance is a diagnostic — if the sample variance is much larger than the mean, the data may be over-dispersed (use Negative Binomial instead).
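The customer-arrival example can be checked with a direct implementation of the Poisson PMF. This is an illustrative sketch; the function names are assumptions.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) = λ^k · e^(−λ) / k!."""
    return lam**k * exp(-lam) / factorial(k)

def poisson_cdf(k, lam):
    """P(X ≤ k), summing the PMF from 0 to k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))

print(f"{poisson_pmf(3, 4.5):.3f}")  # exactly 3 customers: 0.169
print(f"{poisson_cdf(3, 4.5):.3f}")  # at most 3 customers: 0.342
```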
Distribution
🔔

Normal Distribution

Z-score, P(X<x), P(X>x), percentile for any normal distribution

16 / 39
📥 Normal Distribution Parameters
📐 Formula
PDF: f(x) = [1 / (σ√(2π))] · e^(−(x−μ)² / (2σ²))
Z-score: z = (x − μ) / σ
Empirical: 68% within ±1σ, 95% within ±2σ, 99.7% within ±3σ
🎯 When to Use
Use when data is bell-shaped and symmetric. Height, weight, IQ, measurement errors, and many natural phenomena follow normal distributions. It also approximates the distribution of sample means when n is large (Central Limit Theorem).
💡 Example
IQ scores: μ=100, σ=15. What % score above 130?
→ z = (130−100)/15 = 2.0 → P(Z>2) = 2.28%
🔍 Interpretation
The normal distribution is fully described by μ and σ. Moving 1σ above or below the mean changes the CDF by ~34%. The tails extend to ±∞ but contain very little probability beyond ±3σ.
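The IQ example can be reproduced with `statistics.NormalDist` from the Python standard library (available from Python 3.8). This is a sketch for checking the numbers, not a description of how the calculator is implemented.

```python
from statistics import NormalDist

# IQ scores: mu = 100, sigma = 15.
iq = NormalDist(mu=100, sigma=15)

x = 130
z = (x - iq.mean) / iq.stdev   # z-score of the raw value
p_above = 1 - iq.cdf(x)        # upper-tail probability P(X > 130)

print(z, round(100 * p_above, 2))  # 2.0 2.28

# Reverse direction: the raw score at the 84th percentile (about z = 1).
score_84th = iq.inv_cdf(0.84)
```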
Distribution
📏

Z-Score Calculator

Convert raw scores ↔ Z-scores, find percentile rank

17 / 39
📥 Z-Score Parameters
📐 Formula
Z-Score: z = (x − μ) / σ
Reverse: x = μ + z·σ
Interpretation: z = 1 → 84th percentile; z = −1 → 16th percentile
🎯 When to Use
Use to standardize values from different scales so they can be compared. Used in comparing test scores across different tests, detecting outliers, and computing probabilities for normal distributions.
💡 Example
Math exam μ=70, σ=10. John scored 88.
→ z = (88−70)/10 = 1.8 → 96.4th percentile
🔍 Interpretation
Z-score tells you how many standard deviations above or below the mean a value falls. Negative z = below average. |z| > 2 is unusual (top/bottom 2.5%). Z-scores allow fair comparison: a z=1.5 on one exam equals a z=1.5 on another.
Distribution
🛡️

Chebyshev's Theorem

Min % within k·σ of mean for ANY distribution, no normality needed

18 / 39
📥 Chebyshev Parameters
📐 Formula
Theorem: P(|X − μ| < k·σ) ≥ 1 − 1/k² (k > 1)
k=2: At least 75% within μ ± 2σ
k=3: At least 88.9% within μ ± 3σ
🎯 When to Use
Use when the distribution is unknown or non-normal. Chebyshev's theorem works for ANY distribution — no normality assumption needed. It gives conservative (minimum) guarantees.
💡 Example
Mean exam score = 72, SD = 10. At least what % score between 52 and 92?
→ k = 2, since 72 ± 2×10 = [52, 92] → at least 75%
🔍 Interpretation
The bounds are conservative — the true percentage within k·σ is usually much higher than the guarantee. For a normal distribution, k=2 gives 95.4% (vs Chebyshev's 75%). Use Chebyshev when normality is in doubt.
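To see how conservative the bound is, the sketch below prints Chebyshev's guarantee next to the exact share for a normal distribution. Both function names are illustrative.

```python
from statistics import NormalDist

def chebyshev_lower_bound(k: float) -> float:
    """Minimum share of any distribution within k standard deviations."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

def normal_share_within(k: float) -> float:
    """Exact share of a normal distribution within k standard deviations."""
    std_normal = NormalDist()
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (2, 3):
    print(k, f"{chebyshev_lower_bound(k):.1%}", f"{normal_share_within(k):.1%}")
# 2 75.0% 95.4%
# 3 88.9% 99.7%
```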
Distribution
🎰

Geometric Distribution

P(first success on trial k) = q^(k-1)·p, mean=1/p

19 / 39
📥 Geometric Distribution Parameters
📐 Formula
PMF: P(X=k) = (1−p)^(k−1) · p
CDF: P(X≤k) = 1 − (1−p)^k
Mean: μ = 1/p
Variance: σ² = (1−p) / p²
🎯 When to Use
Use when counting the number of trials until the first success. Examples: number of attempts to make a free throw, number of calls until a sale, number of coin flips until heads.
💡 Example
Free throw success rate p=0.70. P(first success on 3rd attempt)?
→ P(X=3) = (0.30)² × 0.70 = 0.09 × 0.70 = 0.063
🔍 Interpretation
The geometric distribution has the memoryless property — the probability of success on the next trial is always p, regardless of how many failures have occurred. Mean = 1/p tells you the expected number of tries.
Distribution
📉

Exponential Distribution

Time between events: P(X≤x)=1-e^(-λx), memoryless property

20 / 39
📥 Exponential Distribution Parameters
📐 Formula
PDF: f(x) = λ·e^(−λx) for x ≥ 0
CDF: P(X ≤ x) = 1 − e^(−λx)
Mean: μ = 1/λ
Median: ln(2) / λ ≈ 0.693/λ
🎯 When to Use
Use to model time or distance between events in a Poisson process. Examples: time between customer arrivals, time until equipment failure, waiting time at a help desk.
💡 Example
Average 0.4 failures/hour (λ=0.4). P(next failure within 5 hours)?
→ P(X ≤ 5) = 1 − e^(−0.4×5) = 1 − e^(−2) ≈ 0.865
🔍 Interpretation
The exponential distribution is memoryless: the probability of waiting x more minutes is the same regardless of how long you've already waited. Mean = 1/λ. Median < Mean — it's right-skewed.
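The failure-time example and the memoryless property can both be checked numerically. This is a minimal sketch; the names `exponential_cdf` and `survival` and the 3-hour and 5-hour values in the memoryless check are assumptions for illustration.

```python
from math import exp

def exponential_cdf(x: float, rate: float) -> float:
    """P(X <= x) for X ~ Exponential(rate)."""
    return 1 - exp(-rate * x) if x >= 0 else 0.0

def survival(x: float, rate: float) -> float:
    """P(X > x), the probability of still waiting after x time units."""
    return 1 - exponential_cdf(x, rate)

rate = 0.4  # failures per hour
print(round(exponential_cdf(5, rate), 3))  # 0.865

# Memoryless property: P(X > s + t | X > s) equals P(X > t).
already_waited, extra = 3.0, 5.0
conditional = survival(already_waited + extra, rate) / survival(already_waited, rate)
```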
Distribution

Uniform Distribution

U(a,b) PDF, CDF, P(X≤x), mean=(a+b)/2, variance=(b-a)²/12

21 / 39
📥 Uniform Distribution Parameters
📐 Formula
PDF: f(x) = 1/(b−a) for a ≤ x ≤ b
CDF: P(X ≤ x) = (x−a)/(b−a)
Mean: μ = (a+b) / 2
Variance: σ² = (b−a)² / 12
🎯 When to Use
Use when every value in a range is equally likely. Examples: random number generators, rounding errors, arrival time within a window, rolling a fair die (discrete uniform).
💡 Example
Bus arrives uniformly between 0 and 30 minutes. P(wait ≤ 10 min)?
→ P(X ≤ 10) = (10−0)/(30−0) = 1/3 ≈ 33.3%
🔍 Interpretation
The uniform distribution has constant probability density — no value is more likely than another. The variance depends only on the range width (b−a). It's often used as a 'no information' prior in Bayesian analysis.
Distribution
🃏

Hypergeometric Distribution

P(X=k) sampling WITHOUT replacement from finite population

22 / 39
📥 Hypergeometric Parameters
📐 Formula
PMF: P(X=k) = C(K,k)·C(N−K,n−k) / C(N,n)
Mean: μ = n·K/N
Variance: σ² = n·(K/N)·((N−K)/N)·((N−n)/(N−1))
🎯 When to Use
Use when sampling without replacement from a finite population. Key difference from Binomial: the population is finite and each draw changes the remaining probabilities. Common in quality control, card games, and auditing.
💡 Example
Lot of 30 items, 12 defective. Draw 8. P(exactly 3 defective)?
→ P(X=3) = C(12,3)·C(18,5) / C(30,8) = 220·8568 / 5,852,925 ≈ 0.322
🔍 Interpretation
As N → ∞, the hypergeometric approaches the Binomial. The finite population correction factor ((N−n)/(N−1)) makes the variance smaller than Binomial — drawing from a small pool reduces uncertainty.
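The hypergeometric PMF is a ratio of binomial coefficients, so `math.comb` is enough to evaluate it. The sketch below reworks the lot-of-30 example; `hypergeom_pmf` is a hypothetical helper name.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k): k successes in n draws without replacement,
    from a population of N items containing K successes."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Lot of 30 items, 12 defective, draw 8, exactly 3 defective.
print(round(hypergeom_pmf(3, N=30, K=12, n=8), 3))  # 0.322
```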
Hypothesis
🎯

Confidence Interval

CI for population mean: x̄ ± z*(σ/√n) at 90/95/99%

23 / 39
📥 Sample Statistics
📐 Formula
CI (z): x̄ ± z* · (σ/√n)
CI (t): x̄ ± t* · (s/√n) [σ unknown]
Width: 2 × z* × SE
z* values: 90% → 1.645, 95% → 1.960, 99% → 2.576
🎯 When to Use
Use to estimate a population parameter from a sample. Instead of a single point estimate, a CI gives a range that accounts for sampling uncertainty. Used in polls, clinical trials, and scientific reporting.
💡 Example
Sample: x̄=145, s=22, n=50. 95% CI for μ?
→ SE = 22/√50 = 3.11 → 145 ± 1.96×3.11 = (138.9, 151.1)
🔍 Interpretation
A 95% CI means: if we repeated the sampling process 100 times, about 95 of the resulting intervals would contain the true μ. It does NOT mean 'there is a 95% chance μ is in this interval' — μ is fixed, the interval is random.
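A sketch of the z-based interval, using the example's sample statistics. It takes z* from `NormalDist.inv_cdf` rather than a table, and treats s as a stand-in for σ exactly as the worked example does. `z_interval` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(mean: float, sd: float, n: int, confidence: float = 0.95):
    """Confidence interval mean ± z* · sd/√n using the normal critical value."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sd / sqrt(n)
    return mean - margin, mean + margin

low, high = z_interval(mean=145, sd=22, n=50)
print(round(low, 1), round(high, 1))  # 138.9 151.1
```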
Hypothesis
🔭

Sample Size Calculator

Minimum n = (z·σ/E)² for desired margin of error & confidence

24 / 39
📥 Parameters
📐 Formula
Min n: n = (z* · σ / E)²
Margin of error (E = ME): E = z* · σ / √n
Halve ME: Quadruple n (n ∝ 1/E²)
🎯 When to Use
Use before data collection to determine how many observations are needed to achieve a desired precision. Essential for surveys, clinical trials, and quality control sampling plans.
💡 Example
Want ME ≤ 3 points, σ=18, 95% confidence.
→ n = (1.96×18/3)² = (11.76)² = 138.3 → need n = 139
🔍 Interpretation
Sample size has diminishing returns: halving the margin of error requires 4× as many samples. Higher confidence also requires more samples (larger z*). If σ is unknown, use a pilot study or literature estimate.
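The rounding-up step matters: n must be the next whole number above (z*·σ/E)². A minimal sketch, assuming a known σ and a hypothetical helper name `min_sample_size`:

```python
from math import ceil
from statistics import NormalDist

def min_sample_size(sigma: float, margin: float, confidence: float = 0.95) -> int:
    """Smallest n with z* · sigma / √n <= margin."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z_star * sigma / margin) ** 2)

print(min_sample_size(sigma=18, margin=3))    # 139
print(min_sample_size(sigma=18, margin=1.5))  # roughly four times as many
```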
Hypothesis
🧪

One-Sample Z-Test

H₀: μ = μ₀ when σ known — z-statistic, p-value, decision

25 / 39
📥 Z-Test Parameters
📐 Formula
Z statistic: z = (x̄ − μ₀) / (σ/√n)
Two-tailed: P = 2·P(Z > |z|)
Right-tailed: P = P(Z > z)
Left-tailed: P = P(Z < z)
🎯 When to Use
Use to test whether a sample mean differs from a known value when population σ is known. Examples: testing if a new production process changes the mean output, comparing a class average to a national standard.
💡 Example
Claim: μ=100. Sample: x̄=108, σ=15, n=40.
→ z = (108−100)/(15/√40) = 8/2.37 = 3.37 → p=0.0008 → Reject H₀
🔍 Interpretation
The p-value is the probability of observing a result at least as extreme as yours if H₀ were true. Small p (< α) means the result is unlikely under H₀ — reject it. Large p means insufficient evidence to reject.
Hypothesis
🔬

One-Sample T-Test

H₀: μ = μ₀ when σ unknown — t-statistic, df, p-value

26 / 39
📥 T-Test Data
📐 Formula
T statistic: t = (x̄ − μ₀) / (s/√n)
Degrees of freedom: df = n − 1
Use Z when: n > 30 and σ known; otherwise use t
🎯 When to Use
Use when testing a sample mean against a known value but σ is unknown (which is almost always). The t-distribution has heavier tails than z to account for estimating σ from the sample.
💡 Example
12 patients: average recovery 20.5 days, s=3.1. Is this different from 18 days?
→ t = (20.5−18)/(3.1/√12) = 2.5/0.895 = 2.79, df=11 → p≈0.018 → Reject H₀
🔍 Interpretation
As df increases (larger n), the t-distribution approaches normal. For small samples (n<30), t-critical values are larger — it's harder to reject H₀, which is appropriate given greater uncertainty about σ.
Hypothesis
⚖️

Two-Sample T-Test

Compare two independent group means using Welch's method

27 / 39
📥 Two Group Statistics
Group 1
Group 2
📐 Formula
T stat: t = (x̄₁ − x̄₂) / √(s₁²/n₁ + s₂²/n₂)
Welch df: df = (s₁²/n₁ + s₂²/n₂)² / [...]
H₀: μ₁ = μ₂
🎯 When to Use
Use to compare means of two independent groups when σ₁ and σ₂ are unknown. Welch's version does not assume equal variances (more robust than Student's t). Common in A/B testing, clinical trials, and experimental comparisons.
💡 Example
Group A: x̄=78, s=11, n=35. Group B: x̄=71, s=13, n=40.
→ t=2.53, df≈73, p≈0.014 → Significant difference
🔍 Interpretation
A significant result means the groups' means are unlikely to be equal in the population. Look at the confidence interval for (μ₁−μ₂) to understand the practical size of the difference, not just statistical significance.
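The Welch statistic and its degrees of freedom can be computed from summary statistics alone. The sketch below uses the standard Welch-Satterthwaite denominator, (s₁²/n₁)²/(n₁−1) + (s₂²/n₂)²/(n₂−1), which is an assumption about what the abbreviated formula above stands for. It stops short of the p-value, since the Python standard library has no t-distribution CDF.

```python
from math import sqrt

def welch_t(m1: float, s1: float, n1: int, m2: float, s2: float, n2: int):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2   # per-group variance of the mean
    t = (m1 - m2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Group A: mean 78, s = 11, n = 35. Group B: mean 71, s = 13, n = 40.
t, df = welch_t(78, 11, 35, 71, 13, 40)
print(round(t, 2), round(df, 1))  # 2.53 72.9
```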
Hypothesis
🧬

Paired T-Test

Before-after or matched pairs — mean difference d̄, t-stat

28 / 39
📥 Paired Observations
📐 Formula
Differences: d = X₁ − X₂ (for each pair)
T statistic: t = d̄ / (s_d/√n)
df: df = n − 1 (n = number of pairs)
🎯 When to Use
Use when measurements are paired or matched: before/after treatments, same subject measured twice, matched case-control studies. Pairing removes individual variation and increases statistical power.
💡 Example
Before/after training: 10 athletes measured twice.
→ Mean difference d̄=−6.4 (improved), sd=2.8 → t=−7.23, p<0.001 → Significant improvement
🔍 Interpretation
The paired test is more powerful than the two-sample test for paired data because it removes between-subject variability. The sign of d̄ tells direction: negative d̄ = X₁ < X₂ (X₂ is higher).
Hypothesis
χ²

Chi-Square Test

Goodness of fit: χ² = Σ(O-E)²/E with critical value & decision

29 / 39
📥 Observed & Expected Frequencies
📐 Formula
Chi-Square: χ² = Σ (O − E)² / E
df: df = k − 1 (k = categories)
Expected: E = n / k (if equally distributed)
🎯 When to Use
Use to test whether observed frequencies match expected frequencies. Examples: testing if a die is fair, checking if survey responses fit a theoretical distribution, testing seasonal patterns.
💡 Example
Roll a die 120 times: O=[15,25,20,18,22,20]. Expected: 20 each.
→ χ²=2.9, df=5, χ²_crit=11.07 → Fail to reject H₀ → Die appears fair
🔍 Interpretation
Large χ² = observed is far from expected = evidence against H₀. Small χ² = good fit. Requires E ≥ 5 for each category (otherwise combine categories). More categories = more df = higher critical value.
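The goodness-of-fit statistic is a single sum, sketched here with the die-roll counts. The critical value 11.07 is taken from the example (df = 5, α = 0.05), not computed, and `chi_square_gof` is an illustrative name.

```python
def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [15, 25, 20, 18, 22, 20]       # 120 rolls of a die
expected = [sum(observed) / 6] * 6        # 20 per face if the die is fair

stat = chi_square_gof(observed, expected)
critical = 11.07                          # chi-square table value, df = 5, alpha = 0.05
print(round(stat, 2), "reject H0" if stat > critical else "fail to reject H0")
# 2.9 fail to reject H0
```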
Hypothesis
📊

Contingency Table

2×2 Chi-square independence test + odds ratio + relative risk

30 / 39
📥 2×2 Observed Frequencies
B (+)
B (−)
A (+)
A (−)
📐 Formula
Chi-Square: χ² = Σ (O−E)²/E, df = 1
Expected: Eᵢⱼ = (Row total × Col total) / N
Odds Ratio: OR = (a·d) / (b·c)
Relative Risk: RR = P(B+|A+) / P(B+|A−)
🎯 When to Use
Use to test association between two binary variables. Examples: treatment vs outcome, gender vs preference, exposure vs disease. Essential in epidemiology and clinical research.
💡 Example
Treated: 55 recovered, 15 not. Control: 20 recovered, 60 not.
→ χ²=42.9, OR=11.0, RR=3.14 → Strong association
🔍 Interpretation
OR=1 means no association. OR>1 means the exposure increases the odds of the outcome. RR>1 means the exposure increases the risk. Chi-square only tests existence — OR and RR measure the strength of association.
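For a 2×2 table all three measures come straight from the four cell counts. The sketch uses the shortcut χ² = N(ad − bc)² / (R₁R₂C₁C₂), which equals Σ(O−E)²/E for a 2×2 table without continuity correction. `two_by_two` is a hypothetical helper.

```python
def two_by_two(a: int, b: int, c: int, d: int):
    """Chi-square (no continuity correction), odds ratio and relative risk
    for the table [[a, b], [c, d]] with rows = exposure, columns = outcome."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    odds_ratio = (a * d) / (b * c)
    relative_risk = (a / (a + b)) / (c / (c + d))
    return chi2, odds_ratio, relative_risk

# Treated: 55 recovered, 15 not. Control: 20 recovered, 60 not.
chi2, odds_ratio, relative_risk = two_by_two(55, 15, 20, 60)
print(round(chi2, 1), round(odds_ratio, 1), round(relative_risk, 2))  # 42.9 11.0 3.14
```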
Hypothesis
%

Proportion Z-Test

H₀: p = p₀ — one-sample proportion test with 95% CI

31 / 39
📥 Proportion Z-Test Parameters
📐 Formula
Z statistic: z = (p̂ − p₀) / √(p₀(1−p₀)/n)
SE: SE = √(p₀·q₀/n)
95% CI: p̂ ± 1.96·√(p̂·(1−p̂)/n)
🎯 When to Use
Use to test whether a sample proportion differs from a hypothesized value. Examples: testing if a coin is fair, checking if a campaign achieved target click rate, quality control pass rate.
💡 Example
62 of 120 voters support candidate. Is this more than 45%?
→ p̂=0.517, z=1.47, p=0.071 → Fail to reject H₀ at α=0.05
🔍 Interpretation
Need n·p₀ ≥ 10 and n·(1−p₀) ≥ 10 for the normal approximation to be valid. The CI shows the range of plausible true proportions. Larger n → narrower CI → more precise estimate.
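A sketch of the right-tailed one-sample proportion test for the voter example, with the normal-approximation check from the paragraph above built in as a guard. The function name and the choice to raise an error when the check fails are assumptions for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z_test(successes: int, n: int, p0: float):
    """Right-tailed z-test of H0: p = p0 against H1: p > p0."""
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("normal approximation needs n*p0 and n*(1-p0) >= 10")
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    p_value = 1 - NormalDist().cdf(z)
    return p_hat, z, p_value

# 62 of 120 voters support the candidate; is support above 45%?
p_hat, z, p_value = proportion_z_test(62, 120, 0.45)
print(round(p_hat, 3), round(z, 2), round(p_value, 3))  # 0.517 1.47 0.071
```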
Hypothesis
⚗️

Two-Proportion Z-Test

H₀: p₁ = p₂ — compare two independent proportions

32 / 39
📥 Two Proportion Parameters
Sample 1
Sample 2
📐 Formula
Pooled p̂: p̂ = (x₁+x₂) / (n₁+n₂)
Z statistic: z = (p̂₁−p̂₂) / √[p̂(1−p̂)(1/n₁+1/n₂)]
H₀: p₁ = p₂
🎯 When to Use
Use to test whether two independent proportions are equal. Examples: comparing click rates of two ad versions, recovery rates in two treatment groups, pass rates for two schools.
💡 Example
Group 1: 78/150 (52%). Group 2: 65/130 (50%).
→ Pooled p̂=0.51, z=0.33, p=0.74 → No significant difference
🔍 Interpretation
The pooled proportion is used under H₀ (assuming they're equal). Use unpooled SE for confidence intervals. A significant result means the proportions likely differ in the population — but check if the difference is practically meaningful.
Hypothesis
🏗️

One-Way ANOVA

F-test for equality of 3+ group means, SS/MS/F table

33 / 39
📥 Group Data
Enter each group on a separate line as comma-separated values.
📐 Formula
SSB: Σ nᵢ(x̄ᵢ − x̄)², df = k−1
SSW: ΣΣ (xᵢⱼ − x̄ᵢ)², df = N−k
F: F = MSB / MSW = (SSB/dfB) / (SSW/dfW)
🎯 When to Use
Use to compare means of 3 or more independent groups. Running multiple t-tests inflates Type I error — ANOVA controls for this. Examples: comparing 3 diets, 4 teaching methods, 5 product versions.
💡 Example
4 groups of students taught by different methods, with group mean scores 75, 82, 79, 88.
→ F=4.8, p=0.004 → At least one mean differs significantly
🔍 Interpretation
ANOVA only tells you whether at least one mean differs, not which ones. Follow with post-hoc tests (Tukey, Bonferroni) for pairwise comparisons. F = ratio of between-group to within-group variance — large F means group means are far apart relative to within-group spread.
Regression
📉

Linear Regression

ŷ = a + bx — slope, intercept, R², predictions, Σxy calculations

34 / 39
📥 Data Pairs
📐 Formula
Line: ŷ = a + bx
Slope: b = [nΣxy − ΣxΣy] / [nΣx² − (Σx)²]
Intercept: a = ȳ − b·x̄
R² = SSR/SST = 1 − SSE/SST
🎯 When to Use
Use to model the linear relationship between two quantitative variables and make predictions. Requires approximately linear relationship, homoscedasticity, and independent observations.
💡 Example
Hours studied (x) vs Exam score (y): slope b=2.5, a=50
→ ŷ = 50 + 2.5x → studying 8 hours predicts score of 70
🔍 Interpretation
R² = proportion of variance in y explained by x. R²=0.85 means x explains 85% of the variation in y. Check residual plots for patterns — patterns suggest the linear model is not appropriate. Correlation ≠ causation.
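The slope and intercept formulas translate directly into sums over the data pairs. The data below are hypothetical study-hours and exam-score pairs invented for illustration (the example above gives only the fitted coefficients), and the function names are assumptions.

```python
def fit_line(xs, ys):
    """Least-squares slope b and intercept a for y-hat = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    slope = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
    intercept = sum_y / n - slope * sum_x / n
    return slope, intercept

def r_squared(xs, ys, slope, intercept):
    """R² = 1 - SSE/SST."""
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

hours = [2, 4, 6, 8, 10]        # hypothetical data
scores = [56, 59, 66, 69, 75]   # hypothetical data

b, a = fit_line(hours, scores)
print(round(b, 2), round(a, 2), round(r_squared(hours, scores, b, a), 3))
predicted_at_8 = a + b * 8      # prediction for 8 hours of study
```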
Regression
🔗

Pearson Correlation

r, R², significance test for linear relationship strength

35 / 39
📥 Data Pairs
📐 Formula
r: r = [nΣxy − ΣxΣy] / √[(nΣx²−(Σx)²)(nΣy²−(Σy)²)]
Range: −1 ≤ r ≤ +1
R² = r² (coefficient of determination)
🎯 When to Use
Use to measure the strength and direction of linear relationship between two continuous variables. Both variables should be approximately normally distributed for inference.
💡 Example
Study hours vs exam score for 7 students: r=0.97
→ Very strong positive correlation. R²=0.94 → 94% of variance explained.
🔍 Interpretation
|r| near 1 = strong linear relationship. |r| near 0 = weak or no linear relationship. r only measures LINEAR association — a strong curved relationship may give r≈0. Always plot a scatter diagram first!
Regression
🏅

Spearman Correlation

Non-parametric rank correlation rₛ — ordinal or skewed data

36 / 39
📥 Data Pairs (ranks or raw values)
📐 Formula
rₛ: rₛ = 1 − 6Σd² / [n(n²−1)]
d: d = Rank(x) − Rank(y) for each pair
Range: −1 ≤ rₛ ≤ +1
🎯 When to Use
Use when data is ordinal, skewed, or has outliers. Spearman measures monotonic relationship (not just linear). Use instead of Pearson when normality can't be assumed.
💡 Example
Rankings by two judges for 9 performances: rₛ=0.85
→ Strong agreement between judges.
🔍 Interpretation
rₛ=+1 means perfect rank agreement. rₛ=−1 means perfect rank reversal. Less sensitive to outliers than Pearson. Use when you care about direction of relationship, not exact distances between values.
Regression
🔀

Covariance

Cov(X,Y) = Σ(x-x̄)(y-ȳ)/(n-1) with full deviation table

37 / 39
📥 Data Pairs
📐 Formula
Sample Cov: Cov(X,Y) = Σ(xᵢ−x̄)(yᵢ−ȳ) / (n−1)
Population: σxy = Σ(xᵢ−μx)(yᵢ−μy) / n
Relation: r = Cov(X,Y) / (σx·σy)
🎯 When to Use
Use to measure how two variables change together. Foundation for correlation, regression, and portfolio theory in finance. Covariance is expressed in the product of the two variables' units, making it hard to interpret alone.
💡 Example
Hours studied (x) & exam score (y): Cov=+45
→ Positive: more study → higher score. Convert to r: r=45/(3.5×14)=0.92
🔍 Interpretation
Cov > 0: variables tend to increase together. Cov < 0: one increases as other decreases. Cov ≈ 0: no linear trend. The magnitude is hard to interpret alone — divide by σx·σy to get r which is always in [−1,+1].
Advanced
〽️

Skewness & Kurtosis

Distribution shape: Fisher's g₁ skewness, g₂ excess kurtosis

38 / 39
📥 Input Data
📐 Formula
Skewness g₁: g₁ = [n/((n−1)(n−2))] × Σ[(xᵢ−x̄)/s]³
Kurtosis g₂: g₂ = [n(n+1)/((n−1)(n−2)(n−3))] × Σ[(xᵢ−x̄)/s]⁴ − 3(n−1)²/((n−2)(n−3))
🎯 When to Use
Use to assess the shape of a distribution. Important before choosing statistical tests — many tests assume normality. Skewness and kurtosis are used in normality testing.
💡 Example
Income data: g₁=1.8 (right-skewed), g₂=4.2 (leptokurtic)
→ Many low incomes with a few very high earners. Heavy tail.
🔍 Interpretation
g₁=0: symmetric | g₁>0: right tail (median < mean) | g₁<0: left tail (median > mean). g₂=0: normal tails | g₂>0: heavier tails than normal | g₂<0: lighter tails. |g₁|>1 or |g₂|>3 suggests significant non-normality.
Advanced
📐

Standard Error

SE=s/√n, margin of error, CI width from data or summary stats

39 / 39
📥 Data or Summary Statistics
📐 Formula
SE of Mean: SE = s / √n
Margin of Error: ME = z* × SE
CI Width: 2 × z* × SE
Reduce SE: Quadruple n to halve SE
🎯 When to Use
Use to measure the precision of a sample mean as an estimate of the population mean. SE is the standard deviation of the sampling distribution of x̄ — it decreases as n increases.
💡 Example
15 measurements: s=3.2, n=15. SE=3.2/√15=0.826
→ 95% CI: x̄ ± 1.96×0.826 = x̄ ± 1.62
🔍 Interpretation
SE quantifies sampling uncertainty. Smaller SE → estimates are more precise → narrower CIs. SE depends on sample size (controllable) and population variability (uncontrollable). Doubling n multiplies SE by 1/√2 ≈ 0.707, a reduction of about 29%.
Advanced
⚖️

Mann-Whitney U Test

Non-parametric test comparing two independent groups — no normality assumption

40 / 41
📥 Input Data
Group 1
Group 2
📐 Formula
U₁: U₁ = n₁·n₂ + n₁(n₁+1)/2 − R₁
U₂: U₂ = n₁·n₂ − U₁
U (test stat): U = min(U₁, U₂)
z approx: z = (U − μᵤ) / σᵤ
🎯 When to Use
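Use instead of the two-sample t-test when data is not normally distributed, is on an ordinal scale, contains outliers, or comes from small samples. It compares rank distributions (and medians, when the two distributions have a similar shape), which makes it robust to outliers and free of the normality assumption. It still assumes independent observations.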
💡 Example
A/B test on revenue per user where revenue is heavily skewed (many small, few huge). t-test would be unreliable — use Mann-Whitney U instead.
🔍 Interpretation
p < 0.05 → Groups differ significantly. The rank-biserial correlation r = 1 − 2U/(n₁·n₂) gives effect size: |r| ≥ 0.1 small, ≥ 0.3 medium, ≥ 0.5 large. With U₁ defined as above, a small U₁ (that is, a large rank sum R₁) means Group 1 tends to have higher values.
Advanced
🔔

Normality Test

Jarque-Bera test for normality using skewness and excess kurtosis

41 / 41
📥 Input Data
📐 Formula
JB Statistic: JB = (n/6) × [g₁² + g₂²/4]
Skewness g₁: g₁ = [n/((n−1)(n−2))] × Σ[(xᵢ−x̄)/s]³
Excess Kurt. g₂: g₂ = adjusted 4th moment − 3
Distribution: JB ~ χ²(2) under H₀: normality
🎯 When to Use
Use before running t-tests, ANOVA, regression, or other parametric tests that assume normality. The Jarque-Bera test is most reliable with n ≥ 20. For n < 20, also inspect histograms and Q-Q plots visually.
💡 Example
Before running a t-test on salary data, you test normality. JB = 18.4, p < 0.01 → reject normality → use Mann-Whitney U test instead.
🔍 Interpretation
H₀: Data is normally distributed. p < 0.05 → reject normality → consider non-parametric tests or data transformation (log, sqrt). p ≥ 0.05 → normality not rejected → proceed with parametric tests.
Hypothesis
🏆

Kruskal-Wallis Test

Non-parametric alternative to one-way ANOVA for 3+ independent groups

42 / 50
📥 Group Data
📐 Formula
H statistic: H = [12 / (N(N+1))] × Σ(Rᵢ²/nᵢ) − 3(N+1)
Rᵢ: Sum of ranks in group i
Decision: H > χ²(α, k−1) → Reject H₀
🎯 When to Use
Use when comparing 3 or more independent groups and normality cannot be assumed. It ranks all values together and tests whether ranks distribute equally. Ideal for ordinal data or small samples.
💡 Example
Compare pain scores across 3 treatments (no normality). H = 8.6, df = 2, p = 0.014 → reject H₀ → at least one treatment differs significantly.
🔍 Interpretation
p < 0.05 → at least one group differs. Follow up with pairwise Mann-Whitney U tests with Bonferroni correction to identify which pairs differ.
Hypothesis
🎣

Fisher's Exact Test

Exact p-value for 2×2 contingency tables — ideal for small samples

43 / 50
📥 2×2 Table
📐 Formula
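Table probability: p = (R₁!R₂!C₁!C₂!) / (N! × a!b!c!d!); the p-value is the sum of this probability over the observed table and all tables at least as extreme
Odds Ratio: OR = (a×d) / (b×c)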
🎯 When to Use
Use for 2×2 tables where any expected cell count < 5. Unlike chi-square, Fisher's calculates the exact probability — no approximation needed.
💡 Example
Drug trial: 8 recovered/2 not (treatment) vs 1 recovered/9 not (control). One-sided p ≈ 0.003 (two-sided p ≈ 0.005) → strong evidence drug works.
🔍 Interpretation
p < 0.05 → association is statistically significant. OR > 1 → row 1 has higher odds. OR = 1 → no association.
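The exact test can be sketched with binomial coefficients: with the margins fixed, the top-left cell follows a hypergeometric distribution, and the one-sided p-value adds up the probabilities of the observed table and every more extreme one. Function names are illustrative, and only the one-sided (greater) alternative is shown.

```python
from math import comb

def table_probability(a: int, b: int, c: int, d: int) -> float:
    """Probability of one 2x2 table given its row and column totals."""
    return comb(a + b, a) * comb(c + d, c) / comb(a + b + c + d, a + c)

def fisher_one_sided(a: int, b: int, c: int, d: int) -> float:
    """P(top-left cell >= a) with all margins held fixed."""
    row1, row2, col1 = a + b, c + d, a + c
    total_ways = comb(row1 + row2, col1)
    largest = min(row1, col1)
    return sum(comb(row1, x) * comb(row2, col1 - x)
               for x in range(a, largest + 1)) / total_ways

# Treatment: 8 recovered, 2 not. Control: 1 recovered, 9 not.
print(round(fisher_one_sided(8, 2, 1, 9), 4))  # 0.0027
```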
Distribution
📉

T-Distribution Calculator

Critical t-values, p-values and percentiles for any degrees of freedom

44 / 50
📥 Parameters
📐 Formula
PDF: f(t) ∝ (1 + t²/ν)^(−(ν+1)/2)
Critical t: t* such that P(|T| > t*) = α
🎯 When to Use
Use the t-distribution when σ is unknown and n is small. As df → ∞ it approaches the standard normal. Essential for t-tests and confidence intervals.
💡 Example
df=15, α=0.05, two-tailed → critical t = ±2.131. If your test statistic |t| > 2.131, reject H₀.
Distribution
📊

F-Distribution Calculator

Critical F-values and p-values for ANOVA and variance ratio tests

45 / 50
📥 Parameters
📐 Formula
F ratio: F = (χ²₁/df₁) / (χ²₂/df₂)
ANOVA: F = MS_between / MS_within
🎯 When to Use
Use to find critical F-values for ANOVA tests and compare two variances. The F-distribution is always right-skewed and non-negative.
💡 Example
ANOVA with df₁=3, df₂=20, α=0.05 → F_critical = 3.098. If your F-statistic > 3.098, reject H₀.
Advanced
💥

Effect Size Calculator

Cohen's d and Hedges' g — measure practical significance beyond p-values

46 / 50
📥 Group Statistics
📐 Formula
Cohen's d: d = (x̄₁ − x̄₂) / s_pooled
Pooled SD: s_pooled = √[(s₁²+s₂²)/2]
Hedges' g: g = d × (1 − 3/(4N−9))
🎯 When to Use
Use after a significant t-test to measure practical importance. Small: d=0.2, Medium: d=0.5, Large: d=0.8. A result can be statistically significant but practically meaningless.
💡 Example
Teaching method A: mean=85, SD=10, n=30. Method B: mean=75, SD=12, n=30. d = 0.91 → large effect → method A is substantially better.
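Both effect sizes follow from the summary statistics. The sketch below uses the simple pooled SD from the formula card, which treats the two groups as equally sized, and takes N as the combined sample size n₁ + n₂. Function names are illustrative.

```python
from math import sqrt

def cohens_d(m1: float, s1: float, m2: float, s2: float) -> float:
    """Cohen's d with the simple pooled SD sqrt((s1² + s2²) / 2)."""
    return (m1 - m2) / sqrt((s1**2 + s2**2) / 2)

def hedges_g(d: float, n_total: int) -> float:
    """Small-sample correction of d; n_total is n1 + n2."""
    return d * (1 - 3 / (4 * n_total - 9))

# Method A: mean 85, SD 10, n = 30. Method B: mean 75, SD 12, n = 30.
d = cohens_d(85, 10, 75, 12)
g = hedges_g(d, n_total=60)
print(round(d, 2), round(g, 2))  # 0.91 0.89
```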
Descriptive
📐

Variance Calculator

Population & sample variance with complete sum-of-squares working

47 / 50
📥 Input Data
📐 Formula
Sample s²: s² = Σ(xᵢ − x̄)² / (n−1)
Population σ²: σ² = Σ(xᵢ − μ)² / n
Std Dev: s = √s²
🎯 When to Use
Variance measures how far data points spread from the mean. Use sample variance (÷n−1) for data sampled from a population. Use population variance (÷n) when you have complete population data.
💡 Example
Data: 4, 8, 15, 16, 23, 42. Mean=18. SS = (4−18)²+...= 910. Sample variance = 910/5 = 182. SD = 13.49.
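The sum-of-squares working can be checked against Python's `statistics` module, where `variance` divides by n − 1 and `pvariance` divides by n. A minimal sketch:

```python
from statistics import mean, pvariance, stdev, variance

data = [4, 8, 15, 16, 23, 42]

x_bar = mean(data)
ss = sum((x - x_bar) ** 2 for x in data)   # sum of squared deviations

print(f"SS = {ss:.0f}, s² = {variance(data):.2f}, s = {stdev(data):.2f}")
# SS = 910, s² = 182.00, s = 13.49

population_variance = pvariance(data)      # SS / n, for complete population data
```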
Descriptive
📦

IQR Calculator

Q1, Q2, Q3, IQR and box-plot fence boundaries with step-by-step solution

48 / 50
📥 Input Data
📐 Formula
IQR: IQR = Q3 − Q1
Lower Fence: Q1 − 1.5 × IQR
Upper Fence: Q3 + 1.5 × IQR
🎯 When to Use
IQR is the spread of the middle 50% of data. It is resistant to outliers, making it better than range for skewed data. Essential for box plots and outlier detection.
💡 Example
Data: 2,5,7,8,10,13,14,19,23. Q1=6, Q3=16.5, IQR=10.5. Fences: [−9.75, 32.25]. No outliers detected.
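Quartile conventions differ between textbooks and software. The sketch below uses `statistics.quantiles` with `method="exclusive"`, which reproduces the quartiles of this example. Whether the calculator uses the same convention is an assumption.

```python
from statistics import quantiles

data = [2, 5, 7, 8, 10, 13, 14, 19, 23]

q1, q2, q3 = quantiles(data, n=4, method="exclusive")
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q3, iqr, lower_fence, upper_fence, outliers)
# 6.0 16.5 10.5 -9.75 32.25 []
```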
Distribution
🎯

Central Limit Theorem Calculator

Sampling distribution of x̄ — mean, SE, and P(x̄ in range)

49 / 50
📥 Population Parameters
📐 Formula
Mean of x̄: μ_x̄ = μ
Std Error: SE = σ / √n
Z-score: z = (x̄ − μ) / SE
🎯 When to Use
The CLT states that as n grows, the sampling distribution of x̄ approaches a normal distribution regardless of the population's shape, provided the population variance is finite; n ≥ 30 is a common rule of thumb. Use it to find probabilities about sample means.
💡 Example
μ=100, σ=15, n=36 → SE=2.5. P(x̄ ≤ 105) = P(z ≤ 2.0) = 0.9772. There is a 97.72% chance the sample mean is below 105.
Distribution
📈

Log-Normal Distribution

PDF, CDF, percentiles and probabilities for log-normal random variables

50 / 50
📥 Parameters
📐 Formula
PDF: f(x) = exp(−(ln x − μ)² / (2σ²)) / (xσ√(2π))
Mean: e^(μ + σ²/2)
Median: e^μ
🎯 When to Use
Use when data is always positive and right-skewed — stock prices, income, survival times, bacteria counts. If ln(X) is normal, then X is log-normal.
💡 Example
μ=0, σ=1: Mean=1.649, Median=1.000. P(X ≤ 2.5) ≈ 0.820. The distribution is right-skewed with a long upper tail.
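Because ln(X) is normal, a log-normal probability reduces to a normal CDF evaluated at ln(x). A minimal sketch with the example's parameters, where `lognormal_cdf` is an illustrative helper:

```python
from math import exp, log
from statistics import NormalDist

def lognormal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X <= x) when ln(X) ~ Normal(mu, sigma)."""
    return NormalDist(mu, sigma).cdf(log(x)) if x > 0 else 0.0

mu, sigma = 0.0, 1.0
mean = exp(mu + sigma**2 / 2)   # e^(mu + sigma²/2)
median = exp(mu)                # e^mu

print(f"{mean:.3f} {median:.3f} {lognormal_cdf(2.5, mu, sigma):.3f}")
# 1.649 1.000 0.820
```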