Category: Fallacies
Type: Logical Fallacy
Origin: Statistical concept formalized in 20th century
Also known as: Correlation-Causation Fallacy, False Cause, Spurious Correlation
Quick Answer: The Correlation-Causation Fallacy occurs when people assume that because two variables are statistically correlated (they move together), one must cause the other. This is one of the most important ideas in statistical literacy: correlation indicates only that two things change together, and says nothing about WHY they do. The causal direction could be reversed, both variables could be driven by a third factor, or the correlation could be purely coincidental.

What is the Correlation-Causation Fallacy?

The phrase “correlation does not imply causation” is a fundamental principle in statistics and scientific reasoning. When two variables show a statistical relationship—meaning they tend to change together in predictable ways—it’s tempting to conclude that one causes the other. However, this leap is not justified by the data alone.
“Two things moving together tells us about their relationship, not their causation. The discovery of correlation is the beginning of investigation, not its conclusion.”
The key insight is that correlation only tells us that a relationship EXISTS; it doesn’t tell us what CAUSES that relationship. A strong correlation might reflect reverse causation (B causes A), a common cause (a third variable drives both), or no direct relationship at all (coincidence or sampling error).

Correlation-Causation in 3 Depths

  • Beginner: Ice cream sales and swimming pool drownings are both higher in summer. Does ice cream cause drowning? Obviously not—both are caused by hot weather. This spurious correlation illustrates why correlation alone cannot establish causation.
  • Practitioner: In business analytics, revenue and website traffic might be correlated, but does more traffic cause more revenue? Possibly. It could also be that successful marketing campaigns drive both, or that a luxury product line both generates high revenue and attracts wealthy customers who browse more. Causal claims require more than correlation.
  • Advanced: In epidemiology, finding that people who exercise more tend to live longer doesn’t prove exercise extends life. Healthier people might exercise more, or socioeconomic factors might drive both exercise and longevity. Randomized controlled trials, or well-designed natural experiments where trials are impractical, are needed to support a causal claim.
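The ice cream example can be sketched as a small simulation. This is a minimal illustration with made-up numbers, not real sales or drowning data: temperature drives both series, and neither series affects the other. The helper names `pearson` and `residuals` are defined here for illustration only.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def residuals(ys, xs):
    """Residuals of ys after a least-squares line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var_x
    return [y - (mean_y + slope * (x - mean_x)) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Assumed data-generating process: temperature is the only common driver.
temperature = [rng.uniform(5, 35) for _ in range(2000)]
ice_cream_sales = [20 * t + rng.gauss(0, 100) for t in temperature]
drownings = [0.3 * t + rng.gauss(0, 2) for t in temperature]

raw_r = pearson(ice_cream_sales, drownings)
# Remove the linear effect of temperature from both series, then correlate what is left.
adjusted_r = pearson(
    residuals(ice_cream_sales, temperature),
    residuals(drownings, temperature),
)

print(f"raw correlation:                  {raw_r:.2f}")
print(f"correlation, temperature removed: {adjusted_r:.2f}")
```

Under these assumptions the raw correlation is clearly positive, while the correlation that remains after removing temperature is close to zero. The adjustment only works here because the common driver is known and measured.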

Origin

The explicit distinction between correlation and causation became a cornerstone of modern statistics in the early 20th century. Statisticians like Karl Pearson and later Ronald Fisher developed the mathematical tools to measure correlation while explicitly warning against causal interpretations. The phrase “correlation does not imply causation” became particularly prominent in the 1940s-1950s as statistical methods spread across sciences. Today, it’s a fundamental concept in fields ranging from epidemiology and economics to machine learning and A/B testing. Despite this, the fallacy remains one of the most common errors in interpreting data—in news articles, business reports, and everyday reasoning.

Key Points

1. Correlation Is Descriptive, Not Causal

Correlation describes a relationship between variables—it tells us they move together. But description is not explanation. The “why” requires additional investigation beyond statistical association.
2. Three Alternative Explanations

When A and B correlate, at least three possibilities exist: A causes B, B causes A, or a third variable C causes both. All three produce the same correlation pattern.
3. Coincidence Exists

When enough variables are compared, some pairs will correlate by chance alone. The internet is full of ridiculous examples, such as the correlation between per capita cheese consumption and the number of people who die by becoming tangled in their bedsheets.
4. Causation Requires Mechanism

Observing that variables move together does not establish causation. A credible causal claim typically rests on a controlled experiment, or on a plausible causal mechanism backed by a detailed theoretical model.
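The claim that different causal structures produce the same correlation pattern can be checked with a short simulation. The sketch below uses invented data-generating processes (the function names and noise levels are assumptions chosen for illustration) in which A causes B, B causes A, or a third variable C causes both.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(42)
N = 5000
# Noise level chosen so the common-cause case has the same expected
# correlation (about 0.71) as the two direct-cause cases.
COMMON_CAUSE_NOISE = math.sqrt(math.sqrt(2) - 1)


def a_causes_b():
    a = [rng.gauss(0, 1) for _ in range(N)]
    b = [x + rng.gauss(0, 1) for x in a]
    return a, b


def b_causes_a():
    b = [rng.gauss(0, 1) for _ in range(N)]
    a = [y + rng.gauss(0, 1) for y in b]
    return a, b


def c_causes_both():
    c = [rng.gauss(0, 1) for _ in range(N)]
    a = [z + rng.gauss(0, COMMON_CAUSE_NOISE) for z in c]
    b = [z + rng.gauss(0, COMMON_CAUSE_NOISE) for z in c]
    return a, b


results = {
    scenario.__name__: pearson(*scenario())
    for scenario in (a_causes_b, b_causes_a, c_causes_both)
}
for name, r in results.items():
    print(f"{name}: r = {r:.2f}")
```

All three scenarios yield nearly the same coefficient, so the coefficient alone cannot tell an analyst which structure generated the data.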

Applications

Data Science & Analytics

Data scientists must constantly resist the temptation to infer causation from correlation. A/B testing, controlled experiments, and causal inference methods are specifically designed to move beyond mere correlation.
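A rough simulation shows why randomized assignment, as in an A/B test, supports causal conclusions where an observational comparison does not. Everything here is hypothetical: a made-up "engagement" trait drives the outcome, the treatment (say, a new feature) has zero true effect, and the function names are for illustration only.

```python
import random
from statistics import mean

rng = random.Random(2)


def outcome(engagement):
    # Assumed outcome model: depends on engagement only.
    # The treatment has no effect at all.
    return 10 * engagement + rng.gauss(0, 1)


def difference_in_means(assign_treatment):
    """Mean outcome of treated users minus mean outcome of untreated users."""
    treated, control = [], []
    for _ in range(5000):
        engagement = rng.random()  # uniform on [0, 1)
        y = outcome(engagement)
        group = treated if assign_treatment(engagement) else control
        group.append(y)
    return mean(treated) - mean(control)


# Observational data: more engaged users opt in to the feature more often.
observational_gap = difference_in_means(lambda e: rng.random() < e)

# Randomized experiment: a coin flip decides, ignoring engagement.
randomized_gap = difference_in_means(lambda e: rng.random() < 0.5)

print(f"observational difference: {observational_gap:.2f}")
print(f"randomized difference:    {randomized_gap:.2f}")
```

With self-selection, the treated group looks much better even though the treatment does nothing. With a coin flip, the two groups have similar engagement on average and the gap shrinks toward zero.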

Public Health

Observational studies often show correlations between behaviors and health outcomes. But without controlled trials, we can’t know if the behavior causes the outcome or if confounding factors explain both.

Economics & Policy

Economic policies are often justified by correlations: “Countries with property taxes have higher GDP.” But such correlations rarely establish that the policy causes economic growth; both might reflect other factors.

Everyday Decision-Making

In daily life, we constantly confuse correlation with causation: “I took this supplement and felt better, so it must work.” Without controlling for other factors, we can’t know if the supplement helped or if we’d have improved anyway.

Case Study

The relationship between education and income provides a classic example of correlation-causation complexity. Decades of data show that people with more education tend to earn higher incomes. It’s tempting to conclude: “Education causes higher income, therefore we should encourage everyone to get more education.”

But this correlation could reflect multiple causal stories. Perhaps more able people both pursue more education AND earn more (ability bias). Perhaps prestigious colleges both select high-achieving students AND provide better job networks (selection bias). Perhaps certain personality traits cause both educational attainment and career success (omitted variable bias).

The most rigorous studies try to isolate causation by finding natural experiments: situations where education varied for reasons unrelated to ability. The resulting estimates of the return to education can differ noticeably from what the naive correlation suggests. The lesson: even a relationship that has held for decades might not be fully causal, and policy based on naive correlation can be seriously misguided.
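The logic of a natural experiment can be sketched with simulated data. The example below is an illustration under invented assumptions, not an estimate from real data: `TRUE_RETURN`, the `lottery` variable, and all coefficients are hypothetical. Ability is unobserved and raises both schooling and income, while the lottery shifts schooling but is unrelated to ability. The second estimate is a simple instrumental-variable (Wald) ratio.

```python
import random


def covariance(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


rng = random.Random(3)
TRUE_RETURN = 2.0  # assumed causal effect of one year of schooling (arbitrary units)

lottery, education, income = [], [], []
for _ in range(20000):
    ability = rng.gauss(0, 1)   # unobserved by the analyst
    won = rng.choice([0, 1])    # hypothetical natural experiment, independent of ability
    years = 12 + 1.5 * ability + 1.0 * won + rng.gauss(0, 1)
    pay = TRUE_RETURN * years + 3.0 * ability + rng.gauss(0, 2)
    lottery.append(won)
    education.append(years)
    income.append(pay)

# Naive estimate: regression slope of income on education.
naive_slope = covariance(education, income) / covariance(education, education)

# Natural-experiment estimate: only the lottery-driven variation in schooling is used.
iv_slope = covariance(lottery, income) / covariance(lottery, education)

print(f"assumed true return: {TRUE_RETURN:.2f}")
print(f"naive slope:         {naive_slope:.2f}")
print(f"lottery-based slope: {iv_slope:.2f}")
```

In this setup the naive slope overstates the assumed return because ability is positively related to both variables. The direction and size of the gap are consequences of the assumptions, and real data need not behave the same way.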

Boundaries and Failure Modes

When Correlation Suggests Causation: In some controlled contexts, such as randomized experiments where only one variable differs between groups, correlation does provide causal evidence. The key is knowing WHEN the methodological conditions for causal inference are met.

When Correlation Is Most Dangerous: Correlation is most dangerous in complex observational systems (economics, social science, health) where many variables interact and unobserved confounding is likely. Here, correlation is almost never sufficient for causal conclusions.

Common Misuse Pattern: The media frequently reports correlations as if they were causal: “Studies show people who drink coffee have lower rates of heart disease.” Such reporting spreads causal misinformation when it omits that healthier people may be the ones who drink coffee, or that both may reflect lifestyle factors.

Common Misconceptions

Myth: A strong enough correlation proves causation.
Reality: No matter how strong the correlation, causation cannot be inferred without additional evidence. Strong correlations can arise from any of the alternative explanations: reverse causation, third variables, or coincidence.

Myth: A correlation found in a large dataset must be meaningful.
Reality: With enough variables to compare, even very unlikely patterns appear. When millions of variable pairs are examined, finding some spurious correlations is all but guaranteed, which is why we can’t rely on correlation alone.

Myth: Statistical controls eliminate confounding.
Reality: Statistical controls can help but can’t fully solve confounding. We can only control for variables we can measure, and unmeasured confounding remains a persistent problem in observational studies.
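The point about unlikely patterns appearing by chance can be illustrated with pure noise. In this sketch (series counts and lengths are arbitrary choices), every series is independent random data, so any correlation found between two of them is coincidental by construction.

```python
import math
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_correlation(series_list):
    """Largest absolute correlation over every pair of series."""
    return max(abs(pearson(a, b)) for a, b in combinations(series_list, 2))


rng = random.Random(1)
N_SERIES = 200   # e.g. 200 unrelated yearly statistics
N_POINTS = 10    # e.g. 10 years of observations each

series = [[rng.gauss(0, 1) for _ in range(N_POINTS)] for _ in range(N_SERIES)]

few = strongest_correlation(series[:5])   # 10 pairs compared
many = strongest_correlation(series)      # 19,900 pairs compared

print(f"strongest |r| among 5 series:   {few:.2f}")
print(f"strongest |r| among 200 series: {many:.2f}")
```

Searching across thousands of pairs of short series reliably turns up a pair that looks strongly related, even though nothing connects them. The risk grows with the number of comparisons made, not with the number of observations per series.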

Related Concepts

Post Hoc Ergo Propter Hoc

The temporal version of the fallacy: assuming that because B followed A, A must have caused B, while ignoring other causal possibilities.

Confounding Variable

A hidden third variable that causes both the apparent cause and effect, creating a spurious correlation. Understanding confounders is key to proper causal analysis.

Spurious Correlation

A correlation in which neither variable causes the other. It can arise by pure coincidence (statistical noise) or because a third variable drives both, as in the ice cream and drowning example.

One-Line Takeaway

When you see a correlation, ask: Could this be reversed? Could a third factor cause both? Could this just be coincidence? Correlation is a starting point for investigation, not a conclusion.