One of the most persistent ideas I encountered while studying statistics, machine learning, and data science was a phrase repeated almost everywhere:
Correlation is not causation.
At first, it seemed straightforward enough. We can measure correlations. We can build predictive models. We can identify patterns hidden within large datasets.
But a question lingered in the back of my mind:
If correlation is not causation, then how do we identify causation?
The deeper I looked, the larger the question became.
What began as a simple curiosity eventually led me into a landscape spanning statistics, philosophy, information theory, complex systems, economics, artificial intelligence, neuroscience, and even the foundations of scientific reasoning itself.
My initial search started where many modern intellectual journeys begin: Google. Searching for terms such as causality and causation quickly revealed something unexpected. What I had assumed was a relatively narrow topic turned out to be an enormous field with centuries of philosophical debate and entire branches of modern scientific research devoted to it.
Soon my reading list began to grow. Works on the philosophy of causation appeared alongside books on scientific methodology, economics, complex systems, and causal inference. Names such as David Hume and Judea Pearl surfaced repeatedly. Questions that initially seemed simple became increasingly difficult:
- What does it mean for one thing to cause another?
- Can causation be observed directly?
- Is prediction the same thing as explanation?
- Can information flow reveal causal influence?
Eventually I came across a book that immediately captured my attention:
An Introduction to Transfer Entropy: Information Flow in Complex Systems by Terry Bossomaier, Lionel Barnett, Michael Harré, and Joseph T. Lizier.

At first glance, the title seemed highly technical. Yet it appeared to sit precisely at the intersection of several interests of mine: statistics, machine learning, complex systems, and causality.
Curious but uncertain about how transfer entropy related to causation, I uploaded sections of the book to ChatGPT and began a lengthy discussion. What followed was not merely a summary of the text. Rather, it became an extended exploration of information theory, entropy, Granger causality, Judea Pearl’s causal framework, factor analysis, attribution analysis, and the broader question of how scientists attempt to infer causal relationships from observational data.
That conversation ultimately helped me connect many concepts that I had previously understood only in isolation. More importantly, it revealed that causality itself exists at multiple levels: simple correlation, predictive influence, and intervention-based causal reasoning. The journey transformed a familiar machine-learning caveat into a much deeper investigation of how we understand reality itself.
The Assumption I Started With
When most people first encounter causation, the intuitive approach is often something like:
- Observe a relationship.
- Gather more evidence.
- Conduct qualitative studies.
- Determine whether one thing causes another.
When I first searched online for material on causality, many resources emphasized experimental design, observational studies, and qualitative reasoning.
Those are certainly important.
But I quickly discovered that modern research on causality extends far beyond traditional experimental methods.
There are entire fields devoted to answering questions such as:
- What influences what?
- How does information flow through a system?
- Can we infer directionality from observations?
- How do hidden factors affect outcomes?
- When does prediction imply causation?
- Can causation itself be measured?
The deeper I searched, the more references appeared.
David Hume.
Judea Pearl.
Granger causality.
Structural causal models.
Information theory.
Transfer entropy.
Complex systems.
What I initially thought was a relatively narrow topic turned out to be an intellectual universe.
A Philosophical Surprise: Hume’s Challenge
One of the most fascinating discoveries was that causation itself is not as obvious as it seems.
The eighteenth-century philosopher David Hume argued something profoundly unsettling:
We never directly observe causation.
What we actually observe are events occurring together repeatedly.
For example:
- The cue ball strikes another ball.
- The second ball moves.
We say the first caused the second.
But according to Hume, all we truly observe is a consistent sequence of events.
The “causal force” itself is never directly visible.
This realization influenced centuries of philosophy and eventually shaped modern statistics and scientific methodology.
In a sense, the problem of causation has been with us for hundreds of years.
Information Theory: A Different Way of Thinking
The conversation that grew out of the Transfer Entropy book introduced me to another profound conceptual shift.
I had always associated information with:
- meaning,
- knowledge,
- wisdom,
- understanding.
But information theory approaches information differently.
In Claude Shannon’s framework:
Information is the reduction of uncertainty.
This sounds simple, but it fundamentally changes how we think.
Imagine being told that tomorrow’s temperature in Chicago could be:
- 20°F
- 50°F
- 80°F
Before hearing the forecast, uncertainty exists.
After hearing:
“Tomorrow will be 80°F.”
uncertainty collapses.
The message contains information because it reduced uncertainty.
An important consequence follows:
- Rare events carry more information.
- Expected events carry less information.
A report that “the sun will rise tomorrow” contains little information because most people already expect it.
By contrast, “snow in Miami tomorrow” would dramatically alter expectations and therefore contain far more information.
Perhaps the most surprising lesson was this:
Information is not the same thing as meaning.
A completely random sequence of symbols may contain enormous information in Shannon’s sense while carrying no meaningful message whatsoever.
Information theory measures unpredictability, not significance.
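To make "rare events carry more information" concrete, here is a minimal sketch of Shannon's self-information, -log2(p), measured in bits. The probabilities are illustrative assumptions of mine, not real weather statistics, and treating the three Chicago temperatures as equally likely is also an assumption.

```python
import math

def surprisal_bits(p: float) -> float:
    """Shannon self-information of an event with probability p, in bits."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)

# Hypothetical probabilities, chosen only for illustration.
p_sunrise = 0.999999       # an almost certain event
p_miami_snow = 0.0001      # a very rare event
p_one_of_three = 1 / 3     # one of three equally likely temperatures (assumed)

print(f"sunrise:       {surprisal_bits(p_sunrise):.6f} bits")
print(f"Miami snow:    {surprisal_bits(p_miami_snow):.2f} bits")
print(f"80F forecast:  {surprisal_bits(p_one_of_three):.3f} bits")
```

With these assumed numbers the sunrise report carries almost nothing, the forecast about 1.6 bits, and the Miami snow report roughly 13 bits. Nothing in the calculation looks at what the message means, only at how unexpected it was.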
Entropy: Measuring Uncertainty
This naturally leads to the concept of entropy.
Entropy measures the average uncertainty of a system.
A fair coin has the highest entropy a two-outcome system can have, exactly one bit, because both outcomes are equally likely.
A heavily biased coin has low entropy because its outcome is largely predictable.
This idea may sound abstract, but entropy has become foundational to:
- statistics,
- machine learning,
- artificial intelligence,
- cryptography,
- compression algorithms,
- neuroscience,
- and complex systems research.
Eventually, it also became the foundation of Transfer Entropy.
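The coin comparison can be checked with a few lines of code. This is a minimal sketch of Shannon entropy for a discrete distribution; the 95/5 bias is a value I picked for illustration.

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2(p)) of a discrete distribution, in bits."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Outcomes with p == 0 contribute nothing (p * log2(p) tends to 0 as p -> 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair_coin = [0.5, 0.5]
biased_coin = [0.95, 0.05]   # hypothetical bias, chosen for illustration

print("fair coin:  ", entropy_bits(fair_coin), "bits")
print("biased coin:", round(entropy_bits(biased_coin), 3), "bits")
```

The fair coin comes out at 1 bit and the biased coin at about 0.29 bits: the more predictable the outcome, the less there is to learn from observing it.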
Why Correlation Was Not Enough
As I continued exploring, I realized that many traditional statistical tools answer only part of the puzzle.
Suppose:
- Stock A moves with Stock B.
- Neuron A fires whenever Neuron B fires.
- Two economic indicators rise together.
Correlation can tell us:
These variables move together.
But correlation cannot tell us:
- Which influences which.
- Whether a hidden factor drives both.
- Whether timing matters.
- Whether the relationship has direction.
The reason is simple:
Correlation is symmetric.
If A correlates with B, then B correlates with A.
Direction disappears.
Yet causation fundamentally involves direction.
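A small simulation shows how direction disappears. In this sketch I assume a toy system in which Y drives X with a one-step delay and X never affects Y; the coefficients and noise levels are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Assumed toy system: Y drives X with a one-step delay; X never affects Y.
y = rng.normal(size=n)
x = np.empty(n)
x[0] = rng.normal()
x[1:] = 0.8 * y[:-1] + 0.2 * rng.normal(size=n - 1)

def corr(a, b):
    """Pearson correlation coefficient between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

# Pair X at time t with Y at time t-1, then swap the argument order.
r_xy = corr(x[1:], y[:-1])
r_yx = corr(y[:-1], x[1:])
print("corr(X, Y):", r_xy)
print("corr(Y, X):", r_yx)
```

Both calls print the same strong correlation, even though the influence was built to run only from Y to X. The coefficient registers the association and nothing about which side is doing the driving.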
Granger Causality: A Major Step Forward
The next major idea I encountered was Granger Causality.
Clive Granger proposed a remarkably elegant concept:
If knowing the past of Y improves prediction of X beyond what X’s own history can provide, then Y “Granger-causes” X.
This was revolutionary because it introduced directionality into statistical analysis.
Instead of merely asking:
Are these variables associated?
we can ask:
Does the past of one variable improve prediction of another?
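However, in its standard form Granger causality is estimated with linear autoregressive models, so it works best under assumptions that many real-world systems violate:
- linear relationships,
- Gaussian (or near-Gaussian) noise,
- stable dynamics without important unobserved drivers.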
Unfortunately, reality is often far messier.
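Granger's idea can be sketched as a comparison between two regressions: one that predicts X from its own past, and one that also sees the past of Y. The code below is a simplified lag-1 illustration on an assumed toy system, not a full statistical test; a proper analysis would choose the lag order and report an F-test or similar, and the function names here are my own.

```python
import numpy as np

def residual_variance(target, predictors):
    """Variance of the residuals of an ordinary least-squares fit with intercept."""
    design = np.column_stack([np.ones(len(target))] + predictors)
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return np.var(target - design @ coef)

def granger_gain(source, target):
    """Log ratio of restricted to full residual variance at lag 1.

    Restricted model: target[t] from target[t-1].
    Full model:       target[t] from target[t-1] and source[t-1].
    Values well above 0 suggest the source's past helps predict the target.
    """
    future, own_past, other_past = target[1:], target[:-1], source[:-1]
    restricted = residual_variance(future, [own_past])
    full = residual_variance(future, [own_past, other_past])
    return np.log(restricted / full)

# Assumed toy system: Y drives X with a one-step delay; X never affects Y.
rng = np.random.default_rng(1)
n = 5000
y = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + 0.8 * y[t - 1] + 0.3 * rng.normal()

print("Y -> X:", granger_gain(y, x))
print("X -> Y:", granger_gain(x, y))
```

In this toy system the Y -> X value is large and the X -> Y value is close to zero, so the asymmetry that correlation hides becomes visible. Because both models are linear, the same sketch would miss a purely nonlinear influence.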
Enter Complex Systems
This was the point where the title of the book suddenly made sense.
The authors were not primarily interested in simple systems.
They were interested in:
- brains,
- ecosystems,
- financial markets,
- economies,
- social networks,
- adaptive systems,
- emergent systems.
These are examples of what researchers call complex systems.
A complex system is not merely complicated.
It is a system whose behavior emerges from many interacting components, often exhibiting feedback loops, nonlinear dynamics, adaptation, and unexpected collective behavior.
In such environments:
- causes may interact,
- effects may propagate,
- relationships may change over time.
Simple linear models often struggle to capture these realities.
Transfer Entropy: Measuring Information Flow
This is where Transfer Entropy enters the picture.
The central question becomes:
Does knowing the history of Y reduce uncertainty about the future of X beyond what X’s own history already explains?
If the answer is yes, information is flowing from Y toward X, in the predictive sense.
Transfer Entropy attempts to quantify:
Directed uncertainty reduction over time.
Or more intuitively:
How much additional predictive information does Y provide about X’s future?
Unlike simple correlation, Transfer Entropy captures directionality.
Unlike linear approaches such as standard Granger causality, it assumes no particular functional form, so it can capture nonlinear dependencies.
That is why it has attracted researchers in:
- neuroscience,
- physics,
- economics,
- complexity science,
- network theory.
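For discrete data with a history of one step, transfer entropy can be estimated by counting. The sketch below is a naive plug-in estimator on an assumed binary toy system; it is not the estimator from the book, and real applications usually need longer histories, bias correction, and significance testing.

```python
import math
import random
from collections import Counter

def transfer_entropy_bits(source, target):
    """Plug-in estimate of transfer entropy source -> target, history length 1.

    TE = sum over (x_next, x_now, y_now) of
         p(x_next, x_now, y_now)
         * log2( p(x_next | x_now, y_now) / p(x_next | x_now) )
    where x is the target series and y is the source series.
    """
    triples = list(zip(target[1:], target[:-1], source[:-1]))
    n = len(triples)
    c_full = Counter(triples)                                   # (x_next, x_now, y_now)
    c_now = Counter((x0, y0) for _, x0, y0 in triples)          # (x_now, y_now)
    c_next_own = Counter((x1, x0) for x1, x0, _ in triples)     # (x_next, x_now)
    c_own = Counter(x0 for _, x0, _ in triples)                 # x_now
    te = 0.0
    for (x1, x0, y0), count in c_full.items():
        p_with_source = count / c_now[(x0, y0)]       # p(x_next | x_now, y_now)
        p_own_only = c_next_own[(x1, x0)] / c_own[x0]  # p(x_next | x_now)
        te += (count / n) * math.log2(p_with_source / p_own_only)
    return te

# Assumed toy system: binary Y is random; X copies Y's previous value 90% of the time.
random.seed(0)
n = 20000
y = [random.randint(0, 1) for _ in range(n)]
x = [0] * n
for t in range(1, n):
    x[t] = y[t - 1] if random.random() < 0.9 else 1 - y[t - 1]

print("TE Y -> X:", transfer_entropy_bits(y, x))
print("TE X -> Y:", transfer_entropy_bits(x, y))
```

For this toy system the Y -> X estimate should land near half a bit, while the X -> Y estimate stays close to zero. The estimator never fits a line or any other model, which is why the same code would work if X depended on Y in a nonlinear way.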
The Important Limitation
One lesson that repeatedly emerged during my discussion was this:
Transfer Entropy does not prove causation.
This distinction is critical.
Transfer Entropy measures:
- predictive influence,
- informational dependence,
- directional information flow.
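It does not establish metaphysical or scientific causality with certainty.
For instance, a hidden variable that drives both X and Y with different delays can make Y look informative about X’s future even though neither affects the other.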
This led me to appreciate that there are actually different levels of causality.
Level 1: Correlation
Variables move together.
Level 2: Predictive Causality
Past values improve prediction.
Examples:
- Granger causality
- Transfer Entropy
Level 3: Structural Causality
What happens if we intervene?
If we forcibly change X, does Y change?
This is the territory of Judea Pearl and modern causal inference.
Judea Pearl and the Intervention Revolution
Another major branch of causality research comes from Judea Pearl.
Pearl’s framework asks a fundamentally different question.
Transfer Entropy asks:
Does Y help predict X?
Pearl asks:
If I intervene on Y, does X change?
This distinction may appear subtle, but it is profound.
Prediction and causation are related.
They are not identical.
Pearl’s work formalized causal reasoning with tools such as:
- causal graphs,
- directed acyclic graphs (DAGs),
- interventions,
- do-calculus.
These ideas have become foundational in:
- economics,
- epidemiology,
- policy analysis,
- explainable AI,
- modern causal inference.
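The gap between observing and intervening can be simulated. In this sketch I assume a toy structural model in which a hidden variable Z causes both X and Y, with no arrow between X and Y; the `simulate` function and its coefficients are illustrative assumptions, not anything taken from Pearl's writing.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000

def simulate(intervene_on_x=False):
    """Toy structural causal model (assumed): Z -> X and Z -> Y, no arrow X -> Y."""
    z = rng.normal(size=n)                      # hidden common cause
    if intervene_on_x:
        x = rng.normal(size=n)                  # do(X): we set X ourselves, cutting Z -> X
    else:
        x = z + 0.5 * rng.normal(size=n)        # X listens to Z
    y = 2.0 * z + 0.5 * rng.normal(size=n)      # Y listens to Z only
    return x, y

x_obs, y_obs = simulate()
x_do, y_do = simulate(intervene_on_x=True)

print("observational corr(X, Y): ", np.corrcoef(x_obs, y_obs)[0, 1])
print("interventional corr(X, Y):", np.corrcoef(x_do, y_do)[0, 1])
```

In the observational data X predicts Y very well, yet once X is set by hand the relationship vanishes. X was useful for prediction without ever being a cause, and that is the distinction Pearl's intervention question is designed to expose.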
An Unexpected Detour: Factor Analysis and Attribution
Our conversation also cleared up another point of confusion.
I had wondered whether factor analysis was a form of attribution analysis.
The answer was nuanced.
Factor analysis seeks hidden factors that explain observed patterns.
For example:
Student performance across multiple subjects might be explained by an underlying latent factor such as quantitative reasoning.
Its goal is:
- structure discovery,
- dimensionality reduction,
- latent modeling.
Attribution analysis is different.
It asks:
How much did each factor contribute to an outcome?
Examples include:
- performance attribution,
- risk attribution,
- SHAP values,
- contribution decomposition.
Both deal with explanation.
But they answer different questions.
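The attribution question is easiest to see in a linear model, where each input's contribution is its weight times its departure from a reference point. The weights, baseline, and case below are hypothetical numbers chosen for illustration; methods such as SHAP aim at a similar additive breakdown for models that are not linear.

```python
import numpy as np

# Hypothetical linear model: prediction = intercept + weights . features
weights = np.array([2.0, -1.0, 0.5])
intercept = 10.0

def predict(features):
    """Prediction of the assumed linear model for one feature vector."""
    return intercept + features @ weights

baseline = np.array([1.0, 1.0, 1.0])   # assumed reference point, e.g. feature averages
case = np.array([3.0, 0.0, 5.0])       # the instance we want to explain

# Each feature's contribution is its weight times its departure from the baseline.
contributions = weights * (case - baseline)

print("contributions:        ", contributions)
print("sum of contributions: ", contributions.sum())
print("prediction difference:", predict(case) - predict(baseline))
```

The contributions add up exactly to the difference between the two predictions, which is what makes this an attribution. Factor analysis, by contrast, would start from many observed cases and look for hidden variables behind them.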
The Hierarchy That Finally Made Sense
One of the most valuable insights from this exploration was realizing that many statistical methods belong to different layers of inquiry.
Statistical Structure Discovery
- PCA
- Factor Analysis
- Clustering
Question:
What hidden patterns exist?
Dependency Analysis
- Correlation
- Mutual Information
- Transfer Entropy (directed, so it also belongs with the predictive methods below)
Question:
How are variables related?
Causal Analysis
- Granger Causality
- Structural Causal Models
- Pearl’s Framework
Question:
What influences what?
Attribution Analysis
- SHAP
- Risk Attribution
- Performance Attribution
Question:
What contributed to the outcome?
Suddenly, the landscape became much clearer.
These methods are connected.
But they are not interchangeable.
Each addresses a different version of the question:
“What explains what?”
Why This Matters to Me
Looking back, I realize this intellectual journey was surprisingly consistent with my own background.
As someone who has worked in:
- accounting,
- auditing,
- fraud analytics,
- data science,
- machine learning,
I have often encountered the distinction between association and explanation.
A fraud model may discover that overseas transactions correlate with fraud.
But are they causal?
Or merely associated?
Or influenced by hidden confounding factors?
That distinction matters enormously.
In auditing, risk management, economics, and AI, prediction is valuable.
But understanding why something happens is often even more valuable.
Final Reflections
What began as a simple question about the familiar phrase “correlation is not causation” ultimately led me into a much broader exploration.
I discovered that causality is not a single concept.
It exists in layers.
From correlation to predictive influence, from information flow to intervention-based reasoning, each framework offers a different lens through which we attempt to understand reality.
Perhaps the most unexpected lesson was not mathematical at all.
The deeper I explored causality, the more I realized how difficult it is to answer a question that initially sounds so simple:
Why did this happen?
And perhaps that is exactly why the subject is so fascinating.
What started as a caveat in a machine learning textbook turned into an exploration of information, uncertainty, prediction, complex systems, and the very foundations of scientific reasoning itself.
The rabbit hole, it turns out, goes much deeper than correlation.
