It took reading Judea Pearl’s “The Book of Why”, and Jonas Peters’ mini-course on causality, for me to finally figure out why I had this lingering dissatisfaction with modern machine learning. It’s because modern machine learning (deep learning included) is most commonly used as a tool in the service of finding correlations, and is not concerned with understanding systems.

Perhaps this is why Pearl writes of modern ML as basically being “curve fitting”. I tend to believe he didn’t write those words in a dismissive way, though I might be wrong about it. Regardless, I think there is an element of truth to that statement.

Linear models seek a linear combination of correlations between input variables and their targets. Tree-based models essentially seek combinations of splits in the data, while deep learning models are just stacked compositions of linear models with nonlinear functions applied to their outputs. As Keras author Francois Chollet wrote, deep learning can be thought of as basically geometric transforms of data from one data manifold to another.

(For convenience, I’ve divided the ML world into linear models, tree-based models, and deep learning models. Ensembles, like Random Forest, are just that: ensembles composed of these basic models.)

Granted, curve fitting is actually very useful: much of image deep learning has found pragmatic use: image search, digital pathology, self-driving cars, and more. Yet, in none of these models is the notion of causality important. This is where these models are dissatisfying: they do not provide the tools to help us interrogate these questions in a structured fashion. I think it’s reasonable to say that these models are essentially concerned with conditional probabilities. As written by Ferenc Huszár, conditional probabilities are different from interventional probabilities (ok, I mutilated that term).

Humans are innately wired to recognize and ask questions about causality; consider it part of our innate makeup. That is, of course, unless that has been drilled out of our minds by our life experiences. (I know of a person who insists that causes do not exist. An extreme Hume-ist, I guess? As I’m not a student of philosophy much, I’m happy to be corrected on this point.) As such, I believe that part of being human involves asking the question, “Why?” (and its natural extension, “How?”). Yet, modern ML is still stuck at the question of, “What?”

To get at why and how, we test our understanding of a system by perturbing it (i.e. intervening in it), or asking about “what if” scenarios (i.e. thinking about counterfactuals). In the real world of biological research (which I’m embedded in), we call this “experimentation”. Inherent in a causal view of the world is a causal model. In causal inference, these things are structured and expressed mathematically, and we are given formal procedures for describing an intervention and thinking about counterfactual scenarios. From what I’ve just learned (baby steps at the moment), these are the basic ingredients, and their mathematical mappings:

- Causal model: a directed, acyclic graph
- Variables: nodes in a graph
- Relationships: structured causal model’s equations (math transforms of incoming variables with a noise distribution added on top, embedded in each node)

- Interventions: removal of edges in a graph (“do-calculus”)
- Counterfactuals: set causal model based on observation, then perform do-calculus.

Having just learned this, I think there’s a way out of this latent dissatisfaction that I have with modern ML. A neat thing about ML methods is that we can use them as tools to help us better identify the important latent factors buried inside our (observational) data, which we can use to construct a better model of our data generating process. Better yet, we can express the model in a structured and formal sense, which would expose our assumptions more explicitly for critique and reasoning. Conditioned on that, perhaps we may be able to write better causal models of the world!

*Original Source*