Dealing with the Incompleteness of Machine Learning
Posted by ODSC Community on October 19, 2020
The prospect of automating every aspect of human life is exciting. Imagine humans permanently living a life of leisure while machine-learning-powered robots pick up the slack! Even though this sounds like a recipe for lazy and depressed humans, we could still be useful to each other by building communities around shared interests and companionship, and by finding the fulfillment we used to get from employment in collaboration, learning, and innovation instead.
It also seems like this imagined future is around the corner. In the last century and a half, we have automated much of our manual labor. Automating cognitive tasks is what naturally comes next. Essentially, this means letting machines make decisions for us. We trust A.I. and machine learning to take care of this, at least piecemeal, until we get Artificial General Intelligence. However, it’s not as simple as it seems.
Let’s examine how a human makes decisions. Whenever it’s deliberate, it follows a logical sequence of seven steps. Perhaps you don’t follow these steps for trivial choices, but you would expect your doctor, lawyer, or portfolio manager to follow them when making decisions on your behalf.
We know that however carefully a decision is considered, a doctor can misdiagnose, a lawyer can pick a faulty defense strategy, and a portfolio manager can make the wrong investments. Human decision-making is flawed, no doubt, but a human is accountable for any mistakes, can second-guess a decision or deliver it with hesitation, and can explain the reasoning behind it.
Can you expect the same from a machine learning algorithm?
Not really, but with all the hubris surrounding A.I., you would think so.
When we discuss a Machine Learning model’s decisions, we generally focus on the very last step: Inference. The decisive inference step is when the model has already been deployed, and it’s being used in real-world applications. However, decisions were made before that. Not by the model but by a human! And because of this, there’s potential for bias in every previous step:
- What part of the problem to focus on?
- What data to use?
- How to prepare the data?
- What model classes to use?
- What evaluation metrics to use?
- And how to deploy the model?
Many of these earlier decisions are untraceable: they were never communicated, or even fully understood, and that creates an accountability gap!
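One way to narrow that gap is to write the decisions down next to the model. The snippet below is only a minimal sketch of that idea, not an established standard: the `DecisionLog` class, its field names, and the example values are all hypothetical, chosen to mirror the six questions listed above.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class DecisionLog:
    """Hypothetical record of the human decisions made before inference."""
    problem_scope: str = ""        # What part of the problem to focus on?
    data_sources: str = ""         # What data to use?
    data_preparation: str = ""     # How to prepare the data?
    model_classes: str = ""        # What model classes to use?
    evaluation_metrics: str = ""   # What evaluation metrics to use?
    deployment: str = ""           # How to deploy the model?

    def undocumented(self) -> List[str]:
        """Return the names of the decisions nobody has written down yet."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Illustrative values only; they do not describe any real project.
log = DecisionLog(
    problem_scope="Flag x-rays for a second opinion, not final diagnosis",
    data_sources="Scans from a single hospital (possible sampling bias)",
    data_preparation="Resized and normalized; low-quality scans dropped",
    model_classes="Convolutional network only; no simpler baseline tried",
)

print("Still undocumented:", log.undocumented())
```

Anything that shows up in `undocumented()` is a decision somebody made without leaving a trace, which is exactly where accountability tends to get lost.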
The Confidence Trap
Also, a machine learning model doesn’t second-guess its decisions. They are final. Even when a probability or confidence band accompanies a prediction, it is often ignored and not communicated to the end user, which makes the prediction look deterministic.
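To make this concrete, here is a minimal sketch of a prediction that carries its confidence along and "hesitates" when that confidence is low. It assumes the model already outputs class probabilities; the `decide` function, the `Decision` type, and the 0.8 threshold are hypothetical and would need tuning for any real application.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Decision:
    label: Optional[str]  # None means the model abstains
    confidence: float     # probability of the top-ranked class
    note: str             # message meant for the end user


def decide(probabilities: Mapping[str, float], min_confidence: float = 0.8) -> Decision:
    """Turn class probabilities into a decision that keeps its uncertainty."""
    if not probabilities:
        raise ValueError("probabilities must not be empty")
    label, confidence = max(probabilities.items(), key=lambda item: item[1])
    if confidence < min_confidence:
        # Hesitate instead of forcing a final answer.
        return Decision(None, confidence, f"uncertain (top guess: {label}); refer to a human")
    return Decision(label, confidence, "confident")


# Made-up probabilities for illustration.
print(decide({"benign": 0.95, "malignant": 0.05}))
print(decide({"benign": 0.55, "malignant": 0.45}))
```

The point of the design is that the confidence value travels with the label all the way to the end user, rather than being dropped once a class has been picked.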
Explain Your Reasoning
The model cannot do this. But as machine learning practitioners, we can do this for the model.
Machine learning interpretation is needed because machine learning by itself is incomplete as a solution. Think about it. Simple problems can likely be solved with a flowchart or procedural programming, and in those cases the solution covers all of the problem.
The complex problems we optimize with machine learning require linear algebra, calculus, and statistics precisely because we don’t understand all of the problem. For instance, what does cancer look like on an x-ray? How could we even begin to describe all the many ways you can detect cancerous growth in an x-ray?
By explaining a model’s decisions, we can cover gaps in our understanding of the problem, that is, its incompleteness. One of the most significant issues is that, given the high accuracy of our machine learning solutions, we tend to grow so confident that we believe we fully understand the problem. Then, we are misled into thinking our solution covers ALL OF IT!
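As one illustration of what explaining a model can look like in practice, the sketch below implements permutation feature importance with only the standard library. This is a commonly used interpretation technique, picked here as an example rather than taken from the article. The `toy_model` black box and the synthetic data are hypothetical stand-ins for a real trained model and dataset.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], int]


def accuracy(model: Model, X: Sequence[Row], y: Sequence[int]) -> float:
    """Fraction of rows where the model's prediction matches the target."""
    return sum(1 for row, target in zip(X, y) if model(row) == target) / len(y)


def permutation_importance(model: Model, X: Sequence[Row], y: Sequence[int],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [list(row[:j]) + [value] + list(row[j + 1:])
                      for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row: Row) -> int:
    """Hypothetical black box: it only ever looks at feature 0."""
    return int(row[0] > 0.5)


data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [toy_model(row) for row in X]

scores = permutation_importance(toy_model, X, y)
print("Importance per feature:", scores)
```

Shuffling feature 0 hurts accuracy while shuffling feature 1 changes nothing, which reveals what the black box actually relies on. With a real model, a result like this is a prompt to ask whether the model's view of the problem matches our own.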
Serg Masís has been at the confluence of the internet, application development, and analytics for the last two decades. Currently, he’s a Data Scientist at Syngenta, a leading agribusiness company with a mission to improve global food security. Before that role, he co-founded a search engine startup, incubated by Harvard Innovation Labs, that combined the power of cloud computing and machine learning with principles in decision-making science to expose users to new places and events efficiently. Serg is passionate about providing the often-missing link between data and decision-making. His book titled “Interpretable Machine Learning with Python” is scheduled to be released in early 2021 by UK-based publisher Packt.