Arguably, one of the biggest debates in data science in 2019 is the need for AI explainability. The ability to interpret machine learning models is proving to be a defining factor in the acceptance of statistical models for driving business decisions. Enterprise stakeholders are demanding transparency into how and why these algorithms make specific predictions, and a firm understanding of any inherent bias in machine learning repeatedly surfaces at the top of the requirements list for data science teams. As a result, many top vendors in the big data ecosystem are launching new tools aimed at opening the AI “black box.”
[Related Article: Opening The Black Box—Interpretability In Deep Learning]
Some organizations have taken the plunge into AI even with the realization that their algorithms’ decisions can’t be explained. One case in point is the Man Group (one of the world’s largest hedge funds, with $96 billion under management), which was initially wary of the technology’s lack of interpretability but was ultimately persuaded by the excellent returns from algorithm-centric funds. Not all AI adoption strategies, however, culminate in rose-colored returns.
In this article, I will make the case for the importance of explainable AI by examining five AI black box horror stories, where transparency in prediction would have helped save the day.
Many start-up companies are launching services that use AI to streamline the employee recruiting process. The technology enables hiring companies to assess many more applicants by analyzing and interpreting a huge volume of candidate data quickly and cost-effectively. But caution is needed here, because the basis of the selection decisions must be legally defensible. That’s not going to happen if you can’t open the black box of AI.
In addition, professional recruiters, particularly those in regulated industries such as financial services, should always know the basis for any selection decision. If it’s not possible to justify exactly why a candidate has been rejected from the application process, it leaves the company vulnerable to a legal challenge from that individual. Inherent bias in automated systems for hiring can result in a legal nightmare.
In one high-profile example, Amazon developed an AI recruiting tool that analyzed 10 years of employment applications in order to build a system that automatically identified the characteristics of high-performing employees and scored new candidates against those standards. The tool made headlines in 2018 when it was determined that the algorithm favored male candidates, reflecting societal influences such as gender bias and wage gaps in technology jobs.
AI is also being used to make profound and life-changing decisions, such as in the judicial system, where a person convicted of a crime may be sentenced based on algorithms he or she isn’t even allowed to challenge. Computationally calculated “risk assessments” are increasingly common in U.S. courtrooms and are handed to judicial decision-makers at every stage of the process. The problem here isn’t just the government’s lack of transparency about the algorithms and methods in use, but also the difficulty of interpreting and understanding something that remains virtually a black box. In 2016, ProPublica documented a case where machine bias rated a black woman as higher risk than a white man, even though their prior records indicated otherwise.
Recent mainstream news reports suggested that autonomous cars are less likely to detect pedestrians with dark skin crossing the road, and thus more likely to strike them.
The academic paper “Predictive Inequity in Object Detection,” by a group of researchers at the Georgia Institute of Technology, examined the matter with a series of experiments testing different deep learning computer vision models, such as the Faster R-CNN model with an R-50-FPN backbone, on images of pedestrians with different skin tones. The study describes how the researchers enlisted human classifiers to look through a collection of roughly 3,500 images and label each pedestrian as either “LS” for light skin or “DS” for dark skin, then trained the neural network models using this data set. Care was taken to ensure the manual classification process was not tainted by cultural biases.
The group found that their models had difficulty detecting people with dark skin, which led them to the conclusion: “This study provides compelling evidence of the real problem that may arise if this source of capture bias is not considered before deploying these sort of recognition models.”
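The core measurement behind a study like this can be sketched simply: compute the detection rate (recall) separately for each labeled group and compare. The sketch below is illustrative only; the data, the `group_recall` helper, and the toy detection outcomes are invented for demonstration and are not the paper’s actual figures.

```python
# Hypothetical sketch of per-group detection recall, in the spirit of the
# "Predictive Inequity in Object Detection" methodology. Data is illustrative.

def group_recall(detections, groups, target):
    """Fraction of ground-truth pedestrians in one group that the model detected."""
    hits = sum(1 for d, g in zip(detections, groups) if g == target and d)
    total = sum(1 for g in groups if g == target)
    return hits / total if total else 0.0

# 1 = pedestrian detected, 0 = missed; "LS"/"DS" mirror the paper's labels
detections = [1, 1, 0, 1, 0, 0, 1, 1]
groups     = ["LS", "LS", "DS", "LS", "DS", "DS", "DS", "LS"]

print(group_recall(detections, groups, "LS"))  # recall on light-skin examples
print(group_recall(detections, groups, "DS"))  # recall on dark-skin examples
```

A gap between the two numbers is exactly the kind of predictive inequity the study reports; on real systems this comparison would be run over thousands of annotated images per group.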
In 2015, a research group at Mount Sinai Hospital in New York worked to apply deep learning to the institution’s large database of patient records featuring hundreds of variables on patients drawn from their test results, doctor visits, etc. The so-called Deep Patient software was trained using data from about 700,000 patients, and when evaluated with new records, its accuracy in predicting disease was exceptional. The system was able to discover patterns hidden in the hospital data that seemed to indicate when patients were on the way to a wide range of ailments, including cancer.
But at the same time, Deep Patient turned out to be somewhat of a black box. For example, it was able to anticipate the onset of psychiatric disorders like schizophrenia surprisingly well. Since schizophrenia is notoriously difficult for physicians to predict, it was natural to wonder how this was possible, yet the new tool offered no clue as to how it did so. If a technology like Deep Patient is actually going to assist doctors, it needs a level of transparency: a rationale for its predictions that reassures physicians the prediction is accurate and justifies any changes to the prescription drugs a patient is taking.
The U.S. military is pumping billions into projects designed to use machine learning to drive vehicles, pilot aircraft, identify targets, and assist intelligence analysts examining mountains of data. Such applications have little room for algorithmic mystery, and the Department of Defense has identified explainability as a significant stumbling block.
In response, the Defense Advanced Research Projects Agency (DARPA), is overseeing the suitably named Explainable Artificial Intelligence program. Here, intelligence analysts are testing machine learning as a way of identifying patterns in vast amounts of surveillance data. Many autonomous ground vehicles and aircraft are being developed and tested. The problem is that military personnel likely won’t feel comfortable in an autonomous tank that isn’t capable of explaining itself, and analysts will be reluctant to act on information without some clear rationale.
Opening the black box of AI is a fertile area of research today. It’s becoming increasingly apparent that in order to realize continued and widespread acceptance of deep learning solutions, a much greater level of explainability is required.
[Related Article: Cracking the Box: Interpreting Black Box Machine Learning Models]
The so-called “interpretability” problem is stimulating a new cohort of researchers in both academia and industry, who are creating tools that shed light on how deep learning algorithms make their decisions. A variety of approaches are being taken to address this important issue: probing the AI without penetrating it, creating alternative and more transparent algorithms that can compete with neural networks, and using still more deep learning to crack open the black box. Taken together, these strategies add up to a new discipline some refer to as “AI neuroscience.”
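The first of those approaches, probing the AI without penetrating it, can be illustrated with permutation importance: treat the model as an opaque function, shuffle one input feature at a time, and measure how much predictive accuracy drops. The toy `model`, the data, and the helper names below are all invented for this sketch; real workflows would apply the same idea to a trained neural network over many repeated shuffles.

```python
import random

# Minimal sketch of black-box probing via permutation importance.
# `model` stands in for any opaque predictor; we never inspect its internals.

def model(x):
    # Toy black box: predicts 1 when the first feature exceeds 0.5.
    return 1 if x[0] > 0.5 else 0

X = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.3], [0.1, 0.9]]
y = [1, 0, 1, 0]

def accuracy(X, y):
    return sum(model(x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(X, y, feature, seed=0):
    # Shuffle one feature's column and report the resulting accuracy drop.
    rng = random.Random(seed)
    col = [x[feature] for x in X]
    rng.shuffle(col)
    X_perm = [x[:feature] + [v] + x[feature + 1:] for x, v in zip(X, col)]
    return accuracy(X, y) - accuracy(X_perm, y)

for f in range(2):
    print(f"feature {f}: importance {permutation_importance(X, y, f)}")
```

Because the toy model ignores the second feature entirely, shuffling that column never changes a prediction and its importance comes out exactly zero, while shuffling the first feature can only hurt accuracy. That asymmetry is the “explanation” such probing yields without ever opening the model.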