Editor’s Note: Kush R. Varshney is a speaker for ODSC East 2022. Be sure to check out his talk, “A Unified View of Trustworthy AI with the 360 Toolkits,” there!
As artificial intelligence (AI) systems are increasingly used to support consequential decisions in high-risk applications such as health care, educational assessment, criminal justice, employment, and lending, it has become apparent that traditional predictive performance indicators such as error rate and F1-score fall short of capturing all of the important considerations that these applications require. Beyond being accurate, AI systems must also be fair to different groups and individuals, robust to changing conditions, secure from tampering and unwanted disclosure of private information, transparent in how they make their inferences, communicative of their own limitations, and aligned with beneficent societal values. Together, these are the considerations for making AI trustworthy. There are numerous approaches for working toward these goals, and there is no one best approach independent of the problem and application domain.
Why is that? There are several reasons.
- Due to the ‘no free lunch theorem’ of machine learning, no single algorithm is always the best in terms of basic predictive performance; it depends on the characteristics of the problem. As Kevin Murphy writes: “As a consequence of the no free lunch theorem, we need to develop many different types of models, to cover the wide variety of data that occurs in the real world.” Since fairness, robustness, explainability, uncertainty quantification, and privacy are more advanced concepts that build upon machine learning, the same principle holds. We need a variety of algorithms for them.
- There are different points in the pipeline where we can intervene to improve on the different dimensions of trustworthy AI: (1) pre-processing the training data, (2) in-processing, i.e., adding constraints to the learning process, and (3) post-processing the output predictions. Each has different requirements based on what is known and what can be changed.
- There are different notions and definitions of metrics for each of the dimensions. For example, there are several fairness metrics; the most appropriate one depends on the details of the application, such as whether or not there is structural social bias, and whether the decision is helpful or hurtful (being hired vs. being fired). There are no settled-upon quantitative metrics for explainability (and there can’t be), but different proxies are more or less useful in different contexts. Some trustworthy AI algorithms are better suited to specific notions than others.
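To make the point about multiple fairness notions concrete, here is a minimal sketch in plain Python (deliberately toolkit-free) of two widely used group-fairness metrics, statistical parity difference and disparate impact. The toy hiring data, group labels, and function names are illustrative assumptions, not any library's API:

```python
def selection_rate(outcomes, groups, group):
    """Fraction of favorable outcomes (1) received by members of `group`."""
    members = [y for y, g in zip(outcomes, groups) if g == group]
    return sum(members) / len(members)

def statistical_parity_difference(outcomes, groups, unprivileged, privileged):
    """Difference in selection rates; 0 means parity, and negative values
    indicate the unprivileged group receives favorable outcomes less often."""
    return (selection_rate(outcomes, groups, unprivileged)
            - selection_rate(outcomes, groups, privileged))

def disparate_impact(outcomes, groups, unprivileged, privileged):
    """Ratio of selection rates; the common 'four-fifths rule' flags
    values below 0.8 as potentially discriminatory."""
    return (selection_rate(outcomes, groups, unprivileged)
            / selection_rate(outcomes, groups, privileged))

# Toy hiring data: 1 = hired; group "a" unprivileged, group "b" privileged.
y = [1, 0, 0, 1, 1, 1, 0, 1]
g = ["a", "a", "a", "a", "b", "b", "b", "b"]
print(statistical_parity_difference(y, g, "a", "b"))  # 0.5 - 0.75 = -0.25
print(disparate_impact(y, g, "a", "b"))               # 0.5 / 0.75 ≈ 0.667
```

The two metrics summarize the same data differently, which is exactly why the choice between them depends on the application: the ratio form connects to the legal four-fifths rule, while the difference form is easier to constrain during training.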
Given that there is no free lunch, or no one-size-fits-all solution, it is imperative that there be comprehensive toolkits of diverse algorithms to address the various pillars of trustworthy AI. Only then will data scientists have the flexibility to choose the right algorithms for their applications. Toward this goal, our team at IBM Research has created the open-source toolkits AI Fairness 360, AI Explainability 360, Adversarial Robustness 360, Uncertainty Quantification 360, AI Privacy 360, Causal Inference 360, and AI FactSheets 360, a few of which are now governed by the Linux Foundation AI. We have also continued to add advanced capabilities to inner-source versions of the toolkits that we license to customers through an early access program.
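As a taste of what a pre-processing algorithm in such a toolkit does, here is a sketch of reweighing (in the spirit of the Kamiran–Calders method included in AI Fairness 360), written in plain Python rather than the toolkit's actual API; the data and names are illustrative assumptions:

```python
from collections import Counter

def reweighing_weights(outcomes, groups):
    """Pre-processing intervention: assign each (group, outcome) cell the
    weight P(group) * P(outcome) / P(group, outcome), so that group
    membership and outcome become statistically independent in the
    weighted training data."""
    n = len(outcomes)
    count_g = Counter(groups)
    count_y = Counter(outcomes)
    count_gy = Counter(zip(groups, outcomes))
    return {(gi, yi): (count_g[gi] / n) * (count_y[yi] / n) / (count_gy[(gi, yi)] / n)
            for (gi, yi) in count_gy}

# Same toy hiring data: group "a" is selected at rate 0.5, group "b" at 0.75.
y = [1, 0, 0, 1, 1, 1, 0, 1]
g = ["a", "a", "a", "a", "b", "b", "b", "b"]
w = reweighing_weights(y, g)
# Under-selected cells get weights above 1, e.g. w[("a", 1)] = 1.25,
# and the weighted selection rates of the two groups become equal.
```

Training a classifier with these instance weights is one intervention point; the other two (in-processing constraints and post-processing of predictions) require different information and permissions, which is why a comprehensive toolkit offers all three.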
But a toolkit approach leaves us with a conundrum: how do data scientists select among the plethora of algorithms and configure the toolkits for the specific problem in front of them? That is partly something to be done through a participatory design session with a group of diverse stakeholders, guided by appropriate questions to be deliberated. It also involves some amount of quantitative testing of approaches, perhaps facilitated by automation. We still need better ways of carrying out this initial problem specification and configuration step; initial thoughts are only just starting to emerge.
The toolkit approach is a necessary ingredient for trustworthy AI, but it is not sufficient. I am confident, however, that we’re making rapid progress in rounding out the design and consulting processes needed to complete the story.
About the author/ODSC East 2022 Speaker:
Dr. Kush Varshney is a distinguished research staff member and manager with IBM Research at the Thomas J. Watson Research Center, Yorktown Heights, NY, where he leads the machine learning group in the Foundations of Trustworthy AI department. He was a visiting scientist at IBM Research – Africa, Nairobi, Kenya in 2019. He is the founding co-director of the IBM Science for Social Good initiative. He applies data science and predictive analytics to human capital management, healthcare, olfaction, computational creativity, public affairs, international development, and algorithmic fairness, which has led to recognitions such as the 2013 Gerstner Award for Client Excellence for contributions to the WellPoint team and the Extraordinary IBM Research Technical Accomplishment for contributions to workforce innovation and enterprise transformation. He conducts academic research on the theory and methods of trustworthy machine learning. His work has been recognized through best paper awards at the Fusion 2009, SOLI 2013, KDD 2014, and SDM 2015 conferences and the 2019 Computing Community Consortium / Schmidt Futures Computer Science for Social Good White Paper Competition. He self-published a book entitled “Trustworthy Machine Learning” (http://www.