

Exploring the Moral and Ethical Perspective of a Dataset while Building an Explainable AI Solution
Tags: Modeling, Explainable AI, Responsible AI. Posted by ODSC Community on November 3, 2020.

Developing AI code in the 2010s relied on knowledge and talent. Developing AI code in the 2020s implies accountability, and XAI brings that accountability to every aspect of an AI project. It spans the moral, ethical, legal, and technical perspectives involved in building an explainable AI solution.
[Related article: Responsible AI 2020: Expectations for the Year Ahead]
This article is an excerpt from the book Hands-on Explainable AI (XAI) with Python, by Denis Rothman – a comprehensive guide to opening up black-box models in AI applications to make them fair, trustworthy, and secure. The book not only covers the basic principles and tools used to deploy Explainable AI (XAI) in your apps and reporting interfaces, but also walks readers through hands-on machine learning Python projects strategically arranged to strengthen their grasp of AI results analysis.
Developing AI programs comes with moral, ethical, and legal responsibilities. Ignoring ethical and legal liabilities can cost a company, a government institution, or an individual millions in any country’s currency. In this article, we will focus on the moral perspective of a dataset.
Dataset
Let’s assume that your AI agent uses the US census dataset and that you face a tough decision: use all of the features there as they are, or not.
You now team up with a few developers, consultants, and a project manager to analyze the potentially biased columns of the dataset.
After some discussion, your team comes up with the following columns that require moral and ethical analysis:
- workclass
- marital-status
- relationship
- race
- sex
Some of these fields could shock some people, to say the least, and are banned in several European countries, such as France. We need to understand why from a moral perspective.
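As a concrete starting point, here is a minimal sketch (not taken from the book) of how the team could load the census income data and inspect the flagged columns before making any decision. It assumes the publicly available adult.data file from the UCI Machine Learning Repository, its standard column order, and pandas:

```python
import pandas as pd

# Standard column names for the UCI "Adult" census income file
# (assumption: we are working with adult.data from the UCI repository).
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country", "income",
]

# Columns the team flagged as requiring moral and ethical analysis.
FLAGGED = ["workclass", "marital-status", "relationship", "race", "sex"]

URL = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
df = pd.read_csv(URL, names=COLUMNS, skipinitialspace=True)

# Inspect the distribution of each flagged column before deciding to keep or drop it.
for column in FLAGGED:
    print(df[column].value_counts(), "\n")
```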
The Moral Perspective
We must first realize that an AI agent’s predictions lead to decisions in one form or another. A published prediction, for example, can influence how a population views some of its own members.
The U.S. census dataset seemed to be a nice dataset to test AI algorithms on. In fact, most users will just focus on the technical aspect, run it, and be inspired to copy the concepts.
But is this dataset moral?
We need to explain why our AI agent needs the controversial columns. Whether you choose to include them or to exclude them, you must explain why. Let’s analyze these columns:
Workclass: This column contains information on the person’s employment class and status in two broad groups: the private and public sectors. In the private sector, it informs us whether a person is self-employed or not. In the public sector, it tells us if a person works for a local government or a national government, for example.
When designing the dataset for the AI agent, you must decide if this column is moral or not. Is this a good idea? Could this hurt a person who found out that they were being analyzed this way? For the moment, you decide!
Marital-Status: This column states whether a person is married, widowed, divorced, and so on. Would you agree to be in a statistical record as a widow to predict how much you earn? Would you agree to be in an income prediction because you are “never-married” or “divorced”? Could your AI agent hurt somebody if this was exposed, and the AI algorithm had to be explained? Can this question be answered?
Relationship: This column provides information on whether a person is a husband, a wife, not in a family, and so on. Would you let your AI agent state that since a person is a wife, you can make an income inference? Or if a person is not in a family, the person must earn less or more than a given amount of money? Is there an answer to this question?
Race: This column could shock many people. The term “race” in itself, in 2020, could create a lot of turbulence if you have to explain that a person of a given skin color belongs to a “race,” although races do not exist in the world of DNA. Is skin color a race? Can a skin color decide how much a person will earn? Should we add hair color, weight, and height? How will you explain our AI agent’s decision to a person offended by this column?
Sex: This column can trigger a variety of reactions if used. Will you accept that your AI agent makes predictions and potential decisions based on whether a person is male or female? Will you let your AI agent make predictions based on this information?
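One way to turn some of these moral questions into something observable is to measure how a trained model’s predictions break down across each sensitive column. The helper below is a hypothetical sketch, not code from the book: it assumes a fitted scikit-learn classifier, a held-out test frame that still contains the original columns, and “>50K” as the positive income label.

```python
import pandas as pd

def positive_rate_by_group(model, X_test, features, group_column):
    """Share of '>50K' predictions for each value of a sensitive column.

    Hypothetical helper: `model` is assumed to be a fitted scikit-learn
    classifier whose positive class is the '>50K' income label, and
    `features` are the columns it was trained on.
    """
    predictions = pd.Series(model.predict(X_test[features]), index=X_test.index)
    high_income = predictions == ">50K"
    return high_income.groupby(X_test[group_column]).mean()

# Example (hypothetical names): how often is a high income predicted per group?
# for column in ["sex", "race"]:
#     print(positive_rate_by_group(clf, X_test, feature_columns, column))
```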
This section leaves us puzzled, confused, and worried. The moral perspective has opened up many questions for which we only have subjective answers and explanations.
Let’s see if an ethical perspective can be of some help.
The Ethical Perspective
The moral analysis of the critical columns in the previous section has left us frustrated. We do not know how to objectively explain why our AI program needs the preceding information to make a prediction. We can imagine that there is no consensus on this subject within the team working on this problem.
Some people will say that employment status, marital status, relationship, race, and sex are useful for the predictions, and some will disagree.
In this case, an ethical perspective provides guidelines.
Rule 1 – exclude controversial data from a dataset
This rule seems simple enough. We can exclude the controversial data from the dataset.
We can take the controversial columns out, but then we will face one of two possibilities (see the sketch after this list):
- The accuracy of the predictions remains sufficient
- The accuracy of the predictions becomes insufficient
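As a rough illustration of rule 1 (again a sketch under assumptions, not the book’s implementation), the snippet below reuses the df and FLAGGED names from the earlier loading sketch and trains the same simple scikit-learn pipeline twice, once with and once without the controversial columns, so the team can see which of the two possibilities it is facing:

```python
from sklearn.compose import make_column_transformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

def accuracy_without(df, dropped):
    """Train a simple model after dropping `dropped` columns; return test accuracy."""
    X = df.drop(columns=["income"] + dropped)
    y = df["income"]
    categorical = X.select_dtypes(include="object").columns
    model = make_pipeline(
        make_column_transformer(
            (OneHotEncoder(handle_unknown="ignore"), categorical),
            remainder="passthrough",
        ),
        RandomForestClassifier(n_estimators=100, random_state=0),
    )
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    return model.fit(X_train, y_train).score(X_test, y_test)

# Compare accuracy with and without the flagged columns (df and FLAGGED
# are assumed to come from the earlier loading sketch).
print("With controversial columns:   ", accuracy_without(df, []))
print("Without controversial columns:", accuracy_without(df, FLAGGED))
```

If accuracy barely moves without the flagged columns, the ethical choice becomes much easier to defend; if it collapses, the questions below still have to be answered before going further.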
In both cases, however, we still need to ask ourselves some ethical questions:
- Why was the data chosen in the first place to be in this dataset?
- Can such a dataset really predict a person’s income based on this information?
- Are there other columns that could be deleted?
- What columns should be added?
- Would just two or three columns be enough to make a prediction?
Answering these questions is fundamental before engaging in an AI project for which you will have to provide XAI.
We are at the heart of the “what-if” philosophy of Google’s What-If Tool (WIT)! We need to answer what-if questions when designing an AI solution, before developing a full-blown AI project.
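The What-If Tool itself is an interactive notebook widget, so the sketch below is only a stand-in for the same idea in plain code: it asks one literal what-if question by changing a single field of a single record and comparing the model’s predictions. The names clf, X_test, and feature_columns are hypothetical and assume a fitted classifier from the previous sketches.

```python
def whatif_flip(model, X, index, column, new_value, features):
    """Predict for one row, change a single field, and predict again.

    Hypothetical helper: `model` is assumed to be a fitted classifier or
    pipeline that accepts a one-row DataFrame with `features` columns.
    """
    row = X.loc[[index], features].copy()   # one-row DataFrame, original values
    original = model.predict(row)[0]
    row.loc[index, column] = new_value      # the single "what-if" change
    flipped = model.predict(row)[0]
    return original, flipped

# Example (hypothetical names): does flipping "sex" change this person's prediction?
# idx = X_test.index[0]
# new_sex = "Female" if X_test.loc[idx, "sex"] == "Male" else "Male"
# print(whatif_flip(clf, X_test, idx, "sex", new_sex, feature_columns))
```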
If we cannot answer these questions, we should consider rule 2.
Rule 2 – do not engage in an AI project with controversial data
This rule is clear and straightforward. Do not participate in an AI project that uses controversial data.
We have explored some of the issues with the census data problem from moral and ethical perspectives. The book goes on to examine the problem from a legal perspective as well.
Summary of Building an Explainable AI Solution
In this article, we explored the moral and ethical perspectives through an example dataset from the U.S. census. While exploring those two perspectives, we learned the significance of analyzing your AI solution-building process through the filter of ethics.
About the Author
Denis Rothman graduated from Sorbonne University and Paris-Diderot University, writing one of the very first word2vector embedding solutions. He began his career authoring one of the first AI cognitive natural language processing (NLP) chatbots applied as a language teacher for Moët et Chandon and other companies. He has also authored an AI resource optimizer for IBM and apparel producers. He then authored an advanced planning and scheduling (APS) solution that is used worldwide. Denis is an expert in explainable AI (XAI), having added interpretable mandatory, acceptance-based explanation data and explanation interfaces to the solutions implemented for major corporate aerospace, apparel, and supply chain projects.