As the organization’s name suggests, the Human Rights Data Analysis Group (HRDAG) uses data science to answer questions about human rights on a large scale, from determining the chain of command and accountability in international cases of genocide to evaluating whether artificial intelligence-based tools used by the US criminal justice system are fair.
Because we’re data scientists, we want to do data analysis that’s elegant and brings us closer to understanding the truth about a situation, and because we’re compassionate humans, we want that analysis to be useful for effecting positive change. Data scientists want data, and this is, without doubt, the era of Big Data—we have access to exponentially bigger datasets than ever before. As a result of this perceived abundance, we’re seeing something like the Gold Rush of the 1840s: experts and industry leaders are rushing to devise ways to use the available data. While some of that is strictly business—making difficult jobs easier, increasing revenue, and so on—some people are trying to use data to help us answer hard questions. At HRDAG, we’re very careful, and critical, about those questions. We ask ourselves constantly, “Can these data actually answer the question at hand?”
A dataset may be big, but that doesn’t mean it’s “good”—and by “good,” we mean “useful” or “appropriate.” The size of a dataset has no bearing on whether the data are incomplete or imperfect. With nearly 30 years of documenting and analyzing human rights violations under our belts, we are deeply informed about how unobserved events can change the conclusions drawn from existing datasets.
We’ve been involved for many years in statistical analysis that informs our thinking about the US criminal justice system. We wrote an article about homicides committed by police, estimating that one-third of all Americans killed by strangers are killed by police. We’ve evaluated predictive policing tools that rely on artificial intelligence, and we’ve studied pre-trial risk assessment tools that use existing data. Consistently, we have found that instead of cleansing the justice system of human biases, these tools perpetuate and exacerbate unfairness that’s been baked into the system—and its datasets—by decades of unjust policing practices. The datasets are only as “good” as the people and systems generating them. Suppose, for example, we are trying to answer the question, “Where are the majority of drug crimes committed, and by whom?” If police officers routinely focus arrests on poor and minority neighborhoods while ignoring the same potential arrests in more affluent neighborhoods, then the data generated by those arrest records, no matter how “big,” will be biased.
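To make the sampling-bias point concrete, here is a minimal, entirely hypothetical simulation: the neighborhoods, offense rates, and patrol intensities below are invented for illustration and are not HRDAG estimates. Two neighborhoods offend at exactly the same rate, but arrests pile up wherever police happen to be looking.

```python
import random

random.seed(42)

# Hypothetical numbers, for illustration only (not HRDAG estimates).
population = 10_000                         # people per neighborhood
offense_rate = 0.05                         # true offense rate, identical in both
patrol_intensity = {"A": 0.60, "B": 0.10}   # chance police observe a given offense

arrests = {}
for neighborhood, p_observed in patrol_intensity.items():
    offenders = int(population * offense_rate)  # 500 in each neighborhood
    # An offense only enters the "data" if police happen to observe it.
    arrests[neighborhood] = sum(
        1 for _ in range(offenders) if random.random() < p_observed
    )

print(arrests)
# Arrest counts differ several-fold even though true offending is identical:
# the dataset records policing patterns, not crime patterns.
```

Any model trained on these arrest counts would “learn” that neighborhood A is more criminal, when the only real difference was where police were sent.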
This is where we find it critical to ask, “Who will bear the cost of incorrect modeling results?” As our director of research has said, “Machine learning is pretty good at finding elements out of a huge pool of non-elements… But we’ll get a lot of false positives along the way.” Ethically, we must ask ourselves, who might those false positives indict or affect?
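A quick back-of-the-envelope calculation shows why false positives dominate when searching a huge pool of non-elements. All numbers here are invented for illustration: even a model that correctly clears 99% of irrelevant records swamps its true finds with false flags when the target is rare.

```python
# Hypothetical screening scenario (all numbers invented for illustration).
pool_size = 1_000_000            # records screened
true_positives_in_pool = 100     # genuinely relevant records (0.01% prevalence)

sensitivity = 0.95   # fraction of true positives the model flags
specificity = 0.99   # fraction of non-elements correctly left unflagged

flagged_true = true_positives_in_pool * sensitivity                       # ~95
flagged_false = (pool_size - true_positives_in_pool) * (1 - specificity)  # ~9,999

precision = flagged_true / (flagged_true + flagged_false)
print(f"False positives: {flagged_false:.0f}")
print(f"Precision: {precision:.1%}")
# Fewer than 1 in 100 flagged records is actually relevant: the false
# positives vastly outnumber the true ones.
```

The ethical question follows directly: each of those thousands of false flags is a person or a case, and someone bears the cost of that error.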
When thinking about potential harm—and how to avoid it—we evaluate data quality, try to determine what data are missing, or unobserved, and ask ourselves if we have what we need to identify situations where analytical tools can do good. Ultimately, our goal is to supply the evidence for evidence-based policies that have the power to make the world fairer and support accountability and justice for all.
I’ll be speaking about these issues and sharing examples at my upcoming talk at ODSC West on October 30 at 11:20 AM Pacific time, “Data Science: How Do We Achieve the Most Good and Least Harm?”
Learn more about HRDAG at hrdag.org.
As the Executive Director of the Human Rights Data Analysis Group, Megan Price, PhD, drives the organization’s overarching strategy, leads scientific projects, and presents HRDAG’s work to diverse audiences. Her scientific work includes projects in Guatemala and Syria, as well as with risk assessment tools in the United States.