Diversity in Data Science: An In-Depth Look at the “Why” and “How”

Ask anyone for a list of “buzzwords” and diversity is bound to show up high on the list. As a society, we’ve been discussing it for years, and the tech and data science industries have come under scrutiny too. In this article, we’ll dive deep into what diversity really means, why it’s important (for society as a whole and for business managers), how the problem came to be, and how to “fix” the issue of diversity in data science.

What does “Diversity” mean? 

For us, it often means employing people from a range of backgrounds across categories like gender, race, sexuality, economic status, and age. Some of these are harder to track than others. In practice, companies often end up, usually accidentally, focusing heavily on one type of diversity.

For example, the crowdfunding service Indiegogo has a staff that is 50/50 male to female, compared to a 49/51 US population, and its white/non-white split also fares pretty well (58/42 compared to 61/39 for the US population). However, when you break “non-white” down into further categories, the match is less exact. If the goal is gender or racial diversity, they’re doing pretty well, far better than some other companies: Intel’s gender breakdown is 74/26 male to female, and HP’s race breakdown is 73/26 white to non-white. These stats come from the Diversity in Tech visual created by Information is Beautiful, where you can see the whole list and interact with the infographic.

Most people consider diversity something to work towards. Those who do usually suggest that the perfectly diverse company would be one that matches the diversity of the US population (or whatever population the company belongs to and operates in). That’s not to say having greater proportions of minority populations would be bad; for example, at most US tech companies the most-hired minority group is Asian, and Asian employees are normally hired in proportions higher than their share of the US population. Nor is it to say that companies that don’t “fit” the diversity model are inherently bad. To me, diversity is simply a tool for pointing out things we may have been overlooking, and suffering from, like accidental company bias or missed talent.

Why should businesses care about diversity?

The most commonly cited (and most logically convincing) reason that businesses should care about diversity is that diverse companies make more money (up to 19% more). With that come the related findings that diversity drives innovation and that it leads to less data bias (namely because diverse groups of people working on the data are more likely to notice its bias ahead of a would-be disastrous debut). Furthermore, Deloitte’s Global Millennial Survey found that companies with higher rates of diversity are more attractive to Millennials (who will make up 75% of the workforce by 2025): 74% believe an organization is more innovative when it has a culture of inclusion.

All of this is to say: businesses that don’t care about diversity will be left behind, and society as a whole, along with the individuals impacted by these companies, will suffer. There have already been cases of data bias impacting lives, namely in facial recognition software and hiring processes. But there is talk of further integrating data science into everything from the judicial system to agriculture, and if those technologies are created by companies made up of only one viewpoint, the repercussions won’t be realized until the technologies have already been implemented. And the true successes and exciting capabilities of these products could be overlooked because of simple and preventable oversight.

How the diversity issue came to be

One of the reasons diversity has become such a hot topic is that it’s so difficult to solve. The simplest option, hiring to meet a so-called diversity quota, is inherently unfair, and yet it’s hard to suggest anything else if you want actual change. Diversity can be a hard thing to accomplish because the lack of it doesn’t have one single root cause. HP isn’t 73% white because it’s a racist company. That would be the easy answer (angry blame is easy), but it’s more complicated than that.

Earlier, I mentioned both that it’s hard to track diversity in economic background, and that non-white hires are often skewed in favor of Asians, with Black and Latinx being the least hired “category” across most tech companies. However, once you look at how complicated these issues are, they turn out to be interwoven, and not simply the company’s fault.

Black and Latinx populations are highly concentrated in low-income, highly populated areas. This means their public schools usually get fewer resources, and students may not have as much time or income to commit to the extracurriculars that are so often needed to get into good universities. And to get enough scholarships or financial aid to attend school at all, they often have to be the most impressive in their entire school (whereas wealthier students can afford to be average). This all leads to fewer Black and Latinx data science graduates, and a smaller pool to hire from. Even with race aside, this same chain of obstacles implies there are likely fewer data science professionals who come from a low economic status, since they wouldn’t have been able to afford the schooling it usually takes to get these positions.

Gender follows a similarly convoluted path. While school children and teens of both genders show similar interest and success in STEM fields, girls’ interest decreases the further along they are in school. Many believe this is because of societal pressures against girls in STEM and stereotypes that girls aren’t as good in the subjects. Then, while more women are enrolled in university overall, and enrollment in STEM programs specifically is increasing, these students aren’t getting hired after school, and when they are, there’s a wage gap. There’s also an unfortunate trend of women dropping out of their STEM degrees after facing higher rates of sexual harassment. None of this is really the fault of the companies where we see a lack of diversity, but it’s not something we can, or should, overlook, either.

What are some solutions to increase diversity in data science?

After all this discussion of the “what,” “why,” and “who,” it’s important to talk about the “how” of fixing it. There is by no means a cure-all “fix” for diversity (as we’ve discussed, it’s a messy subject), but there are some actionable steps your company can take to even out some of the disparity that data science often faces.

  • Measure your Diversity: The only way to know how you want to move forward and improve diversity is to see where you’re starting. Look at overall staff, “higher-ups” or leadership positions, and new hires, so you can see if there is a group you tend to promote more often. 
  • Recruit from a Broader Area: If your company is based in an expensive city, and you limit your applicant pool to that city, you’re often limiting yourself to hiring from that economic status. By broadening your search, you can reach populations and ideas that you would have been ignoring, and the position you’re hiring them into could give them the flexibility to move closer. 
  • Offer Flexible Hours: Many parents are limited in the hours they’re able to work, but are nonetheless qualified for your position. If you’re able to offer flexible hours or remote work, you’ll not only have a happier staff, but your candidate pool will swell to include previously excluded people (specifically women, more often than not). 
  • Recruit from a Broader Education Level: As with location, by limiting your recruitment to people with master’s degrees (or even higher), you’re limiting your talent pool. And with the breadth of incredible free or low-cost online courses on data science, there’s no real argument that candidates without formal education are any less capable. 
  • Give Diversity Training: It’s often important to train and update your current staff on new diversity procedures. Many people feel that focusing on diversity means that unqualified people are getting positions just because they’re from a specific group, and if that idea flourishes in your company it can create a lot of hostility towards new hires. Explain that there aren’t quotas, that you’re not hiring for the sake of it, and what benefits diversity brings (maybe send them this article). 
  • Create a Mentorship Program: Mentoring is often a long-standing informal part of company culture—bosses mentor people they’re impressed with. However, with companies that have started out less diverse, it can feel like an old-time-y “Boys Club” where women or minorities are excluded, and those that are mentored are the only ones who get promotions. By taking the steps to create an official program, you can make sure there aren’t any groups being accidentally overlooked for mentorship and promotions. 
  • Create an Exit Interview: New hires are obviously important for adding diversity to your talent pool, but it’s important to focus on retention and why people are leaving. If you’re constantly hiring a diverse group but they don’t last more than a few months, there’s probably still an issue, and your exit interview can give you insight on this.
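
As a rough illustration of the “Measure your Diversity” step above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the staff records, the group labels “A” and “B,” and the helper names shares and gaps are hypothetical, not taken from any real company or tool. It computes each group’s share of overall staff, leadership, and new hires, then shows how far leadership and new hires sit from the overall breakdown, in percentage points.

```python
from collections import Counter

# Hypothetical staff records: (group, level, is_new_hire).
# The group labels and every row are invented for illustration only.
STAFF = [
    ("A", "leadership", False),
    ("A", "leadership", False),
    ("A", "staff", False),
    ("A", "staff", True),
    ("B", "leadership", False),
    ("B", "staff", False),
    ("B", "staff", True),
    ("B", "staff", True),
]


def shares(records):
    """Return each group's share of the given records, in percent."""
    counts = Counter(group for group, _level, _is_new in records)
    total = sum(counts.values())
    return {g: round(100 * n / total, 1) for g, n in sorted(counts.items())}


def gaps(subset, baseline):
    """Percentage-point difference between a subset's shares and a baseline's."""
    return {g: round(subset.get(g, 0.0) - baseline[g], 1) for g in baseline}


overall = shares(STAFF)
leadership = shares([r for r in STAFF if r[1] == "leadership"])
new_hires = shares([r for r in STAFF if r[2]])

print("overall:", overall)
print("leadership vs overall:", gaps(leadership, overall))
print("new hires vs overall:", gaps(new_hires, overall))
```

A positive gap for a group in leadership suggests that group is promoted more often than its overall share would predict. The same gaps helper could compare your overall shares against a reference population instead, if you have those figures.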

Last Thoughts

As easy as it can be to roll our eyes at the buzzwords that pop up so often in the media, we have to remember that they’re only buzzwords because our society has decided they’re important. Diversity has been ignored in the US basically since the country was founded. But if we continue to ignore it in favor of “it’s always been this way,” then we’ll never move forward. And when has it ever been like a tech company to say, “no, let’s not try changing things”? The tech industry is ever evolving, and has usually been at the forefront of societal change. It’s time diversity in data science caught up.



ODSC gathers the attendees, presenters, and companies that are shaping the present and future of data science and AI. ODSC hosts one of the largest gatherings of professional data scientists with major conferences in USA, Europe, and Asia.