In US history, the period after the Great Depression became known as the Great Compression. It was a time when income inequality narrowed and households became more prosperous, with more room to maneuver financially.
Much like that period of history, data science is going through its own great compression. As smaller companies grapple with the cost and labor of managing modern data, the available solutions are becoming more manageable. Simpler solutions open the market to more than just the big names with deep pockets.
Benn Stancil of Mode, in his talk “Data Science’s Great Compression and Its Next Frontier” for 2019’s Accelerate AI, outlines what this great compression means for the future of data science. Let’s take a look.
What’s Causing Compression
There are a few reasons compression is happening. In the past, data science meant building bespoke infrastructure to handle every stage of the data pipeline. Each component had to be maintained by experts, and teams piled legacy code on top of legacy code.
Now, we have:
- New computing systems – Out-of-the-box products are replacing bespoke systems at lower cost. You get fully managed ETL at a fraction of the price.
- Open source tools – For data solutions, companies can turn to free algorithms and free hosted services. Deploy them directly on top of a data warehouse, and you're ready. Because the tooling is already built, you don't need to scramble for top development talent.
- Cloud computing – You can import and glue together analytics frameworks that others have built.
These changes call for a new type of data science skill set. Some engineering and some model development are still required, but deep expertise in business analytics and in deploying models matters even more. Data scientists must understand business systems and how to apply these solutions to a specific business initiative.
A Critical Component of Compression
As businesses work their way through these more accessible point-and-click models, one thing is abundantly clear: each time a company builds a model for processing data, an analyst must be in the room to point out its flaws.
For example, Stancil mentions a company that decided to apply an algorithm to hiring decisions. It tagged promotions as the sign of a quality employee and concluded that it should hire only junior developers, because they were the most likely to be promoted.
If you've noticed the issue with that logic, you aren't alone: junior developers are promoted more often largely because they have more levels to climb, not necessarily because they are better employees. The company did not deploy this model (a data scientist stepped in), but the implications of such models are clear. Without an intimate understanding of both the true business initiative (growth) and sound data science principles (models), companies can end up making poor decisions based on flawed models.
Building on Human Expertise
The answer isn’t building more tools or complex models. Our tools are only as valuable as the people driving them, so subject matter experts must be in every room to make fair use of these tools.
New data science solutions make analysts much more valuable. They don’t have to understand engineering or developing as much as they did five or ten years ago when everyone was building bespoke solutions. Instead, employees must have the analytical expertise to build on top of existing problems to produce viable solutions.
The skillsets include analytical and business expertise. They must be good at lateral thinking and deductive reasoning. They must be good at applying existing frameworks and developing logical success benchmarks. Social scientists are good examples, Stancil says.
They may also solve other issues in the industry. A big problem in data science applications is the lack of diversity, and Stancil also notes that bringing in social scientists could help close that gap.
The Future of Data Science
We’ve built tools with a lot of potential, but these analytical experts must be in the room to realize the full potential of our tools. The ability to attract this talent into data science is a bit part of what companies must figure out in order to move forward.
If companies can welcome the analytical talent into the fold, more and more organizations may be able to take advantage of the data science compression. Big data is going to be a vital part of operations in the future. Companies who can not only gather the data but also make sense of it in practical business terms could be the ones to survive the upheaval.
According to Stacil, true movement in data science will come from marrying both the human capability and the algorithm. This combination gives models real business impact. Once businesses add analytical minds to their applied problems, they may smooth the transition to these big data solutions.