Be sure to attend Olivier’s talk at ODSC East this April 30 – May 3 in Boston! Register now for “Democratizing Artificial Intelligence in a Business Context.”
When we speak about data science in general, one of the biggest problems is the sheer lack of data scientists. According to a study published by McKinsey Global Institute, the U.S. economy could be short as many as 250,000 data scientists by 2024. That’s A LOT of people. And this is, of course, if we continue to grow at a steady rate.
Why exactly are data scientists so critical and scarce in today’s businesses? The definition of a data scientist, according to DJ Patil, the former US Chief Data Officer, could provide us with some insights into the issue:
“A data scientist is that unique blend of skills that can both unlock the insights of data and tell a fantastic story via the data”
DJ Patil, the former US Chief Data Officer
Although this definition is clear and succinct, it also points to all the problems behind the data scientist shortage.
The 4 Commandments on How to Avoid a Data Scientist Shortage
We Shall Not Keep Our Skills to Ourselves
Keeping this unique blend of skills to ourselves, in the long run, hurts us more than it helps. It’s like any other technical skill; if you keep to yourself and refuse to share your knowledge, you end up working alone.
The data culture in your company will stay as it is so long as your team is seen as an isolated part of your organization.
In order for a company to become more data-driven and to build a data science culture, we need to be generous with our knowledge. If we don’t, data science cannot grow organically within the company and will continue being a niche for specialists only. Remember what your mothers told you – sharing is caring.
We Shall Not Become Jacks of All Trades, Masters of None
A “unique blend of skills” suggests a lack of specialization, which can cause problems down the road. For example, a lack of business skills can cause us to work hard and long hours to solve the wrong problems.
On the other hand, a lack of statistical skills can produce biased conclusions, leading our business in a completely erroneous direction. To become proficient data scientists, it is imperative that both our business acumen and our technical skills stay sharp and specialized to the tasks at hand.
Even the Einsteins of data science would be left helpless without proper knowledge of the domain in which they are operating.
We Shall Not Define Standards
There are no standards or criteria that make a great data scientist; it depends on the type of work to be done. The blend of skills is so unique to every opportunity that finding the data science professional best suited for a particular task is becoming a real issue.
The proof is in the pudding. You need to hire a data scientist who has solved similar issues to your own. Are you willing to hire junior professionals? Even junior data scientists should have practiced on generic datasets.
Lack of practice should no longer be an excuse thanks to services like Kaggle and Google Datasets.
Moreover, it is easy for data analysts and other data professionals to call themselves data scientists in order to surf the hype wave. Don’t get me wrong, all of the data science professions are vital in a data-driven business, but it’ll inevitably hurt your culture and performance if you have the wrong employee profile in the driver’s seat.
We Shall Not Take All the Heat (It’s Their Fault)
The fact that organizations often blindly trust data scientists to unlock insights and provide business direction is plain old-fashioned risky! Would you bet your company’s future on one group’s analysis? Nope! Me neither.
A good data science project is a business project first. This means that, without business vision and an understanding of the business processes, the project might not even be useful at all.
Therefore, without executive and business support, a data science project might not be relevant to the business in the slightest.
However, once management and the business are on board, if the data department is not involved in the implementation, the result may be nothing more than a statistical model on a computer using extracted data!
Typically, to put a data science project in place, you have to connect to the data sources in real-time or in frequent batches, develop an API to run the statistical model on the new data, and push the API’s predictions to a separate system, such as a CRM or ERP. Sometimes, these preliminary steps can take more time than the rest of the data science project.
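To make those three steps concrete, here is a minimal sketch in Python using only the standard library. Everything in it is hypothetical and for illustration only: the `Customer` fields, the `fetch_new_records` stub standing in for a real data source, the toy `churn_score` rule standing in for a trained statistical model, and the `InMemoryCRM` class standing in for a real CRM or ERP client.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Customer:
    """Hypothetical record shape; a real source will have its own schema."""
    customer_id: str
    monthly_spend: float
    support_tickets: int


def fetch_new_records() -> List[Customer]:
    """Step 1 (stub): stands in for a real-time or batch data connection."""
    return [Customer("c-1", 120.0, 0), Customer("c-2", 35.0, 4)]


def churn_score(c: Customer) -> float:
    """Step 2 (stub): a toy rule in place of a trained statistical model."""
    raw = 0.1 + 0.2 * c.support_tickets - 0.001 * c.monthly_spend
    return max(0.0, min(1.0, raw))  # clamp to the [0, 1] range


class InMemoryCRM:
    """Step 3 (stub): stands in for a CRM or ERP client."""

    def __init__(self) -> None:
        self.fields: Dict[Tuple[str, str], float] = {}

    def update(self, customer_id: str, field: str, value: float) -> None:
        self.fields[(customer_id, field)] = value


def run_batch(
    fetch: Callable[[], List[Customer]],
    model: Callable[[Customer], float],
    crm: InMemoryCRM,
) -> int:
    """Fetch new records, score each one, push the score downstream."""
    records = fetch()
    for record in records:
        crm.update(record.customer_id, "churn_score", round(model(record), 3))
    return len(records)


crm = InMemoryCRM()
processed = run_batch(fetch_new_records, churn_score, crm)
```

In a real project each stub hides most of the work: authentication, scheduling, retries, and schema changes are exactly the plumbing that tends to take longer than the model itself.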
Guidelines to Operationalize Data Science in Your Company
Now that you understand the commandments, here is the bible.
It is completely fine – even ideal – to have a data scientist in your company. Data scientists bring innovation and a different set of skills compared to other roles. However, this role might need to change over time.
A data scientist should be more efficient working in an innovation team, helping to execute proofs of concept. It is well known that data scientists tend to seek variety and project ownership. The data scientist also needs to train the organization on how to become more data-driven in general.
A common way to make this happen would be to develop a Center of Excellence or a Data Science Practice. This will enable other analysts and data developers to be proficient in data science in their own teams, while working on innovation projects.
[Related article: Give Unicorns a Break, It’s Time for Data Science Teams]
Don’t Fear the Business Analyst
For the regular day-to-day data science operations, business analysts should get the bulk of the responsibilities. Even if they lack a grasp of certain statistical concepts and coding principles, many tools and training programs are available to improve their skills and ensure the job gets done.
However, the BA’s business background and network within the company give them an advantage: the ability to kick-start projects with ease. With all the existing tools, I find that the toughest responsibility in data science is to correctly understand the problems at hand.
Albert Einstein once said:
“If I were given one hour to save the world, I would spend 59 minutes defining the problem and one minute solving it.”
Fortunately, one of the best traits of the business analyst is to be good at solving problems.
Deploy Machine Learning with Your Actual Developers
For more complex algorithms and machine learning deployment, things should be left to your more seasoned developers. Keras, Fast.ai, AutoML, and other solutions are game-changers, as they are easy to understand and optimize without in-depth knowledge in linear algebra or statistics.
As long as the right methodologies and techniques are used, the results will arrive faster, the code will be of higher quality, and everything will be easier to monitor and maintain. It is also important to know when to use a machine learning algorithm and when a simpler method would do. Developers should be able to use the right tools at the right times.
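One way to judge whether a machine learning model earns its keep is to compare it with the simplest possible baseline. The sketch below is an assumption of mine rather than a prescribed method: it compares a model's accuracy on held-out labels against always predicting the most common class, and the function names and the `min_gain` margin are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of predictions that match the true labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_train: Sequence[int], n: int) -> List[int]:
    """Always predict the most frequent label seen in training."""
    most_common_label = Counter(y_train).most_common(1)[0][0]
    return [most_common_label] * n


def worth_using_model(
    y_true: Sequence[int],
    model_pred: Sequence[int],
    y_train: Sequence[int],
    min_gain: float = 0.05,  # hypothetical margin; tune to the business case
) -> bool:
    """True if the model beats the baseline by at least min_gain accuracy."""
    baseline_acc = accuracy(y_true, majority_baseline(y_train, len(y_true)))
    return accuracy(y_true, model_pred) - baseline_acc >= min_gain


y_train = [0, 0, 0, 1]
y_true = [0, 0, 1, 1]
useful = worth_using_model(y_true, [0, 0, 1, 0], y_train)
not_useful = worth_using_model(y_true, [0, 0, 0, 0], y_train)
```

Accuracy is used here only because it is easy to read; for imbalanced problems another metric would likely be a better fit.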
A common pitfall that working with developers helps avoid is failing to declare boundaries for the data science project.
For instance, if the statistical model performs poorly in some cases, instead of working on it for months to make it generalized (sometimes it is simply not possible), a developer might instead create different flows to deal with the uncertainty in the result.
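As a rough illustration of such a flow, the sketch below routes each prediction by its confidence: confident results are acted on automatically, and the rest go to a person. The function name, the return values, and the 0.8 threshold are hypothetical choices for this example.

```python
def route_prediction(label: str, confidence: float, threshold: float = 0.8) -> str:
    """Pick a flow for one prediction based on the model's confidence.

    The 0.8 default is an illustrative assumption, not a recommended value.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence >= threshold:
        return f"auto:{label}"  # confident enough to act on automatically
    return "manual_review"      # uncertain cases are handed to a person


confident = route_prediction("churn", 0.93)
uncertain = route_prediction("churn", 0.55)
```

The model stays as it is; the boundary around it is what changes, which is usually far cheaper than months of trying to generalize the model.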
Data Science is a Team Sport
What I mean to say in this article is that data scientists are currently being misused in a lot of companies. They are perhaps the only ones who understand the full process of insight generation and machine learning pipeline creation.
Expecting data scientists to handle tasks outside their scope can create a lot of stress and frustration, and can lead to a higher churn rate.
To become less dependent and more efficient, it will be important to take these data science projects seriously, by having a proper structure in place, backed by management support, and by the overall business’s interest. If these ingredients are present, you will most likely have quick project adoption.
Moreover, the data science practice needs to have business analysts, developers, and data developers, integrated with business units and IT functions to make the most out of each project.