This blog post is an excerpt from the O’Reilly report: Leading Data Science Teams.
The success of your data science team depends entirely on how well its members can fulfill their roles and provide value. Value is hard to define in data science because it can be so abstract and so disconnected from the amount of work put in. A summary graph that someone makes in five minutes one day might end up being more valuable to an organization than a complex machine learning model that a whole team spends months on. Because of that, let’s explicitly define “contributing value” as the following:
A data scientist is contributing value to the team if they are creating and finishing code, reports, or other deliverables that:
- Solve the requested task
- Are straightforward to maintain
- Are quick to develop
In doing so, the data scientists should:
- Communicate issues and concerns as they arise
- Promote a healthy working environment
But meeting that definition is surprisingly difficult!
It’s hard to solve a requested task
Usually, a data scientist is given an abstract business objective, such as “use data to find out why new customer growth is down” or “use machine learning to predict if a customer will use a coupon.” These are business objectives in the sense that completing them will help the business, but they don’t spell out clear data science steps for achieving them. In the first example, there may be many different ways to define “new customer growth”: you could count the number of people who create a new account or, alternatively, the number who make a first purchase. There are even more ways to analyze that data once it is defined. It’s not obvious to a data scientist which approach is best, and different types of analyses have different trade-offs for the business.
There are many reasons why a data scientist might not be able to fulfill a request as given, including:
- The stakeholders don’t understand their own needs, and so the request is ill-defined.
- The data is too noisy to be useful.
- The answer the data provides goes against the political tides of the organization, and sharing it with the organization could be disastrous.
As a leader, it is your job to help resolve these situations. You are the person who bridges the gap between what a request is loosely asking for and what a data scientist can actually achieve. You should make sure the data scientists understand what the stakeholders truly need, and you should have a clear sense of what the data science team is capable of handling.
It’s hard to make deliverables straightforward to maintain
When a data scientist creates an analysis, a machine learning model, a dashboard, a code package, or pretty much anything, they are creating a deliverable. As part of the work, there is an understanding that the deliverable will exist beyond the moment it’s created. Sometimes this is explicit: a dashboard is expected to be continuously viewed, and a machine learning model is expected to be rerun with similar accuracy. Sometimes this is implicit: an analysis may have been done as a one-off request for an executive, but the executive may periodically re-review the results or even ask for it to be updated with new data.
The deliverable itself has to be maintained (the dashboard has to stay up and running, the model has to remain consistently accurate), and the method of creating the deliverable has to be maintained too (the analysis code needs to be saved in case people have questions about how it was run). If data scientists don’t put effort into making their code and output easy to maintain, over time you’ll find your team’s work getting slower and slower. Unmaintainable code and deliverables are a form of technical debt, and you need to keep an eye on it.
It’s hard to make deliverables quickly
Compared to other fields such as software engineering, it’s hard to tell in data science when something is done. In most cases, there is no clear definition of “done”: the features in a model can always be adjusted further, an analysis can always slice the data in more ways, and there is always another machine learning framework to try. That said, for data science work to be useful, it has to be delivered to a stakeholder who can use it, and sooner is better.
This is made worse by the earlier point that the outcome the stakeholder wants may not be clear or achievable. If a data scientist is tasked with building a model to predict whether a customer will use a coupon, and every model they have tried so far hasn’t worked, how can they know whether one more attempt would work or whether no model would ever work on that data? The work of a data scientist is to keep trying approaches until one works, with no way of knowing whether any ever will. So a data scientist has to balance “let’s try more things” against “this is good enough, let’s deliver quickly” without knowing whether more attempts would do better, or even what “good enough” is.
In practice, a data science leader should constantly keep an eye on how work is progressing and whether things seem off track. You may have a situation where each morning you ask a data scientist how the work is going, and each morning they say, “I’m still selecting features for the model.” It’s hard to know at what point that goes from an acceptable amount of feature selection to a sign that the project won’t be completed in time. As a leader, you have to find a balance where the data scientists understand how much time is available and what you expect of them, without feeling like you are telling them how to do their jobs.
Data scientists should communicate issues and concerns
For a data science team to succeed, the data scientists need to speak up when problems arise. Issues crop up all the time on data science projects, from small ones like incompatible libraries to major ones like missing essential data. A data scientist needs to be able to flag these situations effectively and let the stakeholders and leadership know.
A stakeholder can be dangerous for a project if they feel they should be providing technical guidance, such as asking the data science team to use a specific type of model. The data scientists are almost always the experts on the technical solution, and an overly prescriptive stakeholder making technical requests limits the solutions the data scientists can offer. On the other hand, data scientists can be overly prescriptive with their opinions on how the business should be run. The data scientists should provide technical expertise and general recommendations for the business, but ultimately it’s the stakeholders’ responsibility to decide what to do with them.
Promote a healthy working environment
It’s not enough for a data scientist to be delivering consistently successful work: they also need to help create a culture that lets their teammates thrive and encourages their stakeholders to work with them. Without a healthy working environment and a team-centric culture, even a perfect machine learning model or an outstanding analysis won’t have the ecosystem around it to be useful. This means:
- Data science teammates who support each other. The data scientists should work together on projects, review each other’s code, and listen to feedback.
- Data scientists who can talk to you. Data scientists should be able to raise concerns and opportunities with their leaders and have those concerns heard.
Putting it all together
As you can see, many distinct factors go into having your data science team provide actual value. As a data science leader, keeping an eye on every component of delivering value and giving your team the support they need is likely the majority of your day-to-day job! You have the ability to create a culture that keeps stakeholders in the loop, promotes valuable deliverables with minimal tech debt, and lets the data scientists feel free to communicate. The best way to achieve this is by keeping a close eye on the team, noticing when data scientists are struggling or thriving, and stepping in where appropriate. If you find that your data science team currently ignores these principles and instead has a culture of “just do exactly what you’re told and stay quiet,” it’s still possible to turn it around by setting an example yourself.
With a data science team that has a strong culture and the ability to get work done, the next step to think about is how to manage that team’s relationship with its stakeholders. In the next chapter, we’ll discuss how to work with many different types of stakeholders as a data science leader.
If your team is looking for an easy solution for collaboration, scaling, and more, try Saturn Cloud for free. Your data science team has very specific ways of doing work, and you need a platform flexible enough to handle them. With Saturn Cloud, your team can collaborate and run your code at scale. Seamlessly integrate with your existing environment and jump into features like cloud-hosted JupyterLab and RStudio, Python and R support, and more.