Why Your Machine Learning Project Could Fail

When I started my career in data science, I believed that the most difficult part of a project was doing the actual work. I was wrong. Most advanced analytics and machine learning projects fail or never reach production. The reasons for failure range from organizational to operational, and we’ll cover both in this article.

While the ability to perform hands-on work is fundamental to the success of a project, an often overlooked aspect is client communication and understanding expectations. As technical people, we tend to focus on the actual work instead of cultivating human-to-human conversation: we delegate (when possible) the emails, meetings, and conversations with clients to colleagues in dedicated roles, such as account and project managers. This formula is used because it works in most fields, but I dare say it doesn’t work that well in data science and ML, for several reasons.

I will write about why communication and setting expectations are so important, and how they can set the stage for failure right from the get-go if poorly handled. We’ll touch upon project management topics and go deeper into some specific points that are crucial for the outcome of a data science or machine learning project.

1. How to Frame Problems

As I mentioned, most problems fall into two categories: organizational and operational problems.

Organizational problems have to do with how people manage their projects, teams, communication, workflow, and more. They concern people, not tech directly.

Operational problems have to do with how the work itself gets done. It’s the badly written code, it’s the missing comments, it’s the inability to understand a model’s performance metrics.

Some problems do not fit neatly into one category; they can fall into both.

2. A Data-Savvy Person Should Manage the Project

Understanding what kind of solution is best for a specific problem is key to delivering work that pleases the client. Whoever talks to the client must be able to provide clear, understandable information about the solution that addresses the client’s needs. This person must be able to receive the client’s brief, ask the correct questions, and talk to the data science team to understand and frame the problem.

More often than not, a person who only has a high-level view of the field cannot ask the right questions or frame the problem correctly. This can lead to problems down the road, like delivering an inadequate model or missing deadlines, that affect the final delivery or the client’s trust in our firm. For this reason, communication and project management should be handled by someone who has advanced data science knowledge.

3. Get to Know Your Client

A data-savvy person managing the project should always invest time in getting to know the client on a deeper level. Because we get paid, we often behave as if the client is always right and understands the issue they’re trying to solve. This is very, very wrong.

The client comes to us because they have no idea how to fix their problem. It is up to us to understand what they’re trying to say, what they’re experiencing, and how to solve it. Often the client won’t communicate it in the clearest of ways, because they’re not in the technical or data field.

Whoever is in charge of communication is a parser and a translator of intentions and expectations.

4. Lack of Experienced Talent

Data science and machine learning are relatively new disciplines. While many people are building knowledge and new tech, they are doing so by testing and learning with their own methods. There’s still no industry standard, and there won’t be one for a long time.

Many practitioners today have only superficial knowledge of fundamental topics like linear algebra, calculus, and algorithms. Only a small proportion of the people who study advanced analytics have hands-on experience in the workplace. This forces firms to “make do” with workarounds for problems that could be solved more efficiently.

This doesn’t just apply to data scientists. The entire project’s life cycle depends on multiple teams interacting with each other, for instance the data science team with the software engineering team. Most software engineers have little knowledge of how an ML project is structured, and this can negatively impact the success of your project.

5. Bad Communication Among Peers and Lack of Collaboration

Often, the problem lies in the analysts’ inability to communicate results to upper management. This is frequently the case when firms do not specialize in data analytics and offer it only as an add-on to their other services.

Ideally, the average analyst should have a degree of proficiency in storytelling and data visualization, while the stakeholders should have some degree of technical knowledge.

Another problem that could impact the delivery is team leadership. Projects should be led by people with experience in that field and possibly in that niche. For instance, if a senior data scientist has experience in the food industry, it makes sense to have them lead a project in the same niche.

Lack of good leadership can be devastating, and it hurts the team and its members on multiple levels:

  • unclear goals lead to unclear expectations
  • inconclusive experimentation leads to wasted resources
  • the perception of the team, both from within and from the outside, is negative
  • doubt pervades the members’ minds

Make it a priority to have a capable person leading your team. You won’t regret it.

6. Missing Data Infrastructure

We data scientists have a huge problem: we can’t work without data. It’s even worse when we are given bad data by our clients. As you can imagine, bad data infrastructure is a common culprit behind a data scientist’s inability to deliver a usable model.

If you own a data science consultancy firm, make sure you can work with your client’s data before accepting the job.
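
As a rough illustration of what such a check could look like, here is a minimal Python sketch that audits a small sample of client data before any commitment is made. Everything in it is an assumption made for the example: the function name `audit_rows`, the column names, and the 20% threshold for empty values are hypothetical and should be adapted to the actual project.

```python
import csv
import io


def audit_rows(rows, required_columns, max_missing_ratio=0.2):
    """Return a list of human-readable problems found in a data sample.

    `rows` is an iterable of dicts (for example, from csv.DictReader).
    The 20% default threshold is an arbitrary value chosen for illustration.
    """
    rows = list(rows)
    if not rows:
        return ["sample is empty"]

    problems = []
    for column in required_columns:
        # A column the project needs may not exist at all.
        if column not in rows[0]:
            problems.append(f"missing column: {column}")
            continue
        # Count values that are absent or blank.
        missing = sum(1 for row in rows if not (row[column] or "").strip())
        ratio = missing / len(rows)
        if ratio > max_missing_ratio:
            problems.append(f"{column}: {ratio:.0%} of values are empty")
    return problems


# Hypothetical sample export from a client, inlined to keep the example self-contained.
sample = io.StringIO(
    "customer_id,signup_date,revenue\n"
    "1,2021-01-04,120.5\n"
    "2,,\n"
    "3,2021-02-11,\n"
    "4,2021-03-02,\n"
)

report = audit_rows(csv.DictReader(sample), ["customer_id", "revenue", "churned"])
for line in report:
    print(line)
```

On this sample the report flags that most `revenue` values are empty and that the `churned` column the model would need does not exist at all. A check this simple won’t replace a proper data assessment, but it can surface deal-breaking gaps early.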

7. Technically Unfeasible Projects

There are simply some projects you cannot complete. If your sales representative sells services you cannot cover, there’s a big problem in your firm, and if you fall into this trap, it’s on you completely. When unfeasible work gets sold, it means there is a lack of knowledge and alignment among teams, as well as an inability to cover specific services. That is why you need data-savvy people on your team, starting from sales. “Machine Learning” and “Data Science” are buzzwords at the moment, so don’t let the client decide what solves their own problem. If they knew, they wouldn’t have the problem in the first place.

Unrealistic KPIs, arbitrary Gantt charts, and unrealistic promises break client relationships and waste tons of resources, all at your firm’s expense.

Conclusion on Analytics and Machine Learning Project Failures

To recap, here’s an (incomplete) list of threats that could disrupt your machine learning projects and make you waste tons of resources:

  • whoever is managing the project is not knowledgeable about data science and cannot communicate effectively with the client
  • the client is taken too seriously (or not seriously enough): always remember that you are the expert and that the client doesn’t know how to solve their problem. If they did, they’d be done by now
  • lack of experienced talent in the team
  • bad communication and inability to collaborate efficiently among and within teams
  • the client has no data or their infrastructure is poorly managed
  • accepting technically unfeasible projects

I’ve personally experienced all of these, sometimes more than one at once. Luckily, I was able to learn from each experience and improve, little by little, my understanding of how business should be handled in this fast-paced field. I hope you all do the same.

Best of luck.

Andrea D'Agostino

Data scientist. 5+ years of experience in digital marketing. Signal detection, data mining, conversion optimization.