In this article, I draw on my years of experience as a tech journalist and practicing data scientist, along with numerous conversations I’ve had with companies about their data science projects, to identify what I’ve seen as the top reasons why many projects fail. The short list below covers the factors most likely to lead a project down a rabbit hole. When you start a new project, keep this list handy so you can avoid the mistakes others have made.
- Asking the wrong questions – It’s a good idea to start a data science project with an established goal that leads to concrete business value. Further, you should begin with a specific set of clearly defined questions that point to which data should be analyzed. This targeted approach streamlines the data science process by tying each analysis to a business decision that can validate it and act on it. It also directs company resources to the data most likely to produce reliable and important findings. Starting a data science project with the right questions sets the stage for success by increasing accuracy and efficiency, which results in focused insight.
- Lack of firm support by key stakeholders – Data science projects often affect many departments across the enterprise. Without the support and commitment of key stakeholders to implement changes, projects can be hindered or fail outright. The most direct way to ensure business alignment across the organization is to create a well-defined data strategy and a roadmap that keeps the project on track. Stakeholders should be committed to the project’s goals and follow through on implementation in their departments when the time comes. When stakeholders understand the value of a technology initiative backed by a specific business use case, the likelihood of project failure caused by stakeholder apathy greatly diminishes.
- Data problems – Poor data quality and accuracy can be a major obstacle to the success of a data science project. Many data scientists feel that the data provided to them is of inadequate quality, and that the available data is incomplete, inconsistent, or both. Data scientists often need to turn to third-party service providers to help fill gaps in the data they are given. Although data quality is frequently identified as a critical issue, few enterprises address it proactively, ahead of the project. Another data-related problem is that companies often manage data at the local department or location level. This results in so-called “data silos,” where data is redundantly stored, managed, and processed. Data problems often stall a data science project: poor-quality data that is not identified and corrected up front can have a significant negative impact on the results.
- Lack of the right data science “team” – The most successful data science projects employ team members with a range of skill sets, including pure data scientists for exploratory data analysis (EDA), data modeling, evaluating model performance, and storytelling, along with data engineers for data acquisition, ETL, and production deployment. The team should also include subject matter experts from the departments relevant to the initiative’s questions and focus. Together, a well-honed team brings different perspectives and experience to shape the project’s objectives and direction as it moves forward. It’s also beneficial to have a team member who understands internal business operations to ensure the project remains aligned with the original business goals. More eyes on the project increase the chances that mistakes will be caught, while leveraging the collective knowledge and talents of the entire team. Failing to assemble the proper data science team increases the likelihood of a failed project. Similarly, relying on a single “unicorn” data scientist puts the project in peril if that person becomes unavailable.
- Overly complex models – Data scientists often tend to build complex models when a simple one can be just as good, or at times even better. There’s frequently an inclination to overcomplicate the problem statement and create equally complex solutions. This practice pulls focus away from the big picture and diverts effort from the right solution.
- Over-promising – A successful data science project is one that offers a substantial financial or technological ROI. Such investments often come with oversized expectations, especially if C-level executives are promised that the technology solutions you developed and the team you assembled will win your company a larger market share. Failure to meet such promises will not be well received and can jeopardize the whole project. In practice, broader performance improvements from large-scale technology investments often don’t appear right away; they require sufficient time and constant refinement.
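The data problems item above argues for identifying poor-quality data before analysis begins. As a minimal sketch of what such an up-front check might look like, assuming records arrive as a list of Python dicts, the hypothetical helpers `profile_records` and `inconsistent_spellings` below count missing values per required field, exact duplicate rows, and values that differ only in letter case or surrounding whitespace. The field names and sample records are invented for illustration; a real project would more likely rely on a dataframe library or a dedicated validation tool.

```python
from collections import Counter


def profile_records(records, required_fields):
    """Hypothetical helper: basic data-quality counts for a list of dict records."""
    missing = Counter()
    for row in records:
        for field in required_fields:
            value = row.get(field)
            # Treat absent keys, None, and blank strings as missing values.
            if value is None or (isinstance(value, str) and not value.strip()):
                missing[field] += 1
    # Exact duplicates: rows whose (key, value) pairs are identical.
    row_counts = Counter(tuple(sorted(row.items())) for row in records)
    duplicates = sum(count - 1 for count in row_counts.values() if count > 1)
    return {
        "rows": len(records),
        "missing": {field: missing[field] for field in required_fields},
        "duplicates": duplicates,
    }


def inconsistent_spellings(records, field):
    """Hypothetical helper: group values that differ only by case or whitespace."""
    variants = {}
    for row in records:
        value = row.get(field)
        if isinstance(value, str) and value.strip():
            variants.setdefault(value.strip().lower(), set()).add(value)
    # Keep only normalized values that appear under more than one raw spelling.
    return {norm: sorted(raw) for norm, raw in variants.items() if len(raw) > 1}


# Invented sample data: one missing revenue, two blank regions,
# one exact duplicate row, and two spellings of the same region.
records = [
    {"id": 1, "region": "East", "revenue": 100},
    {"id": 2, "region": "east ", "revenue": None},
    {"id": 3, "region": "", "revenue": 250},
    {"id": 3, "region": "", "revenue": 250},
]

print(profile_records(records, ["region", "revenue"]))
print(inconsistent_spellings(records, "region"))
```

Running checks like these at the start of a project, rather than after modeling, is one way to surface gaps that may need to be filled from other sources before they undermine the results.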