Data Science Teams on the Rise, Give Unicorns a Break

We’ve all heard of the highly sought-after breed of data scientist described as a unicorn. Many companies continue to search for these mythical creatures. When building up your company’s data science capabilities, should you hold out for a unicorn or create groups of specialists who can work together as a team to extract insights from valuable enterprise data assets? In this article, we’ll discuss how and why creating data science teams makes the most sense.

It appears that many HR departments, hiring managers, and recruiters are still using brute-force methods to find data science talent. Just take a look at some recent job ads for the position “data scientist” and you’ll find what looks like an amalgamation of leading industry keywords. It’s almost as if the recruiter Googled “data science” and copied and pasted every keyword they could find. This method is a disservice to both the hiring company and the candidate because individuals with such a breadth of knowledge and experience, aka unicorns, are exceedingly rare. I’ve seen some companies keep the same position open for over a year in hopes of finding that one special person.

It is enticing to believe that you can find a single person who has a background in math, statistics, and computer science; the ability to write production-level code; and the ability to talk to business people in their own language. It is far more realistic to build a team of people with complementary skill sets, and in the long run it is also the more straightforward approach.

Here is a list of components and associated skills for a proposed data science team consisting of both pure data scientists and data engineers:

  • Analytics – statistical analysis in R, Python
  • Coding – R, Python, Java, C/C++
  • Database, data warehouse, data lake management – SQL (enterprise-class databases), NoSQL (MongoDB, Couchbase, etc.), data transformation (wrangling) experience, backup and recovery
  • Machine Learning – algorithms and models. Supervised learning algorithms such as regression, classification, ensemble methods. Unsupervised learning algorithms such as clustering, PCA.
  • Frameworks and libraries – scikit-learn, TensorFlow, Caffe, CNTK, Theano, Spark MLlib
  • Big data processing – Hadoop/MapReduce, Spark, cluster management, network architecture
  • Data storyteller – communication and presentation skills, data visualization skills and ability to translate machine learning results into a compelling story for enterprise decision makers.
  • Domain knowledge – understanding company mission statement and goals, understanding industry fundamentals, ability to solve business problems, finding new ways to take advantage of enterprise data sets and produce valuable insights.

It should be obvious that the above skills are beyond the ability of a single person. A data science team, on the other hand, would be very well balanced if all the above components were addressed, yet companies continue to seek a unicorn. Here is an actual job ad for the position of “data scientist” appearing in October 2018 (and there are many others that are very similar).

[Image: the “data scientist” job ad]

Apparently being “more than a ninja rockstar” is now a more politically correct term than unicorn. Nevertheless, reading the ad makes my head spin.

Whatever the offered salary for the above position, it is likely far less than the combined salaries you’ll need to pay when building a good data science team. I think in many cases, companies trying to hire unicorns are just being cheap, thinking that if they can hire one person instead of a team, they’ll be ahead of the game. Wrong! Maybe the bit of humor below more adequately represents reality.

[Image: humorous illustration]

Daniel Gutierrez, ODSC

Daniel D. Gutierrez is a practicing data scientist who has been working with data since long before the field came into vogue. As a technology journalist, he enjoys keeping his finger on the pulse of this fast-paced industry. Daniel is also an educator, having taught data science, machine learning and R classes at the university level. He has authored four computer industry books on database and data science technology, including his most recent title, “Machine Learning and Data Science: An Introduction to Statistical Learning Methods with R.” Daniel holds a BS in Mathematics and Computer Science from UCLA.