Data Science in the 2010s: How Has the “Data Scientist” Job Evolved?
Posted by Daniel Gutierrez, ODSC, February 10, 2020
It was the early 2010s and I sensed something was afoot. I was seeing ads for something called Coursera, a new kind of online education company (offering what is called a MOOC, or massive open online course) that had a free “Machine Learning” class taught by Stanford professor Andrew Ng. The announcement aligned perfectly with my old grad school program, where a decade earlier I had studied computer science and applied statistics. I had a growing desire to work my way back into the statistical learning arena, so this class seemed like a perfect opportunity. I took the class and was immediately and thoroughly energized. Dr. Ng’s enthusiasm and clear lecturing style, coupled with his inclusion of the mathematical foundations of machine learning, made me want more.
What I had sensed going into the course was an enormous pent-up desire for machine learning education. I took the very first class offered by Coursera (I later found out I was one of tens of thousands of learners), and since then this one course has attracted over 2.7 million people around the world.
Another milestone happened in late 2012, with the now-famous Harvard Business Review article, “Data Scientist: The Sexiest Job of the 21st Century.” I was in heaven. I felt a sense of validation. Not only did I have a shiny new job title, but it was also the sexiest of the century! Previously, when someone at a dinner party would ask what I did, I would stumble through a response like “Oh, I work with computer science, applied statistics, probability theory …” and immediately see a glazed-over look. Throughout the 2010s, in contrast, I could tell people I was a “data scientist” and most had at least heard of the profession and had some sense of what it entailed.
Over the years I strengthened my capabilities and became a consultant in data science, wrote a book on data science and machine learning, became a tech journalist, and returned to UCLA to teach data science classes. As a “data gig worker,” I practice what I call “nomadic data science.” With my laptop loaded with various data science frameworks, and using WiFi to access the cloud, I work with data at places like Starbucks, libraries, cafés, most anywhere.
In this article, I’m going to reflect on the 2010s and give my perspective on a number of ways that the data science profession has evolved. If the previous decade is any indication, 2020 and beyond should be a wild ride!
One of the ways that the profession has evolved is in its acceptance. Today’s practitioners may be surprised to learn that years ago, data science was a harder sell than it is today. The early adopters of data science were limited to Adtech and Martech. In contrast, today we have many more industries climbing aboard, e.g. Fintech, Insurtech, Proptech (real estate), etc. I even worked for several years in “FashionTech,” where I applied data science principles to help the fashion industry (I never thought I’d learn so much about women’s jeans!). It didn’t happen overnight, but after 10 years of gradual change I now rarely have to “sell” data science to enterprise decision-makers. Project stakeholders know they’ll benefit from machine learning. It’s just a matter of approaching the process effectively and efficiently.
Understanding the mathematical foundations of data science and machine learning has never been more important. Without the math, trying to do hyperparameter tuning is just guesswork. Understanding how boosting works, for example, requires you to read sources like “Elements of Statistical Learning,” by Hastie, Tibshirani, and Friedman (see Chapter 10), but that requires a background in calculus, partial differential equations, and linear algebra. Likewise, understanding how Google AI Language’s new BERT algorithm, which has turned the NLP field upside down, actually works requires the mathematics of deep learning. Although always important, the need for math has grown steadily in the past 10 years.
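To make the point about boosting and hyperparameters a little more concrete, here is a minimal sketch of gradient boosting for squared-error loss, written in plain Python. It is an illustration under stated assumptions, not the algorithm as presented in any particular textbook or library: the function names (fit_stump, boost, mse), the toy data, and the choice of one-split "stumps" as weak learners are all my own for this example. The key mathematical idea is that for squared-error loss the negative gradient is simply the residual, so each round fits a weak learner to what the current model still gets wrong.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression stump to the residuals by least squares.

    Assumes xs contains at least two distinct values (hypothetical helper).
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate thresholds; both sides stay non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return t, left_mean, right_mean


def predict_stump(stump, x):
    """Return the stump's prediction for a single input x."""
    t, left_mean, right_mean = stump
    return left_mean if x <= t else right_mean


def boost(xs, ys, n_rounds, learning_rate):
    """Gradient boosting for squared-error loss with stumps as weak learners."""
    base = sum(ys) / len(ys)  # start from the constant (mean) model
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # For L = 0.5 * (y - F)^2, the negative gradient w.r.t. F is y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Take a shrunken step in the direction of the fitted weak learner.
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, preds


def mse(ys, preds):
    """Mean squared error between targets and predictions."""
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Toy data (an assumption for illustration): y = x^2 on a small grid.
xs = [i / 10 for i in range(20)]
ys = [x * x for x in xs]

for lr in (0.1, 0.5):
    _, _, preds = boost(xs, ys, n_rounds=20, learning_rate=lr)
    print("learning_rate =", lr, "training MSE =", mse(ys, preds))
```

Once you see the learning rate as a step size along the negative gradient, and the number of rounds as the number of steps taken, tuning those two hyperparameters together stops being guesswork: smaller steps generally need more rounds to reach the same training error.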
When I was writing my machine learning book, my publisher advised that if I included any mathematics it would diminish the audience by 50%. I agreed to leave out the math, but I wish I hadn’t. Now, I recommend various texts for my students to get up to speed with the math background. My friend Gilbert Strang, Professor of Mathematics at MIT, recently published an excellent learning resource, “Linear Algebra and Learning from Data.” This is an important area for how data science is evolving and all data scientists should accept that the math background will help immeasurably.
Throughout the 2010s, the cloud has gained importance to the work of data scientists. The ability to command compute and storage resources on-demand and only pay for what you use is very seductive. Today, I can’t imagine data science being done cost-effectively without the cloud. But many data scientists still haven’t taken the plunge. For those, I recommend Google Colaboratory for Python work, and RStudio Cloud for R work. Recently a student of mine needed to use R in the cloud since all she had access to was an iPad, so RStudio Cloud fit the bill nicely.
Placing a Governor on Your Super Powers
And finally, one of the biggest ways the role of data scientist has evolved in the past decade is the need to become an arbiter of “data science superpowers.” Although data science and machine learning can be used for some amazingly positive applications, there is a constant pull on data scientists toward the dark side, using their highly technical skills for some very nefarious purposes. Think intrusive facial recognition, deep fakes, many frightening military applications, and you see what I mean.
A few years ago, I remember sitting through a presentation at a local Meetup on machine learning in the entertainment field. A presenter from a large, public gaming company, whose title was Data Science Manager, proudly described his collaboration with psychologists to devise technology that would best “addict children” to their game product. I was astonished by his brazen declaration and started to rethink the applications to which I was and was not willing to apply my skills. I think all data scientists should read Cathy O’Neil’s “Weapons of Math Destruction” for some concrete examples of where the use of machine learning can have harmful outcomes. O’Neil is a prominent data scientist who speaks out about limiting data science superpowers.
There is also a trend among data scientists to delete their Facebook accounts in protest of the company’s misuse of user data and its application of powerful algorithms that invade privacy and mislead individuals.