There’s no single right way to build a career in data science; countless paths lead to countless roles. We recently had the chance to speak with Emily Robinson, senior data scientist, consultant, international speaker, and co-author of “Build a Career in Data Science,” which covers all the non-technical skills and knowledge you need to get into and succeed in data science. During this Lightning Interview, Sheamus McGovern spoke with Emily about the knowledge and skills you need to build your data science career.
You can listen to the full Lightning Interview here, and read the transcript for the first few questions with Emily Robinson below.
What are some of your preferred methods for learning new data science skills?
It really depends on your background. The bootcamp was a good fit for me because I had a statistics background, and I had programmed in R in undergrad and did some in grad school. But I hadn’t really used Python much. I was interested in learning that, but I hadn’t used GitHub, I didn’t have a portfolio of any of the projects I had done, and I hadn’t really done any machine learning either.
A bootcamp is only 12 weeks, and they do have an admissions process, which I think is good because I don’t think you’d succeed if you go in having no background in math, no background in programming, or both. I think it works best for people who have some other skills but want to spend some time making a portfolio and filling in some of the gaps they have right now, which is important in a data science or tech career. You’ll always be learning more, though, like learning on the job. A bootcamp costs a lot less than a master’s, and while it may be hard to live for long without a salary, a bootcamp is short enough, and you can hopefully get a stipend as well.
One of my favorite ways is learning on the job, if you can. What I mean by that is, let’s say you’re an engineer or you’re in marketing at a company: are there ways that you can use data more in your job than you have before? I know some engineers who find themselves using data more by looking at outages and page load time, graphing that, checking whether anything is predictable, and learning more about what they’re working with.
I think that can be a good way because then you have the benefit of your domain knowledge as well. Let’s say you’re working in marketing and you’re learning more about how to use data for your marketing campaigns, but eventually, you want to become a data scientist. One of the best ways you can start is as a marketing data scientist, because you already know a lot of what you need to know, like what campaigns mean, what things are important to the marketing director, and what the different measurement tools are. You can build up the skills to interpret the data, or maybe gather data you couldn’t before, but you’re starting from a solid foundation, so that’s often my recommended way to do it.
Why is a portfolio so important?
I certainly don’t think a portfolio is required. I know plenty of successful data scientists who do not have any portfolio. But it can certainly be very helpful, and it could be projects on GitHub, it could be blog posts, it could potentially be talks, and so on. I think it’s just more evidence that you can do the job, and that’s something that can be very hard to show if you haven’t done the job before.
I’ll give an example. One thing I just did, kind of for fun, was based on this series I really like on Refinery29 called Money Diaries, which is anonymous and where people submit their weekly spending. So for one week they track their spending, and they also submit their age, their salary, their housing costs, and their monthly costs, along with their little diary. It’s a peek into people’s lives, but there’s no search function. So if you’re saying “I want to look at people who live in New York City” or “I want to look at people who make less than $50,000,” there’s no way to search for that on Refinery29.
For that, what I did was some web scraping in R. I took all of the Money Diaries, I got their titles, and I used some regular expressions to get information like the age of the people, what they pay for their monthly housing costs, etc. And then I made a little web app for that, which has one row per diary and lets you search and sort, then click the link to get the full diary.
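As a rough illustration of the parsing step Emily describes, here is a minimal sketch in Python (not the R she used). The sample title and intro text, the field labels, and the function name `parse_diary` are all hypothetical and made up for this example; the real pages may be worded differently, so the patterns would need adjusting. It skips the scraping itself and only shows how regular expressions can turn free text into one searchable row per diary.

```python
import re

# Hypothetical sample text; real diary pages may use different wording.
SAMPLE_TITLE = "A Week In New York, NY, On A $48,000 Salary"
SAMPLE_INTRO = "Age: 27\nMonthly Housing Costs: $1,350\n"

# Assumed patterns for the made-up layout above.
TITLE_RE = re.compile(
    r"A Week In (?P<location>.+?),? On An? \$(?P<salary>[\d,]+)", re.IGNORECASE
)
AGE_RE = re.compile(r"Age:\s*(?P<age>\d+)")
HOUSING_RE = re.compile(r"Housing Costs?:\s*\$(?P<housing>[\d,]+)", re.IGNORECASE)


def to_int(text):
    """Convert a string like '48,000' to the integer 48000."""
    return int(text.replace(",", ""))


def parse_diary(title, intro):
    """Return one row per diary; fields that cannot be found stay None."""
    row = {"location": None, "salary": None, "age": None, "housing": None}

    title_match = TITLE_RE.search(title)
    if title_match:
        row["location"] = title_match.group("location")
        row["salary"] = to_int(title_match.group("salary"))

    age_match = AGE_RE.search(intro)
    if age_match:
        row["age"] = int(age_match.group("age"))

    housing_match = HOUSING_RE.search(intro)
    if housing_match:
        row["housing"] = to_int(housing_match.group("housing"))

    return row


rows = [parse_diary(SAMPLE_TITLE, SAMPLE_INTRO)]

# The kind of search the site itself does not offer: salary under $50,000.
under_50k = [r for r in rows if r["salary"] is not None and r["salary"] < 50_000]
print(under_50k)
```

Leaving unmatched fields as `None` rather than raising an error is a deliberate choice here: scraped text is rarely uniform, and a table with a few blanks is still searchable and sortable.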
That’s the kind of thing I recommend because what it shows a company is that I can gather the data myself, I can parse the data, and I can see some of the headaches like “why is the data structured like this?” So I’ll write up everything I found, and it doesn’t have to be about Money Diaries, it could be about some other article, and show the most useful way to display the information. At the end of the day, it’s all about making something that’s useful to other people and not just to me. You can publish it for free thanks to RPubs, and anyone can go on and use the app.
It also shows the company that I shared my code on GitHub, so they’re like “Oh hey, this person can do these things that I want and they’re not just going on Kaggle and predicting who dies on the Titanic with the clean dataset. They’re coming up with an idea themselves and seeing it through to the end.” It can be a very useful thing to show somebody both the specific skills involved, whether that’s web scraping or machine learning or whatever, and also that you can do this end-to-end project.
You are going to need more than technical knowledge to succeed as a data scientist. “Build a Career in Data Science,” the book co-authored by Emily Robinson, teaches you what school leaves out: from how to land your first job, to the lifecycle of a data science project, and even how to become a manager. Emily regularly gives talks across the country on A/B testing, programming in R, and data science careers.