The following Q&A is part of a series of interviews conducted with speakers at the 2017 ODSC East conference in Boston. The interview has been condensed and edited for clarity. This interview is with Barton Poulson, Founder at Datalab, whose talk was entitled “Data Science for the 99%”.
What does Data Science for the 99% mean?
Most of the time when people talk about Data Science, they’re talking about enterprise-level Data Science at companies like Amazon or Uber. But these companies, which are very large in terms of organizational headcount, make up only about one percent of all the employers in the country. They do account for the majority of full-time data scientists. But my argument is that most of the companies in the country are small to medium-sized businesses, and a huge percentage employ ten or fewer people. Those are the organizations that I tend to work with. I organize data hackathons where I get data professionals to help small non-profits who don’t have any analytical staff at all, let alone data scientists. My argument is that what we’re able to do is still useful for them.
On the other hand, we’re not firing up AWS and we’re not using neural networks. It means doing bar and line charts in Excel because that’s what the organizations need. But you can still use a lot of the data science skillset even with these examples because it involves cleaning and bringing in data. The analysis just tends to be much more fundamental to meet these organizations’ needs. So my point is that’s where an awful lot of the needs are, even for data scientists.
What has the reception been like?
I’ve had people come expecting higher-level data science and I’ve had to say “Cool your jets, we’re dealing with local needs”. And it’s been a challenge for some people because the emphasis is no longer on technical skills but on the implementation and interpretation of data, which doesn’t always receive as much emphasis.
The term 99% implies that your mission is a political one. Do you agree?
No, but I think that democratization is the right word.
You’re based in Utah. Can you tell us what the Data Science community is like over there?
Utah, of course, is a small area, but it has a thriving tech scene. In terms of trained data scientists and technologists per capita, Utah ranks among the highest in the country. While we’re dealing with a relatively small number of people overall, there’s a wonderful talent pool to pull from.
Is it a challenge to keep and attract talent?
For the state of Utah in general, it’s a major challenge, but when people actually come, it’s an easier sell.
Does the Utah tech scene have a special name a la “Silicon Valley” or “Silicon Beach”?
It’s Silicon Slopes.
Moving on, can you tell me more about Datalab and some of the projects you’re working on?
Datalab is my own project. I’ve recently defined Datalab as my full-time job, though the fact is that I’m employed full-time as a psychology professor at Utah Valley University.
I’ve been working for several years as a contract author for Lynda.com, which is now LinkedIn Learning. I’ve prepared video courses on data-related topics for them, including conceptual courses that explain what Big Data is or what Data Science is.
Datalab is my attempt to do what I do for LinkedIn Learning and do it for my own company. Fortunately LinkedIn is very accommodating. They know about this project and support it even though in certain ways it makes me a direct competitor of theirs.
It’s designed to help small organizations develop some of the skills they need in order to work with data on their own. Right now, Datalab mostly has material along the lines of “this is what Data Science is” and the elements of it.
I’m specifically focusing on performing arts organizations and helping them to make data-driven decisions in their staffing, marketing, and programming. The groups I work with are very small: they have five or fewer employees and budgets of less than $1 million. I’m helping these organizations, which are strapped for cash, use their resources efficiently.
I’d like to hear more about one of these organizations and some of the projects they’re working on.
One organization is Spyhop. Located in Utah, they do media classes for youth (ages 10-20). They do film, video production, music production, and video game programming as well. They’re the biggest organization that I work with, employing 12-15 people. They use Salesforce and have the data, but they haven’t been able to do any analysis.
What we did for them was take two data sources. They gave us data on all the kids who’ve been enrolled in their programs in the last ten years, including what classes they took, and data on the donors to the organization. What they didn’t have was a connection from their donor database to their student database.
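As an illustration of the kind of linking described above, here is a minimal sketch in Python. The interview does not say how the two databases were actually connected, so this assumes, purely for the example, that donors and students can be matched on a shared contact email. The field names (`donor_id`, `email`, `student_id`, `contact_email`) and the sample records are hypothetical.

```python
def normalize_email(email):
    """Trim whitespace and lowercase so trivial differences still match."""
    return email.strip().lower()


def link_records(donors, students):
    """Return (donor_id, student_id) pairs that share a contact email.

    Assumption: both tables carry an email field. Real records may need
    fuzzier matching (names, addresses) and manual review.
    """
    # Index students by normalized email; one email may cover several students.
    students_by_email = {}
    for student in students:
        key = normalize_email(student["contact_email"])
        students_by_email.setdefault(key, []).append(student)

    links = []
    for donor in donors:
        key = normalize_email(donor["email"])
        for student in students_by_email.get(key, []):
            links.append((donor["donor_id"], student["student_id"]))
    return links


# Hypothetical sample data, not taken from the organization.
donors = [
    {"donor_id": "D1", "email": "Pat@Example.org"},
    {"donor_id": "D2", "email": "lee@example.org"},
]
students = [
    {"student_id": "S1", "contact_email": "pat@example.org "},
    {"student_id": "S2", "contact_email": "sam@example.org"},
]

links = link_records(donors, students)
print(links)
```

Once records are linked this way, questions that span both tables (for example, which donors are connected to enrolled students) become simple counts, which fits the bar-and-line-chart level of analysis discussed earlier.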
As an educator, what are the challenges of teaching a subject that is ever-changing and evolving?
People think of data science as just the technical skills. The computer part of data science receives about a 98% emphasis in most of the training. But it really is just one aspect. My training is a little different. I approach it from the statistics side because of my years of teaching statistics.
I spend a lot of time saying the computer stuff is important, but you really can’t develop that to the neglect of domain expertise. That means asking: can you do this in the real world with real data, especially with a small organization?
The technical skills are important, but we’ve got to bring you back to earth to accomplish something specific for specific people, and you need to focus on that.
What do students need to know when it comes to domain expertise?
Knowing the organization’s goal, what they are trying to do, and knowing their resources and constraints. It’s really easy to go in there and say “I’m gonna do this machine learning algorithm”, and that’s really not what they’re looking for. They don’t know what it means, they’ll never be able to implement it themselves, and it may not be answering the question.
One organization that we worked for was the Springfield Museum of Art. Their question was really simple: should we continue to be open on Sundays? We said no and that’s all they needed. Part of the problem is that when people do analysis, they get fired up about giving all the technical details. People want to give a 50-slide presentation, when all the organization needs is one slide that says yes or no.
How do you view the state of the discourse surrounding data science, AI, deep learning, etc.?
Well, anytime there’s a new development in anything, there’s going to be a lot of hype, and deep learning is definitely very high on the hype cycle. It’s a useful tool, but it’s not going to solve everything and it doesn’t apply to everything. You have to have large datasets to make it work. The organizations I work with have at most a few thousand rows of data.
These are still just tools, and the fact is that the vast majority of decisions are still made by humans. This is where I have an ongoing debate about the use of black-box models, and deep learning is absolutely a black-box model. It doesn’t help you make decisions. It will make the decision for you if you feed it enough data and train it right, but there’s a limited number of applications for a machine making a decision on your behalf in a business setting.
The organizations I work with want rules of thumb. They want to know, in general, what they should be doing and looking at. I just don’t think that deep learning and AI answer that question for them. That’s where things like simple linear regression or a single decision tree come into it. The technical skills are very important and the developments are exciting, but I still think they apply to edge cases.