Last week I had the opportunity to attend the Open Data Science Conference (ODSC) in Boston. It was awesome to see people just walking around who I had previously read about or follow on Twitter. It was even nicer to meet some of these people, and I was amazed at how friendly everyone was.
Of course you can’t attend everything at a conference like this. At one point there were 11 different sessions going on at once. It was really difficult to decide which sessions to attend given the number of great options, but I tried to pick the ones closest to what I’d be able to bring back to my day job and implement.
In this article I’ll cover some learnings and favorite moments from:
- one of the trainings
- a couple different workshops
- the sweet conference swag
- one of the keynotes
My original plan was to take an R training on Tuesday morning and a Python training that afternoon. What really happened was that the morning R training left me feeling super jazzed about R, so I ended up going to another R training that afternoon instead of the Python training I had originally planned on.
The morning R training I took was “Getting to grips with the tidyverse (R)”, given by Dr. Colin Gillespie. This was perfect, because I had been struggling with dplyr (an R package) the night before, and this training went through parts of dplyr with great explanations along the way. Colin also showed us how to create plots using the package “Plotly”. This was my first time creating an interactive graph in R. Easy to use, and super cool. He was also nice enough to take a look at the code I was currently working on, which I definitely appreciated.
The afternoon R training I attended was “Intermediate RMarkdown in Shiny”, given by Jared Lander. It was my first introduction to Shiny. I had heard about it but had never ventured to use it, and now I don’t know what I was waiting for. If you ever have the opportunity to hear Jared speak, take it. I found him incredibly entertaining, and he explained the material clearly, making it super accessible. I like to think Jared also enjoyed my overly animated crowd participation.
On Thursday I attended “Uplift Modeling and Uplift Prescriptive Analytics: Introduction and Advanced Topics” by Victor Lo, PhD. This information really resonated with me. Dr. Lo spoke about the common scenario in data science where you’ll build a model to try to predict something like customer attrition. You’d maybe take the bottom three deciles (the people with the highest probability of cancelling their subscription) and do an A/B test with some treatment to try to encourage those customers to stay.
In the end, during analysis, you’d find that you did not have a statistically significant lift in test over control with the usual methods. You end up in a situation where the marketers are saying “hey, this model doesn’t work” and the data scientist is saying “what? It’s a highly predictive model”. It’s just that this is not the right way to go about determining uplift. Dr. Lo spoke about three different methods and showed their results:
- Two Model Approach
- Treatment Dummy Approach
- Four Quadrant Method
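To make the first of these concrete for myself, here is a minimal sketch of the general idea behind a two model approach: fit one model on the treated group, one on the control group, and subtract the two predictions to estimate uplift. This is my own illustration, not Dr. Lo’s implementation. The “model” is just the observed retention rate per segment, and the segment names, the data, and the function names `fit_rate_model` and `two_model_uplift` are all made up for the example.

```python
from collections import defaultdict


def fit_rate_model(rows):
    """Toy 'model': the observed retention rate for each segment.

    rows is a list of (segment, retained) pairs, with retained as 0 or 1.
    A real project would fit something like a logistic regression instead.
    """
    counts = defaultdict(lambda: [0, 0])  # segment -> [retained, total]
    for segment, retained in rows:
        counts[segment][0] += retained
        counts[segment][1] += 1
    return {seg: kept / total for seg, (kept, total) in counts.items()}


def two_model_uplift(treated_rows, control_rows):
    """Estimate uplift as treated prediction minus control prediction."""
    treated_model = fit_rate_model(treated_rows)
    control_model = fit_rate_model(control_rows)
    # Only segments seen in both groups can be compared.
    shared = treated_model.keys() & control_model.keys()
    return {seg: treated_model[seg] - control_model[seg] for seg in shared}


# Hypothetical A/B test results: (segment, 1 if the customer stayed else 0).
treated = ([("high_risk", 1)] * 2 + [("high_risk", 0)] * 8
           + [("medium_risk", 1)] * 8 + [("medium_risk", 0)] * 2)
control = ([("high_risk", 1)] * 2 + [("high_risk", 0)] * 8
           + [("medium_risk", 1)] * 5 + [("medium_risk", 0)] * 5)

uplift = two_model_uplift(treated, control)

# Rank segments by estimated uplift, largest first.
for segment, lift in sorted(uplift.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{segment}: {lift:+.2f}")
```

In this made-up data the high-risk segment retains at the same rate with or without the treatment, while the medium-risk segment responds to it. That is the kind of gap a model that only predicts attrition can’t show you.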
His ODSC slides from 2015 also covered these three methods (with similar slides).
I’ve experienced this scenario before myself, where the marketing team will ask for a model and want to approach testing this way. I’m super excited to use these methods to determine uplift in the near future.
Another workshop I attended was “R Packages as Collaboration Tools” by Stephanie Kirmer (slides). Stephanie spoke about creating R packages as a way to automate repeated tasks. She also showed us how incredibly easy it is to take your code and make it an R package for internal use. This is another case that applies directly to my current work. I don’t have reports or anything else due on a regular cadence, but we could certainly automate part of the test analysis process, and there are ongoing requests made of Analytics in our organization that could be automated. Test analysis is done in a different department, but automating it would save time on analysis, reduce the potential for human error, and free up bandwidth for more high-value work.
Although conference swag probably doesn’t really need a place in this article, Figure Eight gave out a really cool little vacuum that said “CLEAN YOUR DATA”. I thought I’d share a picture with you. Also, my daughter loved the DataRobot stickers and little wooden robots they gave out. She fashioned the sticker around her wrist and wore it as a bracelet. 3-year-olds love conference swag:
The vacuum that was given out by Figure Eight. I assume you’re supposed to clean your keyboard with it?
Susie, her blue wooden robot, and her bracelet/sticker
The keynote was Thursday morning. I LOVED the talk given by Cathy O’Neil (a link to her TED talk is here). She spoke about the importance of ethics in data science, and about how algorithms have to use historical data and are therefore going to perpetuate our current social biases. I love a woman who is direct, cares about ethics, and has some hustle. Go get ’em, girl. I made sure to get a chance to tell her how awesome her keynote was afterwards. And of course I went home and bought her book “Weapons of Math Destruction”. I fully support awesome.
I had an incredible time at the ODSC conference. Everyone was so friendly, my questions were met with patience, and it was clear that many attendees and speakers had a true desire to help others learn. I could feel the sense of community. If you ever get the opportunity to attend, go! I am returning to work with a ton of new information that I can begin using immediately at my current job. It was a valuable experience. I hope to see you there next year.