IOTA – The Potential to Drive Data Science for IoT
I have a close circle of clued-on/tech savvy friends whose views I take seriously. For the last few weeks, one of these friends has been sending me emails extolling the merits of something called IOTA – which calls itself as the next generation Blockchain.  At first, I thought of IOTA... Read more
Are data warehouses a thing of the past?
With almost everything around us becoming a source of data, it’s proving to be quite a challenge for traditional data warehouses to support such fast changing and high on volume data. So is data warehouse a thing of the past already? A huge collection of data from various sources... Read more
How To Create Data Products That Are Magical Using Sequence-to-Sequence Models
A tutorial on how to summarize text and generate features from Github Issues using deep learning with Keras and TensorFlow. Teaser: Training a model to summarize Github Issues Predictions are in rectangular boxes. The above results are randomly selected elements of a holdout set. Keep reading below, there will be a link to many more... Read more
Word Vectors with Tidy Data Principles
Last week I saw Chris Moody’s post on the Stitch Fix blog about calculating word vectors from a corpus of text using word counts and matrix factorization, and I was so excited! This blog post illustrates how to implement that approach to find word vector representations in R using tidy data... Read more
This is the first post of a series of three articles in which we will discuss tips and guidelines for successful data science implementations. This post goes over the things you should worry about before to write the first line of code. A high level data science project will... Read more
How Do You Discover R Packages?
Like I mentioned in my last blog post, I am contributing to a session at userR 2017 this coming July that will focus on discovering and learning about R packages. This is an increasingly important issue for R users as we all decide which of the 10,000+ packages to... Read more
Standard software development practices for web, Saas, and industrial environments tend to focus on maintainability, code quality, robustness, and performance. Scientific programing in data science is more concerned with exploration, experimentation, making demos, collaborating, and sharing results. It is this very need for experiments, explorations, and collaborations that is... Read more
Lynn Root at ODSC Boston 2015
Metric-Driven Development: See the Forest for the Trees At Spotify, my team struggled to be awesome. We had a very loose understanding of what product/service our squad was responsible for, and even less so of the expectations our internal and external customers had for those services. Other than “does... Read more
Agile Data – Chris Bergh ODSC Boston 2015
Agile Data from odsc To rephrase an old saying: ‘It takes a village to raise an Analyst.’ Data Analysts and Scientists are working in teams delivering insight and analysis on an ongoing basis. So how do you get the team to support experimentation and insight delivery without ending up... Read more
Data Workflows for Iteration, Collaboration, and Reproducibility – David Chudzicki ODSC Boston 2015
http://www.davidchudzicki.com/slides/odsc-2015-workflow/ For other data scientists to improve, build on, or even just trust your analysis, they need to be able to reproduce it. Even if you have shared code and data, reproducing your analysis may be difficult: which code was executed against which data in what order? And even... Read more
Open Data Science - Your News Source for AI, Machine Learning & more