In recent years, Data Science has evolved into its own profession in
response to the proliferation of data that needs to be analyzed and
made actionable, a job that could not be adequately addressed by
any single one of its predecessors, largely Computer Science, Quantitative Sciences,
and Consulting. To practice Data Science is to perform tasks along a wide spectrum
ranging from research to application to productization, often simultaneously within the
same project. The project management methodologies of its predecessors — such as
Agile methodology from software engineering or Gantt charts from consulting — are
insufficient on their own for the multifaceted nature of a data science project. In our experience, a hybrid
approach that combines and adapts existing techniques is much more successful.
In this talk, we describe the approaches used by our seasoned and growing Data Science
team at Civis Analytics. We show how specific strategies can minimize wasted time and technical debt for
different types of projects. We offer practical tips on how to organize the development of scripts and
production-level modules. We explain how to conduct an exploratory
investigation while simultaneously advancing general algorithmic development and implementation.
Finally, we discuss the open-source project management and collaboration tools
we have found useful and how to combine them seamlessly to manage practical
data science workflows.
As a data scientist at Civis Analytics, Elaine researches and develops new algorithms for predictive modeling and integrates them into Civis’s data science software platform. Prior to Civis, Elaine was a machine learning software developer at Rifiniti, where she developed a SaaS platform for corporate real estate optimization, and an analyst for the federal government. She holds a Master's in Statistics from Harvard University and a Bachelor's in Mathematics, specializing in Operations Research, from Carnegie Mellon University.