TPOT: The Data Scientist’s Little Helper


One of the most involved steps in the data science process is iterating on models. Round after round of analyzing features, trying different algorithms, and tuning parameters is the established paradigm.

What if you could automate all of that, though? A new tool from Dr. Randal Olson, a post-doctoral researcher at the University of Pennsylvania, looks to answer this question.

The Tree-based Pipeline Optimization Tool, TPOT for short, uses genetic algorithms to iteratively build and evaluate machine learning pipelines before settling on the one that works best. (“Best” in this case usually refers to predictive power.) Because TPOT is built on the scikit-learn API, it can even provide the code that produces the best pipeline it found.
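To make the idea of a genetic search over pipelines concrete, here is a minimal sketch in plain Python. It is not TPOT's API or its actual algorithm: the dataset, the two-step search space (`SCALERS`, `K_VALUES`), and every function name are hypothetical, chosen only to show the loop of scoring candidates, keeping the fittest, and breeding new ones by crossover and mutation.

```python
import random

def make_data(rng, n=120):
    """Toy data: the label depends only on x1; x2 is large-scale noise."""
    rows = []
    for _ in range(n):
        x1 = rng.uniform(0, 1)
        x2 = rng.uniform(0, 1000)
        rows.append(((x1, x2), int(x1 > 0.5)))
    return rows

# Hypothetical search space: a "pipeline" is (scaler, k) for a k-NN classifier.
SCALERS = ["none", "minmax"]
K_VALUES = [1, 3, 5, 7, 9]

def scale(train, test, scaler):
    """Apply the chosen preprocessing step, fitted on the training rows only."""
    if scaler == "none":
        return train, test
    lo = [min(x[j] for x, _ in train) for j in range(2)]
    hi = [max(x[j] for x, _ in train) for j in range(2)]

    def rescale(x):
        return tuple((x[j] - lo[j]) / ((hi[j] - lo[j]) or 1.0) for j in range(2))

    return ([(rescale(x), y) for x, y in train],
            [(rescale(x), y) for x, y in test])

def fitness(pipeline, train, test):
    """Holdout accuracy of the pipeline (the 'predictive power' being optimized)."""
    scaler, k = pipeline
    tr, te = scale(train, test, scaler)
    correct = 0
    for x, y in te:
        nearest = sorted(
            tr, key=lambda r: (r[0][0] - x[0]) ** 2 + (r[0][1] - x[1]) ** 2
        )[:k]
        votes = sum(label for _, label in nearest)
        correct += int((votes * 2 > k) == bool(y))  # majority vote; k is odd
    return correct / len(te)

def mutate(pipeline, rng):
    """Randomly re-draw one component of the pipeline."""
    scaler, k = pipeline
    if rng.random() < 0.5:
        return (rng.choice(SCALERS), k)
    return (scaler, rng.choice(K_VALUES))

def crossover(a, b):
    """Combine the scaler of one parent with the k of another."""
    return (a[0], b[1])

def evolve(train, test, rng, generations=5, population_size=8):
    population = [(rng.choice(SCALERS), rng.choice(K_VALUES))
                  for _ in range(population_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda p: fitness(p, train, test),
                        reverse=True)
        parents = ranked[: population_size // 2]  # keep the fittest half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents)), rng)
            for _ in range(population_size - len(parents))
        ]
        population = parents + children
    best = max(population, key=lambda p: fitness(p, train, test))
    return best, fitness(best, train, test)

rng = random.Random(0)
data = make_data(rng)
train, test = data[:90], data[90:]
best_pipeline, best_score = evolve(train, test, rng)
print("best pipeline:", best_pipeline, "holdout accuracy:", best_score)
```

A real tool searches a far larger space (feature selectors, many model families, continuous hyperparameters) and typically scores candidates with cross-validation rather than a single holdout split, but the select, recombine, and mutate loop has the same shape.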

Check out the introductory blog post here and the GitHub repository here.