The open-source project, Data Science Live Book, is now available!
Posted by Kaylen Sanders, ODSC, June 1, 2018
For those just embarking on a data science career, data scientist Pablo Casas’ Data Science Live Book offers a guided path into the nitty-gritty of the field. Casas currently works as a Machine Learning Specialist at Auth0.com, and his book sheds light on many of the obstacles that typically arise throughout the data analysis and machine learning workflow. Any newcomer to data science is bound to be plagued by questions, and Casas sets out to answer them comprehensively.
Data Science Live Book steps chronologically through each stage of the analytical process, opening with “Exploratory Data Analysis,” then moving on to “Data Preparation” and “Selecting Best Variables,” and finally “Assessing Model Performance.” In Casas’ words, “some data sets require more sculpting than others.” Sculpting, in this sense, means transforming data to tease out its most critical information, and Casas’ book might be considered an extended lesson in the art of data sculpting.
Avoiding high-level abstraction, Casas fills the pages of his book with meticulously commented scripts, plots, and charts. The book shows rather than merely tells, and this attention to detail is crucial in such a programming-heavy discipline. Casas draws on several R libraries throughout the chapters, most notably funModeling. The book grew largely out of an effort to document funModeling, and the library’s functionality recurs throughout to help readers engage with the topics at hand. While most of the technical examples are written in R, the broader concepts are conveyed independently of any particular programming language, so readers need not worry about being pigeonholed into one language.
Altogether, Data Science Live Book amounts to a nearly 300-page tour of the standard data analytics pipeline. Despite the book’s breadth, Casas makes no claim that it is the be-all and end-all for budding data scientists. Rather, he describes it as “just another step in the learning journey,” intended to serve as a springboard onto the long, winding road of analysis that lies ahead.