Data Science Live Book (open source) ~ new big release! 200-pages
Well, after some time and more than 300 commits, this is the biggest release of the Data Science Live Book (open source) since its first publication more than a year ago 🙂

tl;dr: Hi there! I invite you to read the book online and/or download it here. Thanks, and have a nice day 🙂

!(tl;dr): An overview…

It’s a book for learning data science, machine learning, and data analysis, with plenty of examples and explanations on topics such as:

  • Exploratory data analysis
  • Data preparation
  • Selecting the best variables
  • Model performance

Most of the R code in the book can be used in real-world scenarios! I was developing the funModeling R package at the same time, so it is used many times throughout the book.

How about some examples?

It’s a playbook full of data preparation recipes.

For example, in the missing values chapter you’ll learn how to impute and convert these values into something useful for both analysis and predictive modeling.
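To make the idea concrete, here is a minimal stand-alone Python sketch of two common treatments. It is not the book's code (the book works in R with funModeling), and the function names are hypothetical. It assumes missing values are encoded as `None`: numeric gaps are filled with the median of the observed values, plus a 0/1 flag recording where the gaps were, so a predictive model can still use the fact that a value was missing; categorical gaps simply become their own explicit category.

```python
from statistics import median

def impute_median_with_flag(values):
    """Hypothetical helper: fill None with the median of the observed values
    and return a parallel 0/1 list flagging the positions that were missing."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    imputed = [fill if v is None else v for v in values]
    flags = [1 if v is None else 0 for v in values]
    return imputed, flags

def fill_categorical(values, label="missing"):
    """Hypothetical helper: turn None in a categorical variable into an explicit category."""
    return [label if v is None else v for v in values]

ages = [23, None, 31, 40, None, 27]
imputed, flags = impute_median_with_flag(ages)
print(imputed)  # [23, 29.0, 31, 40, 29.0, 27]
print(flags)    # [0, 1, 0, 0, 1, 0]
print(fill_categorical(["red", None, "blue"]))  # ['red', 'missing', 'blue']
```

The median is used here rather than the mean because it is less sensitive to extreme values; whether imputing is appropriate at all depends on why the values are missing in the first place.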

Another example: in the outliers chapter you’ll get to know some methods that spot outliers based on different criteria; funModeling contains a function that can help process all the data at once…
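As an illustration of "different criteria", the following stand-alone Python sketch (hypothetical function names; it does not reproduce funModeling's R function) computes two widely used sets of bounds: Tukey's fences, based on the interquartile range (IQR), and a Hampel-style rule based on the median absolute deviation (MAD). Any value outside the bounds is flagged.

```python
from statistics import median, quantiles

def tukey_bounds(values, k=1.5):
    """Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def hampel_bounds(values, k=3.0):
    """Hampel-style bounds: median +/- k * MAD (MAD left unscaled in this sketch)."""
    m = median(values)
    mad = median([abs(v - m) for v in values])
    return m - k * mad, m + k * mad

def find_outliers(values, bounds):
    """Return the values that fall outside the (low, high) interval."""
    low, high = bounds
    return [v for v in values if v < low or v > high]

data = [10, 12, 11, 13, 12, 11, 10, 95]
print(tukey_bounds(data))    # (6.5, 16.5)
print(hampel_bounds(data))   # (8.5, 14.5)
print(find_outliers(data, tukey_bounds(data)))   # [95]
print(find_outliers(data, hampel_bounds(data)))  # [95]
```

Both rules rely on medians and quartiles rather than the mean and standard deviation, so the outlier itself barely shifts the bounds. Many implementations scale the MAD by about 1.4826 to make it comparable to a standard deviation under normality; that choice changes how strict the Hampel rule is.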

Or, more conceptually: given a numeric variable, should we convert it into a categorical one (or vice versa), or just leave it as it comes?
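One common way to go from numeric to categorical is equal-frequency binning: cut points are chosen so that each bin holds roughly the same number of observations. The stand-alone Python sketch below (hypothetical function names; the book itself uses R and funModeling) only shows the mechanics. Whether converting is worthwhile is the conceptual question raised above.

```python
from bisect import bisect_right
from statistics import quantiles

def equal_freq_cuts(values, n_bins):
    """Cut points that split the data into n_bins groups of roughly equal size."""
    return quantiles(values, n=n_bins, method="inclusive")

def to_bins(values, cuts):
    """Map each numeric value to a bin index 0..len(cuts); tied values share a bin."""
    return [bisect_right(cuts, v) for v in values]

income = [1, 2, 3, 4, 5, 6, 7, 8, 9]
cuts = equal_freq_cuts(income, 3)
print([round(c, 2) for c in cuts])  # [3.67, 6.33]
print(to_bins(income, cuts))        # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Reusing the cut points computed on one dataset (for example, training data) keeps new data mapped to the same categories.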

And so on and so on…

Book’s philosophy

All of the book’s chapters are interrelated, so you can start with any of them. My apologies if the number of links distracts from the reading; I wanted it that way to show how machine learning concepts are related to one another.

A lot of effort went into justifying what the book states. Yet that is not enough: readers can replicate and improve the examples, and thus generate their own knowledge.

Developing critical thinking, without taking any statement as the “absolute truth”, is really important in this sea of books, courses, videos, and every other kind of technical learning material. This book is just one more perspective on data science.

Hmmm… next releases?

Plans may change, but I have some ideas, such as adding more material on creating and validating predictive models, validating clustering models, dimensionality reduction techniques, and “how to become a data scientist”, among others.

Some metrics

A screenshot from Google Analytics (Oct-25-2017) showing the top 4 most viewed chapters:

I think “profiling” is the most viewed section simply because it is the first chapter after the index. But the number of entrances (which can be read as visitors) was beyond my expectations: more than 18k, viewing around 47k pages.

Many thanks to those who have already read something ❤!

I put some random errors…

… both technical and grammatical; the problem is, I don’t know where! So if you want to raise your hand and shout: "That's not correct! I think the correct form is... {replace-with-your-detailed-answer-here}", I invite you to report it on the GitHub repository, or email me at pcasas.biz -at- gmail.com

Download the PDF, epub and Kindle version!

If you learned anything new from this book, or it somehow helped you save time at work, you can support the project by purchasing the portable version (name your price, starting at US$ 5).

There is no difference between the portable and web versions 🙂

After the purchase, you’ll receive an email to download it in all three formats.


Download here! 


Original Source.

Pablo Casas

I've been working and playing with data for the last 10 years, in different areas, both in business and in R&D. I graduated in Information Systems Engineering (Universidad Tecnológica Nacional, Argentina). I currently work as a Machine Learning Specialist at Auth0.com, developing deep learning user behavior models and predictive models for marketing and sales. I'm passionate about teaching the concepts I've learned through gentle examples, helping learners not to get bogged down by complex issues. I wrote the Data Science Live Book (DSLB, open source), which addresses the not-so-popular but much-needed tasks in a data project, such as exploratory data analysis and data preparation for machine learning. Relying on the reader's intuition and logic, the DSLB gently introduces different concepts and R code recipes ready to be used on real-world problems.