More notebooks for Think Stats
As I mentioned in the previous post, I am getting ready to teach Data Science in the spring, so I am going back through Think Stats and updating the Jupyter notebooks. I am done with Chapters 1 through 6 now. If you are reading the book, you... Read more
The Complexities of Governing Machine Learning
Today’s businesses run on data. It’s essential for any corporation to look for insights about its customers based on the data it collects. That collected information drives everything from business strategy to customer service. To extract insights from the massive amounts of data they collect, companies are... Read more
Do Resampling Estimates Have Low Correlation to the Truth?
The Answer May Shock You. One criticism that is often leveled against using resampling methods (such as cross-validation) to measure model performance is that there is no correlation between the CV results and the true error rate. Let’s look at this with some simulated data. While this assertion is... Read more
Handwritten digits recognition using Tensorflow with Python
The technological progress of the last 10 years is remarkable. People in every corner of the world are using the latest technologies to improve existing products while also investing heavily in research to invent products that make the world a better place to live. Some of these... Read more
You Must Allow Me To Tell You How Ardently I Admire and Love Natural Language Processing
It is a truth universally acknowledged that sentiment analysis is super fun, and Pride and Prejudice is probably my very favorite book in all of literature, so let’s do some Jane Austen natural language processing. Project Gutenberg makes e-texts available for many, many books, including Pride and Prejudice which... Read more
The future of Machine Learning lies in its (human) past
Superficially different in goals and approach, two recent algorithmic advances, Bayesian Program Learning and Galileo, are examples of one of the most interesting and powerful new trends in data analysis. It also happens to be the oldest one. Bayesian Program Learning (BPL) is deservedly one of the most discussed... Read more
Thomas originally posted this article at http://twiecki.github.io. Hierarchical models are underappreciated. Hierarchies exist in many data sets, and modeling them appropriately adds a boatload of statistical power (the common metric of statistical power). I provided an introduction to hierarchical models in a previous blog post: Best Of Both Worlds:... Read more
Choroplethr v3.6.0 is now on CRAN
Choroplethr version 3.6.0 is now on CRAN. This version adds functionality for getting and mapping demographics of US Census Tracts. You can install it from the R console by running install.packages("choroplethr"); afterwards, packageVersion("choroplethr") should report ‘3.6.0’. To use this functionality you will need an API key from the... Read more
Hello all, and welcome to the second article in the series, NLP with NLTK. The first article in the series can be found here, in case you missed it. In this article we will talk about basic NLP concepts and use NLTK to implement them. Contents: Corpus, Tokenization/Segmentation, Frequency Distribution... Read more
An overview of gradient descent optimization algorithms
Note: If you are looking for a review paper, this blog post is also available as an article on arXiv. Table of contents: Gradient descent variants (Batch gradient descent, Stochastic gradient descent, Mini-batch gradient descent); Challenges; Gradient descent optimization algorithms (Momentum, Nesterov accelerated gradient, Adagrad, Adadelta, RMSprop, Adam); Visualization... Read more