You Must Allow Me To Tell You How Ardently I Admire and Love Natural Language Processing
It is a truth universally acknowledged that sentiment analysis is super fun, and Pride and Prejudice is probably my very favorite book in all of literature, so let’s do some Jane Austen natural language processing. Project Gutenberg makes e-texts available for many, many books, including Pride and Prejudice which... Read more
The future of Machine Learning lies in its (human) past
Superficially different in goals and approach, two recent algorithmic advances, Bayesian Program Learning and Galileo, are examples of one of the most interesting and powerful new trends in data analysis. It also happens to be the oldest one. Bayesian Program Learning (BPL) is deservedly one of the most discussed... Read more
Thomas originally posted this article here at http://twiecki.github.io  Hierarchical models are underappreciated. Hierarchies exist in many data sets and modeling them appropriately adds a boat load of statistical power (the common metric of statistical power). I provided an introduction to hierarchical models in a previous blog post: Best Of Both Worlds:... Read more
Choroplethr v3.6.0 is now on CRAN
Choroplethr version 3.6.0 is now on CRAN. This version adds functionality for getting and mapping demographics of US Census Tracts. You can install it from the R console as follows: install.packages("choroplethr"), then confirm the version with packageVersion("choroplethr"), which should print ‘3.6.0’. To use this functionality you will need an API key from the... Read more
Hello all, and welcome to the second article in the series NLP with NLTK. The first article can be found here, in case you missed it. In this article we will talk about basic NLP concepts and use NLTK to implement them. Contents: Corpus, Tokenization/Segmentation, Frequency Distribution... Read more
An overview of gradient descent optimization algorithms
Note: If you are looking for a review paper, this blog post is also available as an article on arXiv. Table of contents: Gradient descent variants (Batch gradient descent, Stochastic gradient descent, Mini-batch gradient descent); Challenges; Gradient descent optimization algorithms (Momentum, Nesterov accelerated gradient, Adagrad, Adadelta, RMSprop, Adam); Visualization... Read more
Random-Walk Bayesian Deep Networks: Dealing with Non-Stationary Data
Thomas originally posted this article here at http://twiecki.github.io  Most problems solved by Deep Learning are stationary. A cat is always a cat. The rules of Go have remained stable for 2,500 years, and will likely stay that way. However, what if the world around you is changing? This is common, for... Read more
NYC Pre-K Explorer
Shiny project contributed by Amy Tzu-Yu Chen, data science student in the NYC Data Science Academy Bootcamp. Motivation: In 2013, Mayor De Blasio campaigned on a promise of universal pre-kindergarten. The program makes free pre-kindergarten education available to all NYC families, regardless of the child’s abilities and family income. Now,... Read more
Dealing with arrays which are bigger than memory – an introduction to biggus
I often deal with huge gridded datasets which either stretch or exceed the limits of my computer’s memory. In the past I’ve implemented a couple of workarounds to help me handle these data and extract meaningful analyses from them. One of the most intuitive ways of reducing... Read more
Exploring the Relationship between Religion and Demographics in R
Today’s guest post is by Julia Silge. Take a look at her work in “Mapping US Religion Adherence by County in R”, where she demonstrated how to work with US religion adherence data in R. In this post she explores the relationship between that dataset and US demographic data. I... Read more