How to “Get Good at R”

Editor’s note: post modified from the original. How can I get good at R? This has come up enough times for me to outline my thoughts on the subject. That way I can simply forward people to this post the next time the question comes up. My advice is geared towards people who want to build an […]

Installing Jupyter with the PySpark and R kernels for Spark development

This is a quick tutorial on installing Jupyter and setting up the PySpark and R (IRkernel) kernels for Spark development. The prerequisite for following this tutorial is a deployed Hadoop/Spark cluster with the relevant services up and running (e.g. HDFS, YARN, Hive, Spark). In this tutorial I am using IBM’s Hadoop […]

A Budget of Classifier Evaluation Measures

Beginning analysts and data scientists often ask: “how does one remember and master the seemingly endless number of classifier metrics?” My concrete advice is: Read Nina Zumel’s excellent series on scoring classifiers. Keep notes. Settle on one or two metrics as you move project to project. We prefer “AUC” early in a project (when you […]

How close am I to an Amazon, Wal-Mart or Target Warehouse? Visualizing geospatial Information in R.

In the last post you saw the “Traveling Santa Problem”, which attempts to efficiently route Santa to all US warehouses from Amazon, Wal-Mart and Target. Now that Santa has made his deliveries I want to know where to go to pick up my gift. My wish list has a ticket to ODSC West and an […]

Visualizing Census Estimate Margins of Error in R

A key feature of American Community Survey (ACS) data is that the reported values contain both estimates and margins of error. The margins of error, unfortunately, are often overlooked. After meeting with Ezra Glenn last year I gained a new appreciation of them. Today I’ll demonstrate how to visualize them, as well as how they tend to […]

How to Search for Census Data from R

In my course Learn to Map Census Data in R I provide people with a handful of interesting demographics to analyze. This is convenient for teaching, but people often want to search for other demographic statistics. To address that, today I will work through an example of starting with a simple demographic question and using […]

Databases In Containers

A great number of readers reacted very positively to Nina Zumel’s article Using PostgreSQL in R: A quick how-to. Part of the reason is she described an incredibly powerful data science pattern: using a formerly expensive permanent system infrastructure as a simple transient tool. In her case the tools were the data manipulation grammars SQL […]

sample(): The “Monkey’s Paw” Style

The R functions base::sample and base::sample.int include extra “conveniences” that seem to have no purpose beyond encouraging grave errors. In this note we will outline the problem and a suggested workaround. Obviously the R developers are highly skilled people with good intent, and likely have no choice in these matters (due […]

Results: R Shapefile Contest

I am happy to announce the results from the R Shapefile Contest. The contest was an incredible success – there were 19 entries that covered a range of topics. Each entry was well thought out, and I encourage you to read each of them. Here are the entries, in order of submission: Bonus: Get all […]