R
Seeking Guidance in Choosing and Evaluating R Packages
At useR!2017 in Brussels last month, I contributed to an organized session focused on navigating the 11,000+ packages on CRAN. My collaborators on this session and I recently put together an overall summary of the session and our goals, and now I’d like to talk more about the specific issue of learning... Read more
matmul() is eating software
Last week Zak Stone from Google Brain gave a talk at South Park Commons where he wove together a bunch of threads that are shaping future machine learning progress: TensorFlow, XLA, Cloud TPUs, TFX, and TensorFlow Lite; he also hinted at even more exciting stuff not quite ready for public consumption. (Fun... Read more
Civic Data Wrangling: in R and on data.world
One of the most valuable things I have learned working on Data for Democracy’s Medicare drug spending project has been the value of collaborative tools. It has been my first in-depth experience using Github collaboratively, for one, but it has also introduced me to data.world. data.world is an intuitive way... Read more
Tutorial: Using seplyr to Program Over dplyr
seplyr is an R package that makes it easy to program over dplyr 0.7.*. To illustrate this we will work an example. Suppose you had worked out a dplyr pipeline that performed an analysis you were interested in. For an example we could take something similar to one of the examples from the dplyr 0.7.0 announcement. suppressPackageStartupMessages(library("dplyr")) packageVersion("dplyr") ##... Read more
Let’s Have Some Sympathy For The Part-time R User
When I started writing about methods for better “parametric programming” interfaces for dplyr (for R dplyr users) in December of 2016, I encountered three divisions in the audience: dplyr users who had such a need and wanted such extensions; dplyr users who did not have such a need (“we always know the column names”); dplyr users who found... Read more
WHO Tuberculosis Data & ggplot2
So it has been a while since my previous post on some data taken from the UNHCR database. In this post we’ll bring it back to the topic of infectious diseases (check out my other posts on the SIR model and MRSA typing). As in previous posts, I give a guide through... Read more
Feature Engineering with Tidyverse
In this blog post, I will discuss feature engineering using the Tidyverse collection of libraries. Feature engineering is crucial for a variety of reasons, and it requires some care to produce any useful outcome. In this post, I will consider a dataset that contains descriptions of crimes in San Francisco between... Read more
Using GRAKN.AI to reason over an R dataset
Introduction In this article I will introduce an open-source knowledge graph platform called GRAKN.AI. I’m going to use it to load a simple dataset, and show how to calculate basic statistics such as maximum and mean values. A good question at this point would be: as a data scientist,... Read more
How Do You Discover R Packages?
Like I mentioned in my last blog post, I am contributing to a session at useR! 2017 this coming July that will focus on discovering and learning about R packages. This is an increasingly important issue for R users as we all decide which of the 10,000+ packages to... Read more
On indexing operators and composition
In this article I will discuss array indexing, operators, and composition in depth. If you work through this article you should end up with a very deep understanding of array indexing and the deep interpretation available when we realize indexing is an instance of function composition (or an example... Read more