Let’s Have Some Sympathy For The Part-time R User

When I started writing about methods for better “parametric programming” interfaces to dplyr for R users in December of 2016, I encountered three divisions in the audience: dplyr users who had such a need and wanted such extensions; dplyr users who did not have such a need (“we always know the column names”); and dplyr users who found the then-current fairly complex “underscore” and lazyeval system […]

Feature Engineering with Tidyverse

In this blog post, I will discuss feature engineering using the Tidyverse collection of libraries. Feature engineering is crucial for a variety of reasons, and it requires some care to produce a useful outcome. In this post, I will consider a dataset that contains descriptions of crimes in San Francisco between 2003 and 2015. The data can be […]

How Do You Discover R Packages?

As I mentioned in my last blog post, I am contributing to a session at useR! 2017 this coming July that will focus on discovering and learning about R packages. This is an increasingly important issue for R users as we all decide which of the 10,000+ packages to invest time in understanding and then […]

On indexing operators and composition

In this article I will discuss array indexing, operators, and composition in depth. If you work through this article, you should end up with a deep understanding of array indexing and of the interpretation available once we realize that indexing is an instance of function composition (or an example of permutation groups or semigroups: some […]

Scraping CRAN with rvest

I am one of the organizers for a session at useR! 2017 this coming July that will focus on discovering and learning about R packages. How do R users find packages that meet their needs? Can we make this process easier? As somebody who is relatively new to the R world compared to many, this […]

Fixing an infelicity in ‘leaps’

The ‘leaps’ package for R is ancient – this is its twentieth year on CRAN. It uses old Fortran code by the Australian computational statistician Alan Miller. The Fortran 90 versions are on the web, but Fortran 90 compilation with R wasn’t portable back then, so I used the older Fortran 77 version. The main point […]

Useful Functions in R

I have listed some useful functions below:
with(): The with() function applies an expression to a dataset. It is similar to DATA= in SAS.
    # with(data, expression)
    # example applying a t-test to a data frame mydata
    with(mydata, t.test(y ~ group))
Please look at other examples here and here.
by(): The by() function […]

xda: R package for exploratory data analysis

This package contains several tools to perform initial exploratory analysis on any input dataset. It includes custom functions for plotting the data as well as for performing different kinds of analyses, such as univariate, bivariate and multivariate investigation, which is the first step of any predictive modeling pipeline. This package can be used to get a […]

Intro to Caret: Pre-Processing

Editor’s note: This is the third of a series of posts on the caret package. Creating Dummy Variables; Zero- and Near Zero-Variance Predictors; Identifying Correlated Predictors; Linear Dependencies; The preProcess Function; Centering and Scaling; Imputation; Transforming Predictors; Putting It All Together; Class Distance Calculations. caret includes several functions to pre-process the predictor data. It assumes that […]