R
How Tidyverse Guides R Programmers Through Data Science Workflows
Whenever someone asks me how to get into data science using R, I invariably recommend checking out the tidyverse package. Tidyverse is a great launch pad for a language like R because it offers order and consistency. I studied programming language design as a CS undergrad. At the time,...
Sentiment Analysis in R Made Simple
Sentiment analysis sits at the heart of natural language processing, text mining/analytics, and computational linguistics. It refers to any measurement technique by which subjective information is extracted from textual documents. In other words, it extracts the polarity of the expressed sentiment in a range spanning from positive to...
Feature Engineering with Forward and Backward Elimination
Sometimes when you fit models to test their predictive accuracy, you find that you're dealing with too many predictors (feature variables). You can draw on your domain knowledge, or that of an available domain expert, to reduce the predictors to only those that will offer your model superior...
Snakes in a Package: Combining Python and R with Reticulate
When I first started working as a data scientist (or something like it), I was told to program in C++ and Java. Then R came along, and it was liberating; my ability to do data analysis increased substantially. As my applications grew in size and complexity, I started to...
SQL Equivalents in R
When I teach introductory data science courses using the R language, I often encounter students who use a different language, such as Python or Julia, and others who are transitioning into data science from other fields and don't know any data science language at all. The common thread...
Monthly Summary of Selected Trends, Activities, and Insights for R – July 2018
R is a leading language in the data science domain. This article presents a summary of selected trends, activities, and insights around the R language from July 2018. Data for the trends and activities summarized here were obtained from popular websites used by the R community such...
The Tidyverse Curse
I've just finished a major overhaul of my widely read article, Why R is Hard to Learn. It describes the main complaints I've heard from participants in my workshops and how those complaints can often be mitigated. Here's the only new section, titled "The Tidyverse Curse": There's a common theme in many...
rquery: Fast Data Manipulation in R
Win-Vector LLC recently announced the rquery R package, an operator-based query generator. In this note, I want to share some exciting and favorable initial rquery benchmark timings. Note: we have now (1-16-2018) re-run this benchmark with a faster, better-tuned version of the data.table solution (same package, just better use of it). Let's take a look at...
Group-By Modeling in R Made Easy
Several aspects of the R language make it hard to learn, and repeating a model for each group in a data set used to be one of them. Here I briefly describe R's built-in approach, show a much easier one, and then refer you to a new approach described...
Seeking Guidance in Choosing and Evaluating R Packages
At useR!2017 in Brussels last month, I contributed to an organized session focused on navigating the 11,000+ packages on CRAN. My collaborators on this session and I recently put together an overall summary of the session and our goals, and now I'd like to talk more about the specific issue of learning...