R
Where is Data Science Heading? Watching R’s Most Popular Packages May Have the Answer
Working as both a journalist and data scientist, I’m in a unique position to report on new tools of the profession as well as use them. I’m always seeking out trends surrounding the arrival of said tools because I feel they speak closely to the evolution of the field.... Read more
Validating Type I and II Errors in A/B Tests in R
In our previous article, we showed that generalized linear models are unbiased, or calibrated: they preserve the conditional expectations and rollups of the training data. A calibrated model is important in many applications, particularly when financial data is involved. However, when making predictions on individuals, a... Read more
Query Generation in R
R users have been enjoying the benefits of SQL query generators for quite some time, most notably using the dbplyr package. I would like to talk about some features of our own rquery query generator, concentrating on derived result re-use. Introduction: SQL represents value use by nesting. To use... Read more
Function Objects and Pipelines in R
Composing functions and sequencing operations are core programming concepts. Some notable realizations of sequencing or pipelining operations include: Unix’s |-pipe; CMS Pipelines; F#’s forward pipe operator |>; Haskell’s Data.Function & operator; the R magrittr forward pipe; and Scikit-learn’s sklearn.pipeline.Pipeline. The idea is: many important calculations can be considered as a sequence of transforms applied to a data set.... Read more
Comparing Point-and-Click Front Ends for R
For an updated version of this post, see: http://r4stats.com/articles/software-reviews/r-gui-comparison/. Now that I’ve completed seven detailed reviews of Graphical User Interfaces (GUIs) for R, let’s compare them. It’s easy enough to count their features and plot them, so let’s start there. I’m basing the counts on the number of menu items in each... Read more
Some Details on Running xgboost
While reading Dr. Nina Zumel’s excellent note on bias in common ensemble methods, I ran the examples to see the effects she described (and I think it is very important that she is establishing the issue, prior to discussing mitigation). ... Read more
Factors in R
The factor is a foundational data type in R. Factors are generally used to represent categorical variables, which may be intrinsically unordered (nominal) or ordered (ordinal). While the underlying data is often character, factors can be built on numerics as well. Factor variables are stored as integers pointing to unique values of underlying... Read more
Jupyter Notebook: Python or R—Or Both?
I was analytically betwixt and between a few weeks ago. Most of my Jupyter Notebook work is done in either Python or R. Indeed, I like to self-demonstrate the power of each platform by recoding R work in Python and vice-versa. I must have a dozen active notebooks, some... Read more
Validating Type I and II Errors in A/B Tests in R
In the work below, we will intentionally leave out statistics theory and attempt to develop an intuitive sense of what type I (false-positive) and type II (false-negative) errors represent when comparing metrics in A/B tests. One of the problems plaguing the analysis of A/B tests today is known as the “peeking... Read more
Introduction to R Shiny
Alyssa is a speaker for ODSC East 2019 this April 30 to May 3! Attend her talk “Data Visualization with R Shiny.” What is R Shiny? Shiny is an R package that enables you to build interactive web apps using both the statistical power of R and the interactivity... Read more