Using Keras and TensorFlow in R
Keras and TensorFlow are two very powerful packages that are normally accessed via Python. Because they were developed for Python, they may seem out of reach for R users. However, this is not the case, as the Keras and TensorFlow packages... Read more
What is “Tidy Data”?
I would like to write a bit on the meaning and history of the phrase “tidy data.” Hadley Wickham has been promoting the term. For example, in an eponymous paper, he wrote: In tidy... Read more
R User Community Worldwide: A Data-Driven Exploration
This posting discusses the worldwide R user community. R is a programming language and environment for statistical computing and data visualization. An important component of the R ecosystem is its powerful user community, which has continued to expand around the world over the years. In a... Read more
Timing the Same Algorithm in R, Python, and C++
While developing the RcppDynProg R package I took a little extra time to port the core algorithm from C++ to both R and Python. This means I can time the exact same algorithm implemented nearly identically in each of these three languages. So I can extract some comparative “apples to apples” timings. Please read... Read more
Where is Data Science Heading? Watching R’s Most Popular Packages May Have the Answer
Working as both a journalist and data scientist, I’m in a unique position to report on new tools of the profession as well as use them. I’m always seeking out trends surrounding the arrival of said tools because I feel they speak closely to the evolution... Read more
Validating Type I and II Errors in A/B Tests in R
In our previous article, we showed that generalized linear models are unbiased, or calibrated: they preserve the conditional expectations and rollups of the training data. A calibrated model is important in many applications, particularly when financial data is involved. However, when making predictions... Read more
Query Generation in R
R users have been enjoying the benefits of SQL query generators for quite some time, most notably using the dbplyr package. I would like to talk about some features of our own rquery query generator, concentrating on derived result re-use. Introduction SQL represents value use by... Read more
Function Objects and Pipelines in R
Composing functions and sequencing operations are core programming concepts. Some notable realizations of sequencing or pipelining operations include: Unix’s |-pipe; CMS Pipelines; F#’s forward pipe operator |>; Haskell’s Data.Function & operator; the R magrittr forward pipe; and Scikit-learn’s sklearn.pipeline.Pipeline. The idea is: many important calculations can be considered as a sequence of transforms applied to... Read more
Comparing Point-and-Click Front Ends for R
For an updated version of this post, see: http://r4stats.com/articles/software-reviews/r-gui-comparison/. Now that I’ve completed seven detailed reviews of Graphical User Interfaces (GUIs) for R, let’s compare them. It’s easy enough to count their features and plot them, so let’s start there. I’m basing the counts on the number of menu... Read more
Some Details on Running xgboost
While reading Dr. Nina Zumel’s excellent note on bias in common ensemble methods, I ran the examples to see the effects she described (and I think it is very important that she is establishing the issue, prior to discussing mitigation)... Read more