R
What is “Tidy Data”?
I would like to write a bit on the meaning and history of the phrase “tidy data.” Hadley Wickham has been promoting the term “tidy data.” For example, in an eponymous paper, he wrote: In tidy... Read more
135 Nights of Sleep with Data, Anomaly Detection, and Time Series
In this article, I look at data from 135 nights of sleep and use anomaly detection and time series data to understand the results. Three things are certain in life: death, taxes, and sleeping. Here, we’ll talk about the last of these. Every night*, we humans, after a... Read more
Using an Embedding Matrix on Tabular Data in R
How would you represent a categorical feature with hundreds of levels in a model? A first approach may be to create a one-hot encoded matrix representing each level of the feature. The result would be a large and sparse matrix where the... Read more
ODSC West 2019 Talks and Workshops to Expand and Apply R Skills
At this point, most of us know the basics of using and deploying R—maybe you took a class on it, maybe you participated in a hackathon. That’s all important (and we have tracks for getting started with Python if you’re not there yet), but once you... Read more
R-Related Talks Coming to ODSC West 2019
R is one of the most commonly used languages within data science, and its applications are always expanding. From the traditional use of data or predictive analysis, all the way to machine or deep learning, the uses of R will continue to grow and we’ll have to... Read more
R User Community Worldwide: A Data-Driven Exploration
This posting discusses the worldwide R user community. R is a programming language and environment for statistical computing and data visualization. An important component of the R ecosystem is its powerful user community, which has continued to expand around the world over the years. In a... Read more
Timing the Same Algorithm in R, Python, and C++
While developing the RcppDynProg R package, I took a little extra time to port the core algorithm from C++ to both R and Python. This means I can time the exact same algorithm implemented nearly identically in each of these three languages, and so extract some comparative “apples to apples” timings. Please read... Read more
Where is Data Science Heading? Watching R’s Most Popular Packages May Have the Answer
Working as both a journalist and data scientist, I’m in a unique position to report on new tools of the profession as well as use them. I’m always seeking out trends surrounding the arrival of said tools because I feel they speak closely to the evolution... Read more
Validating Type I and II Errors in A/B Tests in R
In our previous article, we showed that generalized linear models are unbiased, or calibrated: they preserve the conditional expectations and rollups of the training data. A calibrated model is important in many applications, particularly when financial data is involved. However, when making predictions... Read more
Query Generation in R
R users have been enjoying the benefits of SQL query generators for quite some time, most notably using the dbplyr package. I would like to talk about some features of our own rquery query generator, concentrating on derived result re-use. Introduction: SQL represents value use by... Read more