R
15+ Resources to Get Started with R
R is the second most sought-after language in data science behind Python, so gaining mastery of R is a prerequisite to a thriving career in the field. Whether you’re an experienced developer or a newbie considering a career move, here are some excellent resources so you can get... Read more
Quantifying R Package Dependency Risk
We recently commented on excess package dependencies as representing risk in the R package ecosystem. The question remains: how much risk? Is low dependency a mere talisman, or is there evidence it is a good practice (or at least correlates with other good practices)? [Related Article: Data-Driven Exploration of the R User Community... Read more
Machine Learning 101: Predicting Drug Use Using Logistic Regression In R
Executive Summary: Generalized Linear Models (GLM). Three types of link function: logit, probit, and complementary log-log (cloglog). Building a logistic regression to predict drug use and comparing these three types of GLM. In Machine Learning 101 courses, stats professors introduce GLM right after linear regression as the next stepping... Read more
Image Compression In 10 Lines of R Code
Principal Component Analysis (PCA) is a powerful machine learning tool. As an unsupervised learning technique, it excels in dimension reduction and feature extraction. However, did you know we can use PCA to compress images? In this post, I’ll walk through the process and explain how PCA can compress images... Read more
A Quick Look Into Bootstrapping
Executive Summary: As a resampling method, bootstrapping allows us to draw statistical inferences about the population from a single sample. Learn to bootstrap in R. Bootstrapping lays the foundation for several machine learning methods (e.g., bagging; I’ll explain bagging in a follow-up post). [Related Article: Discovering 135 Nights of... Read more
Balancing Interpretability and Predictive Power with Cubist Models in R
Machine learning models are powerful tools that do well in their purpose of prediction. In many business applications, the power of these models is quite beneficial. With any application of a machine learning model, the process of choosing a model involves determining which model performs best across a... Read more
Using Keras and TensorFlow in R
Keras and TensorFlow are two very powerful packages that are normally accessed via Python. Since the packages were developed for Python, they may seem out of reach for R users. However, this is not the case, as the Keras and TensorFlow packages may be set... Read more
What is “Tidy Data”?
I would like to write a bit on the meaning and history of the phrase “tidy data.” Hadley Wickham has been promoting the term “tidy data.” For example, in an eponymous paper, he wrote: In tidy data: Each variable... Read more
Discovering 135 Nights of Sleep with Data, Anomaly Detection, and Time Series
In this article, I look at data from 135 nights of sleep and use anomaly detection and time series data to understand the results. Three things are certain in life: death, taxes, and sleeping. Here, we’ll talk about the last. Every night*, we humans, after a long day of... Read more
Using an Embedding Matrix on Tabular Data in R
How would you tackle the prospect of representing a categorical feature with hundreds of levels in a model? A first approach may be to create a one-hot encoded matrix representing each level of the feature. The result would be a large and sparse matrix where the majority of the... Read more