fbpx
R
Guide to R and Python in a Single Jupyter Notebook
Why pick one when you can use both at the same time? R is primarily used for statistical analysis, while Python provides a more general approach to data science. R and Python are object-oriented towards data science for programming language. Learning both is an ideal solution. Python is a... Read more
PI and Simulation Art in R
I spent the better part of an afternoon last week perusing a set of old flash drives I’d made years ago for my monthly notebook backups. One that especially caught my attention had a folder of R scripts, probably at least 15 years old — harking back to my... Read more
Beginner’s Guide to K-Nearest Neighbors in R: from Zero to Hero
In the world of Machine Learning, I find the K-Nearest Neighbors (KNN) classifier makes the most intuitive sense and easily accessible to beginners even without introducing any math notations. To decide the label of an observation, we look at its neighbors and assign the neighbors’ label to the observation... Read more
An Efficient Way to Install and Load R Packages
Unlike other programs, only fundamental functionalities come by default with R. You will thus often need to install some “extensions” to perform the analyses you want. These extensions which are are collections of functions and datasets developed and published by R users are called packages. They extend existing base R... Read more
15+ Resources to Get Started with R
R is the second most sought after language in data science behind Python, so gaining mastery of R is a prerequisite to a thriving career in the field. Whether you’re an experienced developer or a newbie considering a career move, here are some excellent resources so you can get... Read more
Quantifying R Package Dependency Risk
We recently commented on excess package dependencies as representing risk in the R package ecosystem. The question remains: how much risk? Is low dependency a mere talisman, or is there evidence it is a good practice (or at least correlates with other good practices)? [Related Article: Data-Driven Exploration of the R User Community... Read more
Machine Learning 101: Predicting Drug Use Using Logistic Regression In R
Executive Summary Generalized Linear Models (GLM) Three types of link function: Logit, Probit, and Complementary log-log (cloglog) Building a logistic regression to predict drug use and compare these three types of GLM In Machine Learning 101 courses, stats professors introduce GLM right after linear regression as the next stepping... Read more
Image Compression In 10 Lines of R Code
Principal Component Analysis (PCA) is a powerful Machine Learning tool. As an unsupervised learning technique, it excels in dimension reduction and feature extraction However, do you know we can use PCA to compress images? In this post, I’ll walk through the process and explain how PCA can compress images... Read more
A Quick Look Into Bootstrapping
Executive Summary As a resampling method, bootstrapping allows us to generate statistical inferences about the population from a single sample. Learn to bootstrap in R. Bootstrapping lies the foundation for several machine learning methods (e.g., Bagging. I’ll explain Bagging in a follow-up post). [Related Article: Discovering 135 Nights of... Read more
Balancing Interpretability and Predictive Power with Cubist Models in R
Machine learning models are powerful tools that do well in their purpose of prediction. In many business applications, the power of these models is quite beneficial. With any application of a machine learning model, the process to choosing which model involves determining the model that performs best across a... Read more