Warning: Invalid argument supplied for foreach() in /home/customer/www/opendatascience.com/public_html/wp-includes/nav-menu.php on line 95
Warning: array_merge(): Expected parameter 2 to be an array, null given in /home/customer/www/opendatascience.com/public_html/wp-includes/nav-menu.php on line 102
In the world of Machine Learning, I find the K-Nearest Neighbors (KNN) classifier makes the most intuitive sense and easily accessible to beginners even without introducing any math notations. To decide the label of an observation, we look at its neighbors and assign the neighbors’ label... Read more
Unlike other programs, only fundamental functionalities come by default with R. You will thus often need to install some “extensions” to perform the analyses you want. These extensions which are are collections of functions and datasets developed and published by R users are called packages. They extend... Read more
Although I have been using RStudio for several years, I only recently discovered RStudio addins. Since then, I am using these addins almost every time I use RStudio. What are RStudio addins? RStudio addins are extensions... Read more
R is the second most sought after language in data science behind Python, so gaining mastery of R is a prerequisite to a thriving career in the field. Whether you’re an experienced developer or a newbie considering a career move, here are some excellent resources so... Read more
We recently commented on excess package dependencies as representing risk in the R package ecosystem. The question remains: how much risk? Is low dependency a mere talisman, or is there evidence it is a good practice (or at least correlates with other good practices)? [Related Article: Data-Driven Exploration of the... Read more
Executive Summary Generalized Linear Models (GLM) Three types of link function: Logit, Probit, and Complementary log-log (cloglog) Building a logistic regression to predict drug use and compare these three types of GLM In Machine Learning 101 courses, stats professors introduce GLM right after linear regression as... Read more
Principal Component Analysis (PCA) is a powerful Machine Learning tool. As an unsupervised learning technique, it excels in dimension reduction and feature extraction However, do you know we can use PCA to compress images? In this post, I’ll walk through the process and explain how PCA... Read more
Executive Summary As a resampling method, bootstrapping allows us to generate statistical inferences about the population from a single sample. Learn to bootstrap in R. Bootstrapping lies the foundation for several machine learning methods (e.g., Bagging. I’ll explain Bagging in a follow-up post). [Related Article: Discovering... Read more
Machine learning models are powerful tools that do well in their purpose of prediction. In many business applications, the power of these models is quite beneficial. With any application of a machine learning model, the process to choosing which model involves determining the model that performs... Read more
Keras and Tensorflow are two very powerful packages that are normally accessed via python. Since the packages were developed for python they may have the illusion of being out of reach for R users. However, this is not the case as the Keras and Tensorflow packages... Read more