Scikit Optimize: Bayesian Hyperparameter Optimization in Python
So you want to optimize the hyperparameters of your machine learning model, and you are wondering whether Scikit Optimize is the right tool for you? You are in the right place. In this article I will show you an example of using skopt on a real problem, evaluate this library based on...
DevOps to DevSecOps: All about the Journey!
What do DevOps and DevSecOps, two approaches that unite various divisions of the same organization in the service of agility and faster innovation, have in common, and how are these development practices related? Are they indicating a trend that is going to...
Introduction to Spark NLP: Foundations and Basic Components
This is the first article in a series of blog posts to help data scientists and NLP practitioners learn the basics of the Spark NLP library from scratch and easily integrate it into their workflows. During this series, we will do our best to produce high-quality content and clear instructions with...
15+ Resources to Get Started with R
R is the second most sought-after language in data science, behind Python, so mastering R can go a long way toward a thriving career in the field. Whether you’re an experienced developer or a newbie considering a career move, here are some excellent resources so you can get...
Quantifying R Package Dependency Risk
We recently commented on excess package dependencies as representing risk in the R package ecosystem. The question remains: how much risk? Is low dependency a mere talisman, or is there evidence it is a good practice (or at least correlates with other good practices)?
Machine Learning 101: Predicting Drug Use Using Logistic Regression In R
Executive Summary: Generalized Linear Models (GLM); three types of link function (logit, probit, and complementary log-log, or cloglog); and building a logistic regression to predict drug use and comparing these three types of GLM. In Machine Learning 101 courses, stats professors introduce GLM right after linear regression as the next stepping...
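The three link functions named in the summary can be compared directly. The sketch below is in Python rather than the R used in the post, and the function names are illustrative only: each link g(p) maps a probability to the linear predictor, and each inverse maps the linear predictor back to a probability. Logit and probit are symmetric around p = 0.5, while cloglog is not.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


# Link functions g(p): map a probability in (0, 1) to the linear predictor.
def logit(p):
    return math.log(p / (1.0 - p))


def probit(p):
    return _STD_NORMAL.inv_cdf(p)


def cloglog(p):
    return math.log(-math.log(1.0 - p))


# Inverse links g^-1(eta): map the linear predictor back to a probability.
def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def inv_probit(eta):
    return _STD_NORMAL.cdf(eta)


def inv_cloglog(eta):
    return 1.0 - math.exp(-math.exp(eta))


if __name__ == "__main__":
    for eta in (-2.0, 0.0, 2.0):
        print(f"eta={eta:+.1f}  logit={inv_logit(eta):.3f}  "
              f"probit={inv_probit(eta):.3f}  cloglog={inv_cloglog(eta):.3f}")
```

At eta = 0 the logit and probit inverses both give 0.5, while the cloglog inverse gives 1 - e^(-1), which is one way to see its asymmetry.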
XGBoost: Enhancement Over Gradient Boosting Machines
In the first part of this discussion on XGBoost, I set the foundation for understanding the basic components of boosting. In brief, boosting uses a sequence of decision trees, each of which seeks to reduce the residuals of the prior tree. In other words, each new tree uses the residual of the...
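To make "each new tree fits the residuals" concrete, here is a minimal sketch of plain gradient boosting for squared-error regression on a single feature, using one-split trees (stumps). It is not XGBoost, which adds regularization and second-order gradient information; the names `fit_stump`, `boost`, and `mse` and the default learning rate are assumptions for illustration.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a one-split regression tree to the residuals by least squares."""
    best = None
    # Try every threshold that leaves at least one point on each side.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        lv, rv = mean(left), mean(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    if best is None:  # all x values identical: fall back to a constant
        c = mean(residuals)
        return lambda x: c
    _, threshold, lv, rv = best
    return lambda x: lv if x <= threshold else rv


def boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Gradient boosting for squared error: each stump fits the current residuals."""
    base = mean(ys)
    trees = []

    def predict(x):
        return base + learning_rate * sum(tree(x) for tree in trees)

    for _ in range(n_trees):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
    return predict


def mse(model, xs, ys):
    return mean((y - model(x)) ** 2 for x, y in zip(xs, ys))


if __name__ == "__main__":
    xs = list(range(1, 11))
    ys = [x ** 2 for x in xs]  # toy target
    for n in (1, 10, 100):
        print(n, round(mse(boost(xs, ys, n_trees=n), xs, ys), 2))
```

The learning rate shrinks each tree's contribution, so more trees are needed, but each step only partly corrects the previous residuals.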
Image Compression In 10 Lines of R Code
Principal Component Analysis (PCA) is a powerful machine learning tool. As an unsupervised learning technique, it excels at dimension reduction and feature extraction. But did you know we can use PCA to compress images? In this post, I’ll walk through the process and explain how PCA can compress images...
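The idea can be sketched with NumPy: center the pixel matrix, take its singular value decomposition, and keep only the top k components. This Python version stands in for the post's R code and rests on assumptions: a grayscale image is a 2-D array, its columns are treated as variables, and `pca_compress` is a hypothetical name.

```python
import numpy as np


def pca_compress(image, k):
    """Approximate a 2-D grayscale image using its top-k principal components."""
    col_means = image.mean(axis=0)
    centered = image - col_means
    # SVD of the centered data: the rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # Keep the first k scores and axes, then map back to pixel space.
    approx = (u[:, :k] * s[:k]) @ vt[:k, :]
    return approx + col_means


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((64, 48))  # stand-in for a real grayscale image
    for k in (1, 5, 20):
        err = np.linalg.norm(img - pca_compress(img, k))
        print(k, round(float(err), 3))
```

The compression comes from storage: keeping the k scores, k axes, and column means takes about k * (rows + cols) + cols numbers instead of rows * cols.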
A Quick Look Into Bootstrapping
Executive Summary: As a resampling method, bootstrapping allows us to draw statistical inferences about a population from a single sample. Learn to bootstrap in R. Bootstrapping lays the foundation for several machine learning methods (e.g., bagging, which I’ll explain in a follow-up post).
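A percentile bootstrap confidence interval for a sample mean shows the resampling idea in a few lines. This is a Python sketch (the post works in R); the percentile method is only one of several ways to build a bootstrap interval, and `bootstrap_ci` with its defaults is hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci(sample, stat=mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    # Each resample draws n observations with replacement from the sample.
    estimates = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_resamples))
    # Read the interval off the empirical quantiles of the estimates.
    lower = estimates[int(alpha / 2 * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


if __name__ == "__main__":
    data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.1, 2.5]  # toy data
    print(bootstrap_ci(data))
```

Passing a different `stat` (for example `statistics.median`) reuses the same resampling loop, which is what makes the bootstrap attractive for statistics without a convenient formula.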
Implementing a Kernel Principal Component Analysis in Python
In this article, we discuss implementing a kernel Principal Component Analysis in Python, with a few examples. Many machine learning algorithms make assumptions about the linear separability of the input data. The perceptron even requires perfectly linearly separable training data to converge. Other algorithms that we have covered so...
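A compact NumPy sketch of kernel PCA with an RBF (Gaussian) kernel follows the usual steps: build the kernel matrix, center it in feature space, and project onto the leading eigenvectors. The function name, the choice of RBF kernel, and the `gamma` value are assumptions; the article's own implementation may differ.

```python
import numpy as np


def rbf_kernel_pca(X, gamma, n_components):
    """Project the rows of X onto the leading RBF-kernel principal components."""
    # Pairwise squared Euclidean distances between samples.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    # Gaussian (RBF) kernel matrix.
    K = np.exp(-gamma * sq_dists)
    # Center the kernel matrix, i.e. center the data in feature space.
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(K)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
    # Projection of training sample i on component j is sqrt(lambda_j) * v_ij.
    top_vals = np.clip(eigvals[:n_components], 0.0, None)
    return eigvecs[:, :n_components] * np.sqrt(top_vals)


if __name__ == "__main__":
    # Toy data: two noisy concentric circles, which no linear projection separates.
    rng = np.random.default_rng(0)
    angles = rng.uniform(0.0, 2.0 * np.pi, size=100)
    radii = np.where(np.arange(100) < 50, 1.0, 3.0)
    X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
    X += rng.normal(scale=0.1, size=X.shape)
    Z = rbf_kernel_pca(X, gamma=0.5, n_components=2)
    print(Z.shape)
```

Whether the projected circles become linearly separable, and so usable by a linear classifier such as the perceptron, depends on the choice of `gamma`.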