Retrieving Webpages Through Python Programming
This article discusses retrieving web pages through Python programming. The Internet and the World Wide Web (WWW) are probably the most prominent sources of information today. Most of that information is retrievable through HTTP. HTTP was originally invented to share pages of hypertext (hence the name Hypertext Transfer Protocol)...
A Data Pattern with an R data.table Solution
Summary: This blog examines a loading pattern often seen with government-generated, web-accessible data. The data comprise millions of records across multiple text or CSV files, generally demarcated by time. The files may present different but overlapping attributes, while much of the data has a character representation,...
Data Manipulation in R
Not all datasets are as clean and tidy as you would expect. Therefore, after importing your dataset into RStudio, most of the time you will need to prepare it before performing any statistical analyses. Data manipulation can sometimes take even longer than the actual analyses when the...
Guide to R and Python in a Single Jupyter Notebook
Why pick one when you can use both at the same time? R is primarily used for statistical analysis, while Python provides a more general approach to data science. Both are programming languages well suited to data science, so learning both is an ideal solution...
PI and Simulation Art in R
I spent the better part of an afternoon last week perusing a set of old flash drives I’d made years ago for my monthly notebook backups. One that especially caught my attention had a folder of R scripts, probably at least 15 years old, harking...
Beginner’s Guide to K-Nearest Neighbors in R: from Zero to Hero
In the world of machine learning, I find that the K-Nearest Neighbors (KNN) classifier makes the most intuitive sense and is easily accessible to beginners, even without introducing any mathematical notation. To decide the label of an observation, we look at its neighbors and assign the neighbors’ label...
An Efficient Way to Install and Load R Packages
Unlike many other programs, R comes with only its fundamental functionality by default. You will thus often need to install some “extensions” to perform the analyses you want. These extensions, which are collections of functions and datasets developed and published by R users, are called packages. They extend...
15+ Resources to Get Started With R – For Expert and Beginner
R is the second most sought-after language in data science behind Python, so gaining mastery of R is a prerequisite to a thriving career in the field. Whether you’re an experienced developer or a newbie considering a career move, here are some excellent resources so...
Quantifying R Package Dependency Risk
We recently commented on excess package dependencies as representing risk in the R package ecosystem. The question remains: how much risk? Is low dependency a mere talisman, or is there evidence it is a good practice (or at least correlates with other good practices)?...
Machine Learning 101: Predicting Drug Use Using Logistic Regression In R
Executive Summary: Generalized Linear Models (GLM); three types of link function: logit, probit, and complementary log-log (cloglog); building a logistic regression to predict drug use and comparing these three types of GLM. In Machine Learning 101 courses, stats professors introduce GLM right after linear regression as...