Data Manipulation in R
Not all datasets are as clean and tidy as you would expect. Therefore, after importing your dataset into RStudio, most of the time you will need to prepare it before performing any statistical analyses. Data manipulation can even sometimes take longer than the actual analyses when the... Read more
Creating if/elseif/else Variables in Python/Pandas
Frequencies and Chaining in Python-Pandas
This article discusses chaining in Python. A few years ago, in a Q&A session following a presentation I gave on data analysis (DA) to a group of college recruits for my then consulting company, I was asked to name what I considered the most important analytic... Read more
The goal of a data analysis pipeline in Python is to allow you to transform data from one state to another through a set of repeatable, and ideally scalable, steps. Problems for which I have used data analysis pipelines in Python include: Processing financial / stock... Read more
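As a rough illustration of the "set of repeatable steps" idea in the excerpt above, here is a minimal sketch in plain Python. The step functions (`drop_missing`, `add_change`), the helper `run_pipeline`, and the sample price records are all hypothetical and are not taken from the article; they only show one way small, reusable steps can be chained.

```python
from functools import reduce


def drop_missing(rows):
    """Step 1: remove records that have no price."""
    return [r for r in rows if r.get("price") is not None]


def add_change(rows):
    """Step 2: add the relative price change versus the previous record."""
    out = []
    prev = None
    for r in rows:
        change = None if prev is None else (r["price"] - prev) / prev
        out.append({**r, "change": change})
        prev = r["price"]
    return out


def run_pipeline(data, steps):
    """Apply each step in order, feeding the output of one into the next."""
    return reduce(lambda acc, step: step(acc), steps, data)


# Hypothetical input: three daily price records, one with a missing value.
raw = [
    {"day": 1, "price": 100.0},
    {"day": 2, "price": None},
    {"day": 3, "price": 110.0},
]

clean = run_pipeline(raw, [drop_missing, add_change])
```

Because each step takes a list of records and returns a new list, the same steps can be rerun on fresh data or reordered without touching the rest of the pipeline.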
Guide to R and Python in a Single Jupyter Notebook
Why pick one when you can use both at the same time? R is primarily used for statistical analysis, while Python provides a more general approach to data science. Both R and Python are programming languages oriented towards data science. Learning both is an ideal solution.... Read more
Complex Queries in SQL
Texts on SQL are good at providing the basic templates of SQL syntax, but sometimes the queries in those books are a little idealized compared to real life, with no more than two or three tables... Read more
PI and Simulation Art in R
I spent the better part of an afternoon last week perusing a set of old flash drives I’d made years ago for my monthly notebook backups. One that especially caught my attention had a folder of R scripts, probably at least 15 years old — harking... Read more
Beginner’s Guide to K-Nearest Neighbors in R: from Zero to Hero
In the world of Machine Learning, I find that the K-Nearest Neighbors (KNN) classifier makes the most intuitive sense and is easily accessible to beginners, even without introducing any math notation. To decide the label of an observation, we look at its neighbors and assign the neighbors’ label... Read more
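The neighbor-based idea in the excerpt above can be sketched in a few lines. The article itself works in R; the sketch below uses Python purely for illustration, and it assumes Euclidean distance and a majority vote among the k nearest points, neither of which the truncated excerpt confirms. The function name `knn_predict` and the toy training points are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Predict a label for `query` from its k nearest training points.

    `train` is a list of (features, label) pairs. Euclidean distance and
    majority voting are assumptions made for this sketch.
    """
    # Sort training points by distance to the query and keep the k closest.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    # Count the labels among those neighbors and return the most common one.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups in two dimensions.
train = [
    ((1.0, 1.0), "a"),
    ((1.2, 0.8), "a"),
    ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"),
    ((5.2, 4.9), "b"),
]

label = knn_predict(train, (1.1, 1.0), k=3)
```

With k set to an odd number, a two-class vote cannot tie, which is one common reason small odd values of k are used in introductory examples.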
An Efficient Way to Install and Load R Packages
Unlike other programs, only fundamental functionalities come by default with R. You will thus often need to install some “extensions” to perform the analyses you want. These extensions, which are collections of functions and datasets developed and published by R users, are called packages. They extend... Read more
Major Updates to the Most Popular Data Science Frameworks in 2019
This time last year we brought you a detailed report of all the important updates for popular data science (machine learning and deep learning) frameworks throughout 2018. The developers of these frameworks continue to innovate at an accelerated rate. Data scientists demand more powerful tools in... Read more