Monthly Summary of Selected Trends, Activities and Insights for R – October 2018
In October, the spike in activity observed in September across the R ecosystem was maintained. This article presents a summary of selected trends, activities, and insights around the R language in October 2018, as the language keeps trending. Data for the trends and activities summarized... Read more
Handling Missing Data in Python/Pandas
Key Takeaways: It’s important to describe missing data and the challenges it poses. You need to clarify the confusing terminology that further adds to the field’s complexity. You should take the time to review methods for handling missing data. You need to learn how to apply robust multiple imputation... Read more
How Tidyverse Guides R Programmers Through Data Science Workflows
Whenever someone asks me how to get into data science using R, I invariably recommend checking out the tidyverse package. Tidyverse is a great launch pad for a language like R because it offers order and consistency. I studied programming language design as a CS undergrad. At the time,... Read more
Vectorizing Random Number Generators for Greater Speed: PCG and xorshift128+ (AVX-512 edition)
Most people designing random number generators program using regular code. If they are aiming for speed, they probably write functions in C. However, our processors have fast “vectorized” (or SIMD) instructions that can allow you to go faster. These instructions do several operations at once. For example, recent Skylake-X... Read more
Introducing ODPi Egeria – The Industry’s First Open Metadata Standard
Organizations looking to better locate, understand, manage and gain value from their data have a new industry standard to leverage. ODPi, a nonprofit Linux Foundation organization focused upon accelerating the open ecosystem of big data solutions, recently announced ODPi Egeria, a new project that supports the free flow of... Read more
Build a Multi-Class Support Vector Machine in R
Support Vector Machines (SVMs) are quite popular in the data science community. Data scientists often use SVMs for classification tasks, and they tend to perform well in a variety of problem domains. An SVM performs classification tasks by constructing hyperplanes in a multidimensional space that separate cases of different... Read more
Setting Your Hypothesis Test Up For Success
I want to go deep with you on exactly how I work with stakeholders ahead of launching a hypothesis test. This step is crucial to make sure that once a test is done running, we’ll actually be able to analyze it. This includes: a well-defined hypothesis; a solid test... Read more
Client-side Web Development and Machine Learning
You might not expect client-side web development and machine learning to be in the same sentence. In this article, however, we’re going to look at how and why these two are beginning to collaborate rather successfully. There are many hidden uses for a collaboration between JavaScript and machine learning.... Read more
Monthly Summary of Selected Trends, Activities, and Insights for R – September 2018
In September, there was a serious spike in activity across the R ecosystem. This article summarizes selected R trends, activities, and insights in September. Data for the trends and activities summarized here were obtained from popular websites used by the R community such as Google, GitHub, Stack Overflow,... Read more
Quick start for Golang, Google Cloud API, and Speech Recognition
Speech recognition is becoming increasingly powerful and helpful to developers across the world. In this short article, I would like to demonstrate how easy it is to set up speech recognition in your own web application using Google’s powerful speech API. I will assume you have basic programming experience. To get... Read more