Deep Learning with TensorFlow 2.0
Editor’s Note: See Jon’s talk “Deep Learning with TensorFlow 2.0” at ODSC West 2019.  This summer, I had a blast speaking at Immersive A.I.—the first annual Open Data Science Conference (ODSC) event in New York. The venue was flawless, the organizers were exceptionally well-prepared, and there was a remarkable... Read more
The Benefits of Cloud Native ML And AI
As big data gets more complex, companies are struggling to accommodate the storage and computing needs of average organizations, much less massive enterprises. This is where cloud-native ML and AI come into play. What Does Cloud Native Mean? Your computing power is limited. No matter what kind of hardware... Read more
Google Dataset Search Launched to Help Analysts Scour Repositories
Google Dataset Search is a new product in the beta phase that you can use to find datasets published online. The single interface allows you to search repositories worldwide. Imagine you start a new analytics project. For example, let’s say you want to explore numbers pertaining to Boston Public Schools. Before... Read more
K-Means Clustering Applied to GIS Data
Here, we use k-means clustering with GIS data. GIS can be intimidating to data scientists who haven’t tried it before, especially when it comes to analytics. On its face, mapmaking seems like a huge undertaking. Plus, esoteric lingo and strange datafile encodings can create a significant barrier to entry... Read more
Understanding the Hoeffding Inequality
If you read my last post on mathematically defining machine learning problems, then you’ll be familiar with the terminology here. Otherwise, I recommend you read that and then circle back here. We’ll work our way up to understanding the Hoeffding Bound over a few posts. However, it’s important to... Read more
A Short Summary of Smoothing Algorithms
When data are noisy, it’s our job as data scientists to listen for signals so we can relay them to someone who can decide how to act. To amp up how loudly hidden signals speak over the noise of big and/or volatile data, we can deploy smoothing algorithms, which... Read more
Machine Learning Approaches to Mobile Sensing Data to Make Self-Driving Cars Safer
Key Takeaways: Mobile sensing data from IoT devices have created opportunities for data scientists to better understand how we drive. Accelerometry and GPS data, for example, can be used to determine vehicular heading, acceleration, speed, climb, and other aspects of its motion. Machine learning and other data science techniques... Read more
Spark and The Art of Data Science
Apache Spark, or simply “Spark,” is a highly distributed, fault-tolerant, scalable framework that processes massive amounts of data. As it processes data, Spark abstracts the distribution of the data computations across a machine cluster, enabling you to create applications using Java, Scala, Python, R, and SQL. Spark has... Read more
Survey Analysis in SQL and R
Charco Hui, as his Honours project in Statistics, has been writing a package for complex-survey analysis using dplyr and dbplyr. It’s here. At the moment it has only been tested with MonetDB, using the GitHub version (0.5.2) of MonetDBLite, but it should work with many other databases (not SQLite, at the moment). I hope... Read more
Perl as Better grep
I like Perl’s pattern matching features more than Perl as a programming language. I’d like to take advantage of the former without having to go any deeper than necessary into the latter. The book Minimal Perl is useful in this regard. It has chapters on Perl as a better grep, a better awk,... Read more