Unsupervised Learning: Evaluating Clusters
K-means clustering is a partitioning approach to unsupervised statistical learning. It differs from agglomerative approaches such as hierarchical clustering: a partitioning approach starts with all data points and tries to divide them into a fixed number of clusters. K-means is applied to a set of quantitative variables. We fix...
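As a rough illustration of the partitioning idea in this excerpt, here is a minimal pure-Python sketch of the standard K-means iteration (assign each point to its nearest centroid, then move each centroid to the mean of its members). The excerpt is truncated, so this is not necessarily the procedure the full post follows; the names `kmeans` and `sq_dist`, the `iterations` and `seed` parameters, and the choice to initialize centroids from randomly sampled data points are all assumptions made for the example.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Partition `points` (a list of numeric tuples) into k clusters.

    Returns (labels, centroids). Initialization by sampling k data
    points is an assumption of this sketch, not a requirement of K-means.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[j])
        if new_centroids == centroids:
            break  # No centroid moved, so assignments are stable.
        centroids = new_centroids
    return labels, centroids
```

Calling `kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)` should group the first two points together and the last two together. In practice a library implementation such as scikit-learn's `KMeans` would usually be preferable to a hand-rolled loop like this one.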
K-Means Clustering Applied to GIS Data
GIS can be intimidating to data scientists who haven’t tried it before, especially when it comes to analytics. On its face, mapmaking seems like a huge undertaking. Plus, esoteric lingo and strange data file encodings can create a significant barrier to entry for newbies. There’s a reason why there are experts who...
Assessment Metrics for Clustering Algorithms
Assessing the quality of your model is one of the most important considerations when deploying any machine learning algorithm. For supervised learning problems, this is relatively straightforward: every example already has a label, so the practitioner can test the model’s performance on a reserved evaluation set. We don’t have...
In a previous post, we demonstrated how to use the Python 3 library Newspaper to painlessly extract data from news articles. Using Newspaper, I was able to extract text from over 1,000 articles about topics including, but not limited to, Data Science, Artificial Intelligence, and Big Data. In this follow...