Unsupervised Learning: Evaluating Clusters
K-means clustering is a partitioning approach for unsupervised statistical learning. It is somewhat unlike agglomerative approaches like hierarchical clustering. A partitioning approach starts with all data points and tries to divide them into a fixed number of clusters. K-means is applied to a set of quantitative... Read more
K-Means Clustering Applied to GIS Data
Here, we use k-means clustering with GIS Data. GIS can be intimidating to data scientists who haven’t tried it before, especially when it comes to analytics. On its face, mapmaking seems like a huge undertaking. Plus esoteric lingo and strange datafile encodings can create a significant... Read more
Assessment Metrics for Clustering Algorithms
Assessing the quality of your model is one of the most important considerations when deploying any machine learning algorithm. For supervised learning problems, this is easy. There are already labels for every example, so the practitioner can test the model’s performance on a reserved evaluation set.... Read more
In a previous post, we demonstrated how to use the Python3 library Newspaper to painlessly extract data from news articles. Using Newspaper, I was able to extract text from over a 1000 articles about topics including, but limited to Data Science, Artificial Intelligence, and Big Data.... Read more