What is Federated Learning?
The field of machine learning is constantly evolving, sometimes slowly, and at other times with the tech equivalent of the Cambrian Explosion: rapid advances that give a good many data scientists a serious case of imposter syndrome. Take the case of a new iteration of machine... Read more
Novelty in Machine Learning, or “What Gets Me Excited Every Day About Data Science”
Note: Kirk will present two training sessions at ODSC East 2020. One will focus on “Solving the Data Scientist’s Dilemma: the Cold-Start Problem with 10+ Machine Learning Examples” and the other will look at “Adapting Machine Learning Algorithms to Novel Use Cases.” I have always appreciated the unusual, unexpected,... Read more
Tutorial: Accelerate and Productionize ML Model Inferencing Using Open-Source Tools
Faith and Prabhat are speakers for ODSC East 2020 this April. Be sure to check out their talk, “From Research to Production: Performant Cross-platform ML/DNN Model Inferencing on Cloud and Edge with ONNX Runtime,” there! You’ve finally got that perfect trained model for your data set. Now what? To... Read more
Guided Labeling: Human-in-the-Loop Label Generation with Active Learning and Weak Supervision
Paolo is a speaker for ODSC East 2020 this April 13-17. Be sure to check out his talk, “Guided Labeling: Human-in-the-Loop Label Generation with Active Learning and Weak Supervision,” there! One of the key challenges of utilizing supervised machine learning for real-world use cases is that most algorithms and... Read more
A Survey of Popular Ensembling Techniques – Part 1
Statisticians have long known that two heads are better than one, but three are even better. Sir Francis Galton, the polymathic giant behind the pairwise correlation, regression, et al., writes about the surprising power of Vox Populi (“voice of the people” if you’re Roman) to make predictions about unknown quantities superior... Read more
How To Build A Spam Classifier Using Decision Tree
In the realm of Supervised Learning, there are tons of classifiers, including Logistic Regressions (logit 101 and logit 102), LDA, Naive Bayes, SVM, KNN, Random Forest, Neural Networks, and so many more coming each day! The real question that all data scientists... Read more
Introduction to Apache Airflow
Apache Airflow is a tool created by the community to programmatically author, schedule, and monitor workflows. The biggest advantage of Airflow is the fact that it does not limit the scope of pipelines. Airflow can be used for building Machine Learning models, transferring data, or managing the infrastructure. Let’s... Read more
Training and Operationalizing Interpretable Machine Learning Models
AI offers companies a unique opportunity to transform their operations: from AI applications that predict and schedule equipment maintenance to intelligent R&D applications that estimate the success of future drugs. However, to leverage this opportunity, companies have to learn how to successfully... Read more
Are All Explainable Models Trustworthy?
Explainable AI, or Explainable Data Science, is one of the top buzzwords of Data Science at the moment. Models that are explainable are seen as the answer to many of the recently recognized problems with machine learning, such as bias or data leaks. ... Read more
2020 Outlook on AutoML Updates & Latest Recent Advances
The field of automated machine learning, or AutoML, continues to expand, with new products and services being announced at a frenetic pace. As a data scientist, I’m motivated to carefully monitor this technology because it could potentially impact my profession, especially if these tools open up the field of... Read more