The field of machine learning has continued to accelerate through 2019, moving at light speed with compelling new results coming out of academia and the research arms of large tech firms like Google, Microsoft, Yahoo, Facebook and many more. It’s a daunting task for the down-in-the-trenches data scientist to keep pace. I advise my data science students at UCLA to be up on the latest research results in order to keep ahead of the pack. I recount how industry luminary Andrew Ng keeps his head above water by toting around a file of research papers (so when he has a free moment, like riding on an Uber, he can consume part of a paper). It does take time to add the research realm to your everyday duties, but I think it’s fun to know what technologies are fertile areas of research.
In this article, I’ll help save you some time by curating the current large pool of research efforts on arXiv.org down to the manageable short-list of my favorites that follows: here’s the best machine learning of 2019. Enjoy!
Climate change is one of the greatest challenges facing humanity, and we, as machine learning experts, may wonder how we can help. This paper describes how machine learning can be a powerful tool in reducing greenhouse gas emissions and helping society adapt to a changing climate. From smart grids to disaster management, the authors identify high impact problems where existing gaps can be filled by machine learning, in collaboration with other fields. Recommendations encompass exciting research questions as well as promising business opportunities. The researchers call on the machine learning community to join the global effort against climate change.
Breakthroughs in machine learning are rapidly changing science and society, yet our fundamental understanding of this technology has lagged far behind. Indeed, one of the central tenets of the field, the bias-variance trade-off, appears to be at odds with the observed behavior of methods used in the modern machine learning practice. The bias-variance trade-off implies that a model should balance under-fitting and over-fitting: rich enough to express underlying structure in data, simple enough to avoid fitting spurious patterns. However, in the modern practice, very rich models such as neural networks are trained to exactly fit (i.e., interpolate) the data. Classically, such models would be considered over-fit, and yet they often obtain high accuracy on test data. This apparent contradiction has raised questions about the mathematical foundations of machine learning and their relevance to practitioners. This paper reconciles the classical understanding and the modern practice within a unified performance curve. This “double descent” curve subsumes the textbook U-shaped bias-variance trade-off curve by showing how increasing model capacity beyond the point of interpolation results in improved performance. The paper provides evidence for the existence and ubiquity of double descent for a wide spectrum of models and datasets, and it posits a mechanism for its emergence. This connection between the performance and the structure of machine learning models delineates the limits of classical analyses, and has implications for both the theory and practice of machine learning.
Semi-supervised learning has proven to be a powerful paradigm for leveraging unlabeled data to mitigate the reliance on large labeled data sets. This paper unifies the current dominant approaches for semi-supervised learning to produce a new algorithm, MixMatch, that works by guessing low-entropy labels for data-augmented unlabeled examples and mixing labeled and unlabeled data using MixUp. It is shown that MixMatch obtains state-of-the-art results by a large margin across many datasets and labeled data amounts. For example, on CIFAR-10 with 250 labels, we reduce error rate by a factor of 4 (from 38% to 11%) and by a factor of 2 on STL-10. The paper also demonstrates how MixMatch can help achieve a dramatically better accuracy-privacy trade-off for differential privacy. Finally, the authors perform an ablation study to tease apart which components of MixMatch are most important for its success.
Explainable machine learning (ML) enables human learning from ML, human appeal of automated model decisions, regulatory compliance, and security audits of ML models. Explainable ML (i.e. explainable artificial intelligence or XAI) has been implemented in numerous open source and commercial packages and explainable ML is also an important, mandatory, or embedded aspect of commercial predictive modeling in industries like financial services. However, like many technologies, explainable ML can be misused, particularly as a faulty safeguard for harmful black-boxes, e.g. fairwashing or scaffolding, and for other malevolent purposes like stealing models and sensitive training data. To promote best-practice discussions for this already in-flight technology, this paper presents internal definitions and a few examples before covering the proposed guidelines. This paper concludes with a seemingly natural argument for the use of interpretable models and explanatory, debugging, and disparate impact testing methods in life- or mission-critical ML systems.
Graphical causal inference as pioneered by Judea Pearl arose from research on artificial intelligence (AI), and for a long time had little connection to the field of machine learning. This paper discusses where links have been and should be established, introducing key concepts along the way. It argues that the hard open problems of machine learning and AI are intrinsically related to causality, and explains how the field is beginning to understand them.
From an environmental standpoint, there are a few crucial aspects of training a neural network that have a major impact on the quantity of carbon that it emits. These factors include: the location of the server used for training and the energy grid that it uses, the length of the training procedure, and even the make and model of hardware on which the training takes place. In order to approximate these emissions, this paper presents a Machine Learning Emissions Calculator, a tool for the community to better understand the environmental impact of training ML models. Accompanying this tool is an explanation of the factors cited above, as well as concrete actions that individual practitioners and organizations can take to mitigate their carbon emissions.
One of the most popular approaches to understanding feature effects of modern black box machine learning models are partial dependence plots (PDP). These plots are easy to understand but only able to visualize low order dependencies. The paper is about the question ‘How much can we see?’: A framework is developed to quantify the explainability of arbitrary machine learning models, i.e. up to what degree the visualization as given by a PDP is able to explain the predictions of the model. The result allows for a judgement whether an attempt to explain a black box model is sufficient or not.
Machine learning develops rapidly, which has made many theoretical breakthroughs and is widely applied in various fields. Optimization, as an important part of machine learning, has attracted much attention of researchers. With the exponential growth of data amount and the increase of model complexity, optimization methods in machine learning face more and more challenges. A lot of work on solving optimization problems or improving optimization methods in machine learning has been proposed successively. The systematic retrospect and summary of the optimization methods from the perspective of machine learning are of great significance, which can offer guidance for both developments of optimization and machine learning research. This paper begins by describing the optimization problems in machine learning. Then, it introduces the principles and progresses of commonly used optimization methods. Next, it summarizes the applications and developments of optimization methods in some popular machine learning fields. Finally, it explores and gives some challenges and open problems for the optimization in machine learning.
The AutoML task consists of selecting the proper algorithm in a machine learning portfolio, and its hyperparameter values, in order to deliver the best performance on the dataset at hand. Mosaic, a Monte-Carlo tree search (MCTS) based approach, is presented to handle the AutoML hybrid structural and parametric expensive black-box optimization problem. Extensive empirical studies are conducted to independently assess and compare: i) the optimization processes based on Bayesian optimization or MCTS; ii) its warm-start initialization; iii) the ensembling of the solutions gathered along the search. Mosaic is assessed on the OpenML 100 benchmark and the Scikit-learn portfolio, with statistically significant gains over Auto-Sklearn, winner of former international AutoML challenges.
A key challenge in developing and deploying Machine Learning (ML) systems is understanding their performance across a wide range of inputs. To address this challenge, this paper describes the What-If Tool, an open-source application that allows practitioners to probe, visualize, and analyze ML systems, with minimal coding. The What-If Tool lets practitioners test performance in hypothetical situations, analyze the importance of different data features, and visualize model behavior across multiple models and subsets of input data. It also lets practitioners measure systems according to multiple ML fairness metrics. The paper describes the design of the tool, and reports on real-life usage at different organizations.