We’re just about finished with Q1 of 2019, and deep learning research is forging ahead at a very good clip. I routinely monitor the efforts of AI researchers to get a heads-up on where the technology is headed. This foresight helps me spend my time well and make sure I know what I don’t know. To that end, I try to read at least one research paper a week, out of a field of hundreds or perhaps thousands of papers.
In this article, I’ll help save you some time by curating the current pool of research efforts published thus far in 2019 down to the manageable short-list that follows. I filtered my choices to include papers that also have an associated GitHub repo. Enjoy!
This research introduces PyTorch Geometric, a library for deep learning on irregularly structured input data such as graphs, point clouds and manifolds, built upon PyTorch. In addition to general graph data structures and processing methods, it contains a variety of recently published methods from the domains of relational learning and 3D data processing. PyTorch Geometric achieves high data throughput by leveraging sparse GPU acceleration, by providing dedicated CUDA kernels and by introducing efficient mini-batch handling for input examples of different size. The code is available on GitHub.
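To make the mini-batch point concrete, here is a plain-Python sketch of one common way to batch graphs of different size: concatenate them into a single disjoint graph, shift each graph's node indices by a running offset, and keep a vector that records which graph each node came from. The function name and the dict layout are hypothetical, chosen only for illustration; this is not PyTorch Geometric's API.

```python
def collate_graphs(graphs):
    """Merge several small graphs into one disjoint graph.

    Each input graph is a dict (hypothetical layout) with:
      'num_nodes': int
      'edges': list of (src, dst) pairs using local node indices
    """
    edges = []
    batch = []   # batch[i] = index of the graph that node i belongs to
    offset = 0   # number of nodes already placed in the merged graph
    for graph_id, graph in enumerate(graphs):
        # Shift local node indices so they stay unique after merging.
        edges.extend((src + offset, dst + offset) for src, dst in graph["edges"])
        batch.extend([graph_id] * graph["num_nodes"])
        offset += graph["num_nodes"]
    return {"num_nodes": offset, "edges": edges, "batch": batch}


triangle = {"num_nodes": 3, "edges": [(0, 1), (1, 2), (2, 0)]}
pair = {"num_nodes": 2, "edges": [(0, 1)]}

merged = collate_graphs([triangle, pair])
print(merged["edges"])  # [(0, 1), (1, 2), (2, 0), (3, 4)]
print(merged["batch"])  # [0, 0, 0, 1, 1]
```

Because no edge crosses between the original graphs, message passing on the merged graph never mixes examples, and no padding is needed for graphs of unequal size.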
In most instance segmentation frameworks, the confidence of instance classification is used as the mask quality score. However, mask quality, measured as the IoU between the predicted mask and its ground truth, is often not well correlated with the classification score. This paper studies this problem and proposes Mask Scoring R-CNN, which contains a network block to learn the quality of the predicted instance masks. The mask scoring strategy calibrates the misalignment between mask quality and mask score, and improves instance segmentation performance by prioritizing more accurate mask predictions during COCO AP evaluation. The code is available on GitHub.
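A small sketch may help show why classification confidence is a poor stand-in for mask quality. It measures mask quality as the IoU between a predicted mask and the ground truth, then rescales the class confidence by that quality. Two assumptions to flag: the multiplication in mask_score is my reading of the rescoring idea, and in the actual method the IoU is predicted by a learned network block, whereas here it is computed directly from the ground truth for illustration. The function names are hypothetical.

```python
def mask_iou(pred, truth):
    """IoU between two binary masks given as equal-sized lists of 0/1 rows."""
    intersection = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                intersection += 1
            if p or t:
                union += 1
    return intersection / union if union else 0.0


def mask_score(class_confidence, estimated_iou):
    """Assumed rescoring rule: scale class confidence by estimated mask quality."""
    return class_confidence * estimated_iou


truth = [[1, 1, 0, 0],
         [0, 0, 0, 0]]
sloppy_pred = [[1, 1, 1, 1],
               [0, 0, 0, 0]]

quality = mask_iou(sloppy_pred, truth)
print(quality)                   # 0.5
print(mask_score(0.9, quality))  # 0.45
```

The classifier is 90% sure of the class, yet the mask covers twice the object's area. Ranking by the rescaled score pushes this detection below one with a tighter mask.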
Deep generative models are becoming a cornerstone of modern machine learning. Recent work on conditional generative adversarial networks (GANs) has shown that learning complex, high-dimensional distributions over natural images is within reach. While the latest models are able to generate high-fidelity, diverse natural images at high resolution, they rely on a vast quantity of labeled data. This paper demonstrates how one can benefit from recent work on self- and semi-supervised learning to outperform the state of the art (SOTA) in both unsupervised ImageNet synthesis and the conditional setting. The code is available on GitHub.
This paper presents GCNv2, a deep learning-based network for generating keypoints and descriptors. GCNv2 builds on a previous method, GCN, a network trained for 3D projective geometry. GCNv2 produces a binary descriptor vector, like the ORB feature, so that it can easily replace ORB in systems such as ORB-SLAM. The code is available on GitHub.
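Binary descriptors matter because they can be compared with a cheap bitwise operation. The sketch below shows the usual way binary descriptors such as ORB are matched: XOR two bit strings and count the differing bits (the Hamming distance). The 8-bit descriptors and the function names are toy assumptions for readability; real descriptors are much longer, and this is not code from the GCNv2 repo.

```python
def hamming_distance(a, b):
    """Number of differing bits between two descriptors stored as ints."""
    return bin(a ^ b).count("1")


def nearest_descriptor(query, candidates):
    """Index of the candidate with the smallest Hamming distance to query."""
    return min(range(len(candidates)),
               key=lambda i: hamming_distance(query, candidates[i]))


query = 0b10110010
candidates = [0b00001111, 0b10110110, 0b11111111]

print([hamming_distance(query, c) for c in candidates])  # [6, 1, 4]
print(nearest_descriptor(query, candidates))             # 1
```

Any feature extractor that emits descriptors in this form can be matched by the same routine, which is what makes a drop-in replacement for ORB plausible.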
Supervised machine learning methods usually require a large set of labeled examples for model training. However, in many real applications, unlabeled data is plentiful while labeled data is limited, and acquiring labels is costly. Active learning (AL) reduces the labeling cost by iteratively selecting the most valuable data points and querying the annotator for their labels. This article introduces ALiPy, a Python toolbox for active learning. The code is available on GitHub.
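As a rough illustration of "selecting the most valuable data", here is a sketch of uncertainty sampling, one simple and widely used query strategy for binary classification: ask the annotator about the samples the current model is least sure of. The function name is hypothetical and this is not ALiPy's API; the probabilities are assumed to come from whatever model is being trained.

```python
def select_queries(positive_probs, batch_size):
    """Pick the unlabeled samples whose predicted probability is closest to 0.5.

    positive_probs: the model's predicted probability of the positive class
                    for each unlabeled sample.
    Returns the indices of the batch_size most uncertain samples.
    """
    ranked = sorted(range(len(positive_probs)),
                    key=lambda i: abs(positive_probs[i] - 0.5))
    return ranked[:batch_size]


probs = [0.95, 0.52, 0.10, 0.45, 0.80]
print(select_queries(probs, 2))  # [1, 3]
```

In a full loop, the selected samples would be labeled, moved into the training set, and the model retrained before the next round of selection.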
DeepFashion2: A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images
Understanding fashion images has been advanced by benchmarks with rich annotations such as DeepFashion, whose labels include clothing categories, landmarks, and consumer-commercial image pairs. However, DeepFashion has non-negligible issues, such as a single clothing item per image, sparse landmarks (only 4 to 8), and no per-pixel masks, which leave a significant gap from real-world scenarios. This paper fills the gap by presenting DeepFashion2, a versatile benchmark covering four tasks: clothes detection, pose estimation, segmentation, and retrieval. The code is available on GitHub.
In the last few years, deep multi-agent reinforcement learning (RL) has become a highly active area of research. A particularly challenging class of problems in this area is partially observable, cooperative, multi-agent learning, in which teams of agents must learn to coordinate their behavior while conditioning only on their private observations. This is an attractive research area since such problems are relevant to a large number of real-world systems and are also more amenable to evaluation than general-sum problems. Standardized environments such as the ALE and MuJoCo have allowed single-agent RL to move beyond toy domains, such as grid worlds. However, there is no comparable benchmark for cooperative multi-agent RL. As a result, most papers in this field use one-off toy problems, making it difficult to measure real progress. This paper proposes the StarCraft Multi-Agent Challenge (SMAC) as a benchmark problem to fill this gap. The code is available on GitHub.
Multi-layer neural networks have led to remarkable performance on many kinds of benchmark tasks in text, speech and image processing. Nonlinear parameter estimation in hierarchical models is known to be subject to overfitting and misspecification. One approach to these and related estimation problems (local minima, collinearity, feature discovery, etc.) is called Dropout. The Dropout algorithm removes hidden units according to a Bernoulli random variable with probability p prior to each update, creating random “shocks” to the network that are averaged over updates. This paper shows that Dropout is a special case of a more general model, originally published in 1990, called the Stochastic Delta Rule, or SDR. The code is available on GitHub.
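The two kinds of noise can be sketched side by side in a few lines. The first function draws one Bernoulli variable per hidden unit, as the Dropout description above states. The second treats every weight as a random variable with its own mean and standard deviation and samples it on each forward pass, which is my understanding of the SDR idea. Assumptions to flag: p is treated here as the probability of dropping a unit, the Gaussian form of the weight noise is assumed, the update rules for the means and standard deviations are omitted, and all function names are hypothetical.

```python
import random


def dropout_mask(num_units, p, rng):
    """One Bernoulli draw per hidden unit: 0 drops the unit, 1 keeps it.

    p is treated as the drop probability (an assumption about the convention).
    """
    return [0 if rng.random() < p else 1 for _ in range(num_units)]


def apply_dropout(activations, p, rng):
    """Zero out the activations of the units selected by the mask."""
    mask = dropout_mask(len(activations), p, rng)
    return [a * m for a, m in zip(activations, mask)]


def sample_sdr_weights(means, stds, rng):
    """SDR-style noise (assumed Gaussian): draw each weight from its own distribution."""
    return [rng.gauss(mu, sigma) for mu, sigma in zip(means, stds)]


rng = random.Random(0)  # fixed seed so runs are repeatable
print(apply_dropout([0.3, 1.2, 0.7, 0.5], p=0.5, rng=rng))
print(sample_sdr_weights([0.1, -0.4], [0.05, 0.2], rng))
```

Dropout injects all-or-nothing noise at the level of whole units, while the SDR-style sampler injects graded noise per weight. Setting every standard deviation to zero recovers an ordinary deterministic network.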
Lingvo is a TensorFlow framework offering a complete solution for collaborative deep learning research, with a particular focus on sequence-to-sequence models. Lingvo models are composed of modular building blocks that are flexible and easily extensible, and experiment configurations are centralized and highly customizable. Distributed training and quantized inference are supported directly within the framework, and it contains existing implementations of a large number of utilities, helper functions, and the newest research ideas. Lingvo has been used in collaboration by dozens of researchers in more than 20 papers over the last two years. This document outlines the underlying design of Lingvo and serves as an introduction to the various pieces of the framework, while also offering examples of advanced features that showcase the capabilities of the framework. The code is available on GitHub.
Adaptive optimization methods such as AdaGrad, RMSProp and Adam have been proposed to achieve a rapid training process with an element-wise scaling term on learning rates. Though prevalent, they are observed to generalize poorly compared with SGD, or even to fail to converge, due to unstable and extreme learning rates. This paper demonstrates that extreme learning rates can lead to poor performance. It provides new variants of Adam and AMSGrad, called AdaBound and AMSBound respectively, which employ dynamic bounds on learning rates to achieve a gradual and smooth transition from adaptive methods to SGD, and it gives a theoretical proof of convergence. Further experiments were conducted on various popular tasks and models. The results show that the new variants can eliminate the generalization gap between adaptive methods and SGD while maintaining higher learning speed early in training. The code is available on GitHub.
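The idea of dynamic bounds can be sketched for a single parameter. An Adam-style optimizer scales its step by one over the square root of a running second-moment estimate, which can produce extreme per-parameter step sizes. The sketch clips that step size into a band that starts very wide and narrows toward a fixed SGD-like rate as training proceeds. The particular bound schedule, the constants final_lr and gamma, and the function names are assumptions chosen for illustration, and bias correction is omitted; see the paper and repo for the authors' exact formulation.

```python
import math


def dynamic_bounds(step, final_lr=0.1, gamma=1e-3):
    """Lower and upper bounds that both converge to final_lr as step grows.

    step counts from 1. The schedule shape is an illustrative assumption.
    """
    lower = final_lr * (1.0 - 1.0 / (gamma * step + 1.0))
    upper = final_lr * (1.0 + 1.0 / (gamma * step))
    return lower, upper


def bounded_step_size(base_lr, second_moment, step, eps=1e-8):
    """Adam-style per-parameter step size, clipped into the dynamic band."""
    raw = base_lr / (math.sqrt(second_moment) + eps)
    lower, upper = dynamic_bounds(step)
    return min(max(raw, lower), upper)


# A parameter with a tiny second-moment estimate gets a huge raw step size.
for step in (1, 1_000, 1_000_000):
    print(step, bounded_step_size(base_lr=1e-3, second_moment=1e-12, step=step))
```

Early on the band is loose, so the optimizer behaves much like its adaptive parent. Late in training the band is tight around final_lr, so every parameter effectively takes SGD-sized steps.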