Machine learning has been highly successful in data-intensive applications, but it often struggles when the data set is small. Few-shot learning (FSL) was proposed to tackle this problem. It is used across fields such as computer vision and NLP, and it has gained popularity because it makes predictions from a limited number of labeled examples, that is, from only a few training samples. Meta-learning techniques are one way to achieve few-shot learning. The goal of few-shot learning is not for the model to recognize the images in the training set and then generalize to a test set. Instead, the goal is to “learn to learn”: to acquire a skill that carries over to classes the model has not seen during training. There are different types of networks within FSL, such as Siamese networks and prototypical networks.
Let’s understand the basic concept of few-shot learning using an example. Suppose we are playing a game in which we are given three cards, each showing a picture of a different fruit. These cards are called the support set (Figure 1).
Figure 1: The three-card images comprising the support set
Once we have seen these three cards, we are given another card and asked to identify the fruit on it. This card is our query set (Figure 2). We can easily tell that it is a pineapple, but for a machine it may not be that simple; for instance, the model may have only seen sideways fruit like those in Figure 1. The few-shot learning family of algorithms deals with such cases, where only a few labeled examples are available.
Now suppose we train a network on a large training set containing images of many different fruits, none of which appear in the support set. Here, the learning task is not to recognize particular types of fruit but to learn a similarity function that captures the similarities and differences between fruits. We can set up the training task so that the model learns to predict whether or not two images belong to the same class. This can be framed as binary classification with labels 1 (same class) and 0 (different class), as depicted in the Siamese network example in Figure 3. Several loss functions can be used to train Siamese networks, including binary cross-entropy, contrastive loss, and triplet loss.
Once the network has learned a similarity function, the model can tell which of the three fruit cards in the support set is closest to the card in the query set (the pineapple), using a measure such as cosine similarity or Euclidean distance. The resulting similarity scores indicate which support image is closest in content to the query image.
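The nearest-match step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three-dimensional embedding vectors are made-up stand-ins for what a trained network would produce, the "apple" and "banana" labels are hypothetical (the example only names the pineapple), and `cosine_similarity` and `classify_query` are names chosen here for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify_query(query, support):
    """Score the query against every support embedding.

    Returns the label with the highest similarity and the full score table.
    """
    scores = {label: cosine_similarity(query, emb) for label, emb in support.items()}
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Hypothetical embeddings: one vector per support card (3-way, 1-shot).
support_embeddings = {
    "apple": [0.9, 0.1, 0.2],
    "banana": [0.1, 0.8, 0.3],
    "pineapple": [0.2, 0.3, 0.9],
}
# Hypothetical embedding of the query card.
query_embedding = [0.25, 0.2, 0.85]

predicted_label, similarity_scores = classify_query(query_embedding, support_embeddings)
print(predicted_label, similarity_scores)
```

Swapping cosine similarity for Euclidean distance would only require replacing the scoring function and picking the smallest value instead of the largest.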
Figure 2: The card image comprising the query set
Figure 3: A simple 2 hidden layer siamese network for binary classification with logistic prediction p. The structure of the network is replicated across the top and bottom sections to form twin networks, with shared weight matrices at each layer (source: Siamese Neural Networks for One-shot Image Recognition).
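To make the twin structure in Figure 3 concrete, here is a minimal forward-pass sketch in plain Python. It assumes ReLU hidden layers, random untrained weights, and a weighted component-wise L1 distance between the two embeddings followed by a sigmoid to obtain the logistic prediction p; the layer sizes, the input vectors, and every function name are hypothetical choices for illustration, not the configuration used in the cited paper.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Random weight matrix (n_out rows of n_in weights) and zero biases."""
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def embed(x, layers):
    """Pass x through every layer, applying a ReLU after each one."""
    h = x
    for weights, biases in layers:
        h = [max(0.0, sum(w * v for w, v in zip(row, h)) + b)
             for row, b in zip(weights, biases)]
    return h


def same_class_probability(x1, x2, layers, alpha, bias=0.0):
    """Logistic prediction p that x1 and x2 belong to the same class."""
    h1 = embed(x1, layers)  # top twin
    h2 = embed(x2, layers)  # bottom twin: the very same weights are reused
    distance = [abs(a - b) for a, b in zip(h1, h2)]
    logit = sum(a * d for a, d in zip(alpha, distance)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def binary_cross_entropy(p, y, eps=1e-12):
    """Loss for one pair; y is 1 for the same class and 0 otherwise."""
    p = min(max(p, eps), 1.0 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


rng = random.Random(0)
# Two hidden layers (4 -> 5 -> 3); sizes are arbitrary.
layers = [make_layer(4, 5, rng), make_layer(5, 3, rng)]
alpha = [rng.uniform(-1.0, 1.0) for _ in range(3)]

# Hypothetical flattened "images".
image_a = [0.2, 0.7, 0.1, 0.5]
image_b = [0.9, 0.1, 0.4, 0.3]

p_pair = same_class_probability(image_a, image_b, layers, alpha)
loss = binary_cross_entropy(p_pair, y=0)  # the pair is labeled "different"
print(p_pair, loss)
```

Because both inputs go through the same `layers`, the prediction does not depend on the order of the pair. Training would adjust `layers`, `alpha`, and `bias` to reduce the loss over many labeled pairs; with the random weights used here the output carries no meaning.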
The example above follows the paradigm of N-way K-shot classification. The support set comprises N classes, with K labeled images for each class, and the query set consists of Q query images. In our example, N = 3 classes, K = 1 (one labeled image per class), and Q = 1 (one query image). The main task is to assign each image in the query set to one of the N classes, given the N*K images in the support set. When K = 1, we call it one-shot classification.
“Support set” and “query set” are terms from the meta-learning framework. Note the difference between the training set and the support set in this example. The training set is large and is used to learn a similarity function with a deep learning network (here, a Siamese network), whereas the support set is small and provides additional information at inference time or helps fine-tune the classifier. The prediction performance of the model depends on both N and K: as the number of ways N increases, the task becomes harder, and as the number of shots K increases, prediction becomes easier.
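As a rough illustration of the N-way K-shot setup, the sketch below draws one episode (a support set and a query set) from a labeled data set. The data set of file names, the class labels, and the `sample_episode` helper are all hypothetical; Q is treated as the total number of query images, matching the example above.

```python
import random


def sample_episode(dataset, n_way, k_shot, n_query, rng):
    """Draw one N-way K-shot episode.

    dataset maps each class label to a list of examples.
    Returns (support, query), each a list of (example, label) pairs.
    """
    classes = rng.sample(sorted(dataset), n_way)
    support, leftover = [], []
    for label in classes:
        examples = list(dataset[label])
        rng.shuffle(examples)
        # The first K examples of each class go to the support set.
        support += [(ex, label) for ex in examples[:k_shot]]
        # The rest stay available as query candidates.
        leftover += [(ex, label) for ex in examples[k_shot:]]
    # Q query images in total, never overlapping with the support set.
    query = rng.sample(leftover, n_query)
    return support, query


# Hypothetical data set: five classes with five image file names each.
labels = ["apple", "banana", "cherry", "mango", "pineapple"]
dataset = {label: [f"{label}_{i}.jpg" for i in range(5)] for label in labels}

# 3-way 1-shot with a single query image, as in the card game.
support_set, query_set = sample_episode(
    dataset, n_way=3, k_shot=1, n_query=1, rng=random.Random(0)
)
print(support_set)
print(query_set)
```

Raising `n_way` gives the model more candidate classes to choose from, while raising `k_shot` gives it more labeled examples per class, which mirrors the difficulty trade-off described above.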
The explanation above gives a basic understanding of few-shot learning, specifically of metric-based methods. To learn more about few-shot learning, please attend the workshop session at ODSC East 2022 on April 21st (https://odsc.com/speakers/few-shot-learning/).
Thanks to my colleague William Huang at Capital One for reviewing the blog.
About the Author/ODSC East 2022 Speaker: Isha Chaturvedi
I am a principal data scientist at Capital One, working in the conversational AI space. Prior to that, I worked at Ericsson as a data scientist on the computer vision team. I completed my master’s at New York University in an Urban Data Science program in 2018. I have worked in different NYU research labs (NYU Urban Observatory, NYU Sounds of New York City (SONYC) lab). Before moving to New York, I lived in Hong Kong for 5 years, where I earned my bachelor’s at Hong Kong University of Science & Tech (HKUST) in Environmental Technology and Computer Science and later worked in the HKUST-Deutsche Telecom Systems and Media lab (an Augmented Reality and Computer Vision focused lab) as a Research Assistant.