ImageNet is one of the most widely used datasets in computer vision applications. However, studies have shown that it contains biases stemming from its collection methodology and the types of images it includes. To address this, a team of researchers at the Visual Geometry Group (VGG), University of Oxford, has proposed a new dataset called PASS for self-supervised learning (SSL) model pretraining, designed specifically to address privacy and fairness issues. This article summarizes the team's paper.
PASS: An ImageNet replacement for self-supervised pretraining without humans
The PASS paper on arXiv | Source: https://arxiv.org/pdf/2109.13228.pdf
What is PASS?
PASS stands for Pictures without humans for Self-Supervision. It is a large-scale unlabelled image dataset created to alleviate the ethical, legal, and privacy concerns surrounding the well-known ImageNet dataset.
Issues with ImageNet
Some of the current issues with ImageNet are:
Issues with ImageNet | Image by Author
- Data protection: it contains personal information collected without consent.
- Copyright: the license terms for using its images are unclear.
- Biases: the images were collected by querying search engines, which introduces a search-engine selection bias.
- Problematic image content: it contains stereotyped and inappropriate depictions of certain categories.
PASS as an alternative
The authors point out that:
the current state-of-the-art model pretraining uses self-supervised learning (SSL) and thus does not require labels at all. Motivated by this, we thus consider forming a dataset without using labels, significantly increasing diversity and removing the search engine selection bias. Because we remove images with humans, we further significantly reduce the risk of including contextual biases linked to the appearance of people. Furthermore, due to its more random and unsupervised nature, this dataset also serves as a better benchmark for SSL to study scaling to natural images that are not curated to a pre-defined set of class labels, addressing a technical shortcoming of current evaluations.
- Data protection: PASS contains no images of humans or human body parts.
- Copyright: PASS contains only CC-BY licensed images, with complete attribution information.
- Biases: the images were collected without labels or search queries, which removes the search-engine selection bias.
- Problematic image content: the authors filtered out personally identifiable information such as license plates, signatures, and handwriting, as well as NSFW (not safe for work) images.
The authors started from Yahoo Flickr Creative Commons 100 Million (YFCC100M), a dataset of 100 million random Flickr images. Next, they kept only images with a valid CC-BY license, about 17 million in total. From these, problematic images containing humans were removed, leaving about 10 million images. Because the number of images per photographer is highly skewed, the authors balanced each photographer's contribution and finally submitted the remaining images for human annotation.
Dataset generation pipeline | Source: https://arxiv.org/pdf/2109.13228.pdf
The authors also mention that the annotations were performed over the course of three weeks by an annotation company whose annotators were paid 150% of the minimum wage.
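The filtering and balancing steps described above can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' pipeline: the `Photo` record, its `license` and `has_human` fields, the `build_pass_like_subset` function, and the `max_per_photographer` cap are all hypothetical. The `has_human` flag stands in for whatever detection and human review the authors actually used.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Photo:
    """Hypothetical metadata record for one Flickr image."""
    photo_id: str
    photographer: str
    license: str      # simplified license tag, e.g. "CC-BY" (hypothetical format)
    has_human: bool   # hypothetical flag standing in for detection + human review


def build_pass_like_subset(photos: Iterable[Photo], max_per_photographer: int) -> List[Photo]:
    """Filter and balance images in the order the article describes."""
    # Step 1: keep only images with a CC-BY license.
    cc_by = [p for p in photos if p.license == "CC-BY"]

    # Step 2: drop images flagged as containing humans.
    no_humans = [p for p in cc_by if not p.has_human]

    # Step 3: cap each photographer's contribution so that a few prolific
    # users cannot dominate the dataset (the cap value is an assumption).
    counts: Dict[str, int] = defaultdict(int)
    balanced: List[Photo] = []
    for p in no_humans:
        if counts[p.photographer] < max_per_photographer:
            counts[p.photographer] += 1
            balanced.append(p)

    # The surviving images would then go to human annotators for review.
    return balanced


sample = [
    Photo("a1", "alice", "CC-BY", False),
    Photo("a2", "alice", "CC-BY", False),
    Photo("a3", "alice", "CC-BY", False),     # exceeds alice's cap
    Photo("b1", "bob", "CC-BY-NC", False),    # wrong license
    Photo("c1", "carol", "CC-BY", True),      # contains a human
    Photo("c2", "carol", "CC-BY", False),
]
kept = build_pass_like_subset(sample, max_per_photographer=2)
print([p.photo_id for p in kept])  # ['a1', 'a2', 'c2']
```

A fixed per-photographer cap is only one simple way to balance contributions. This summary does not specify the exact balancing rule the authors used.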
The main results reported by the authors are as follows:
(i) Self-supervised approaches such as MoCo, SwAV, and DINO train well on the PASS dataset, yielding strong image representations;
(ii) Excluding images of humans during pretraining has almost no effect on downstream task performance, even when humans are removed from ImageNet itself;
(iii) In frozen-encoder evaluations, models trained on PASS yield better results than those pretrained on ImageNet (including ImageNet without humans, or even the Places205 dataset), as shown below:
Frozen encoder evaluations | Source: https://arxiv.org/pdf/2109.13228.pdf
(iv) For finetuning evaluations, such as detection and segmentation on the COCO dataset, PASS pretraining yields results within ±1% mAP and AP50 of ImageNet pretraining;
Finetuning representation evaluations | Source: https://arxiv.org/pdf/2109.13228.pdf
(v) Even on tasks involving humans, such as dense pose prediction, pretraining on PASS yields performance on par with ImageNet pretraining, even though PASS contains no images of humans.
What does this mean for ImageNet?
The authors stress that PASS does not make existing datasets obsolete, since it is insufficient for benchmarking on its own. Rather, the idea behind PASS is to show that model pretraining is often possible with safer data, and to provide the basis for a more robust evaluation of pretraining methods.
Is PASS Bias-Free?
Although PASS significantly reduces data protection and other ethical risks for data subjects, some concerns remain, as the authors themselves point out:
- Although great care was taken in filtering the images, harmful images may still slip through.
- Geographical bias persists: because the images are sampled randomly from Flickr, they inherit whatever geographic skew Flickr's uploads have.
- PASS cannot be used to learn models of people, for example for pose recognition, because it contains no images of humans.
- Unlike ImageNet, PASS cannot be used on its own for supervised training and benchmarking, since it contains no labels.
PASS has its limitations, but it is still an encouraging step from the research community toward reducing ethical and legal risks for many tasks and applications. The ImageNet dataset has ushered in an era of state-of-the-art computer vision applications, but as a community, we cannot overlook its shortcomings.