Top 10 Pre-Trained Models for Image Embedding Every Data Scientist Should Know

Rapid developments in computer vision, especially in image classification use cases, have been further accelerated by the advent of transfer learning. Training a computer vision neural network on a large dataset of images takes a lot of computational resources and time.

Fortunately, this time and these resources can be reduced by using pre-trained models. The technique of leveraging the feature representations learned by a pre-trained model is called transfer learning. Pre-trained models are generally trained on massive datasets using high-end computational resources.

The pre-trained models can be used in various ways:

  • Using the pre-trained weights and directly making predictions on the test data
  • Using the pre-trained weights for initialization and training the model using the custom dataset
  • Using only the architecture of the pre-trained network, and training it from scratch on the custom dataset
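The three options above map onto `tf.keras.applications` roughly as follows. This is a minimal sketch, not the author's code: MobileNetV2 is an arbitrary choice of backbone, `build_classifier` is a hypothetical helper, and the single Dense softmax head is an assumption about what a custom task needs. Option 1 (direct predictions) is simply the model loaded with `include_top=True`, so it is not repeated here.

```python
import tensorflow as tf

def build_classifier(num_classes, weights="imagenet", fine_tune=False):
    """Hypothetical helper: pre-trained backbone plus a new classification head.

    weights="imagenet" -> initialize from pre-trained weights (option 2)
    weights=None       -> reuse only the architecture, train from scratch (option 3)
    """
    base = tf.keras.applications.MobileNetV2(
        include_top=False,          # drop the original 1000-class ImageNet head
        weights=weights,
        input_shape=(224, 224, 3),
        pooling="avg",              # global average pooling: one vector per image
    )
    # Freeze the backbone for pure feature extraction, or keep it trainable to fine-tune
    base.trainable = fine_tune
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(base.output)
    return tf.keras.Model(base.input, outputs)
```

For example, `build_classifier(10)` would give a frozen ImageNet backbone with a trainable 10-class head, while `build_classifier(10, weights=None, fine_tune=True)` trains the same architecture from scratch on the custom dataset.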

This article walks through the top 10 state-of-the-art pre-trained models for generating image embeddings. All of them can be loaded as Keras models through the tf.keras.applications API.
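Since the goal is an embedding rather than a class prediction, a common pattern is to drop the classifier head with `include_top=False` and add global average pooling with `pooling="avg"`. A minimal sketch, assuming ResNet50 and RGB images already resized to 224×224; `build_embedder` and `embed` are hypothetical helper names. Each model family has its own `preprocess_input` function, so swap it together with the model.

```python
import numpy as np
import tensorflow as tf

def build_embedder(weights="imagenet"):
    # include_top=False removes the 1000-class head; pooling="avg" averages the
    # final feature maps, so every image becomes one fixed-length vector
    return tf.keras.applications.ResNet50(
        include_top=False,
        weights=weights,
        input_shape=(224, 224, 3),
        pooling="avg",
    )

def embed(model, images):
    # images: array of shape (n, 224, 224, 3) with RGB values in [0, 255]
    batch = np.array(images, dtype="float32")  # copy, so the caller's array is not modified
    batch = tf.keras.applications.resnet50.preprocess_input(batch)
    return model.predict(batch, verbose=0)     # shape (n, 2048) for ResNet50
```

With ImageNet weights, `embed(build_embedder(), images)` returns one 2048-dimensional vector per image.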

1) VGG:

The VGG-16/19 networks were introduced in the ILSVRC 2014 challenge and remain among the most popular pre-trained models. They were developed by the Visual Geometry Group at the University of Oxford.

There are two variants of the VGG model, with 16 and 19 layers: VGG-19 (19-layer network) is a deeper extension of VGG-16 (16-layer network).

Architecture:

(Source, Free-to-use license under CC BY-SA 4.0), VGG-16 Network architecture

The VGG network is simple and sequential in nature and uses a large number of filters. At each stage, it stacks small 3×3 filters instead of using larger ones, which keeps the number of parameters down.
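The parameter saving from small filters is easy to check with a little arithmetic: two stacked 3×3 convolutions cover the same 5×5 receptive field as one 5×5 convolution, and three cover a 7×7 field. A sketch assuming C input and output channels and ignoring biases (C = 256 is an arbitrary example value):

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in one k x k convolution layer (biases ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels

C = 256  # example channel count, chosen arbitrarily

# Same 5x5 receptive field: two stacked 3x3 layers vs. one 5x5 layer
two_3x3 = 2 * conv_weights(3, C, C)    # 18 * C^2
one_5x5 = conv_weights(5, C, C)        # 25 * C^2

# Same 7x7 receptive field: three stacked 3x3 layers vs. one 7x7 layer
three_3x3 = 3 * conv_weights(3, C, C)  # 27 * C^2
one_7x7 = conv_weights(7, C, C)        # 49 * C^2

print(f"5x5 field: {two_3x3:,} vs {one_5x5:,} weights")
print(f"7x7 field: {three_3x3:,} vs {one_7x7:,} weights")
```

The stacked version also places a ReLU between the layers, which the VGG authors cite as a second benefit of this design.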

The VGG-16 network has the following:

  • Convolutional Layers = 13
  • Pooling Layers = 5
  • Fully Connected Dense Layers = 3
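The layer counts above are enough to estimate the model size. A sketch that recomputes VGG-16's parameter count from its layer configuration, assuming the standard Keras layout (3×3 convolutions with biases, 2×2 max pooling, and 4096-4096-1000 dense layers on a 224×224×3 input). `VGG16_CONFIG` is a hypothetical name for the channel list, with `"M"` marking a pooling layer:

```python
# Output channels of each 3x3 conv layer in VGG-16; "M" marks a 2x2 max-pooling layer
VGG16_CONFIG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
                512, 512, 512, "M", 512, 512, 512, "M"]

def vgg16_summary(num_classes=1000, input_size=224):
    params, in_channels, size = 0, 3, input_size
    conv_layers = pool_layers = 0
    for item in VGG16_CONFIG:
        if item == "M":
            size //= 2                  # each pooling layer halves height and width
            pool_layers += 1
        else:
            params += 3 * 3 * in_channels * item + item  # kernel weights + biases
            in_channels = item
            conv_layers += 1
    flat = size * size * in_channels    # 7 * 7 * 512 = 25088 features after flattening
    dense_units = [4096, 4096, num_classes]
    for units in dense_units:
        params += flat * units + units  # weights + biases of each dense layer
        flat = units
    return params, conv_layers, pool_layers, len(dense_units)

print(vgg16_summary())  # (138357544, 13, 5, 3)
```

That is about 138M parameters, in line with the "~140M" figure below, and nearly 90% of them sit in the three dense layers, which is why loading VGG with `include_top=False` gives a much smaller model.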

Input: Image of dimensions (224, 224, 3)

Output: 1000-dimensional vector of ImageNet class probabilities (with include_top=True)

Other Details for VGG-16/19:

  • Paper Link: https://arxiv.org/pdf/1409.1556.pdf
  • GitHub: VGG
  • Published On: April 2015
  • Performance on ImageNet Dataset: 71% (Top 1 Accuracy), 90% (Top 5 Accuracy)
  • Number of Parameters: ~140M
  • Number of Layers: 16/19
  • Size on Disk: ~530MB

Implementation:

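import tensorflow as tf

# Load VGG-16 with ImageNet weights and its 1000-class classification head
model = tf.keras.applications.VGG16(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)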

The code above is for VGG-16; Keras offers a similar API for VGG-19. For more details, refer to this documentation.

2) Xception:

Xception is a deep CNN architecture built on depthwise separable convolutions. A depthwise separable convolution can be understood as an Inception module with a maximally large number of towers.
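The efficiency of a depthwise separable convolution comes from splitting one k×k convolution into a per-channel (depthwise) k×k filter followed by a 1×1 (pointwise) convolution that mixes channels. A sketch comparing weight counts (biases ignored), using 728 channels, the width of the layers in Xception's middle flow:

```python
def standard_conv_weights(k, c_in, c_out):
    # One k x k filter spanning all input channels, for every output channel
    return k * k * c_in * c_out

def separable_conv_weights(k, c_in, c_out):
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise

k, channels = 3, 728
standard = standard_conv_weights(k, channels, channels)    # 4,769,856
separable = separable_conv_weights(k, channels, channels)  # 536,536
print(f"{standard:,} vs {separable:,} weights ({standard / separable:.1f}x fewer)")
```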

Architecture:

(Source, Free-to-use license under CC BY-SA 4.0), Xception architecture

Input: Image of dimensions (299, 299, 3)

Output: 1000-dimensional vector of ImageNet class probabilities (with include_top=True)

Other Details for Xception:

  • Paper Link: https://arxiv.org/pdf/1610.02357.pdf
  • GitHub: Xception
  • Published On: April 2017
  • Performance on ImageNet Dataset: 79% (Top 1 Accuracy), 94.5% (Top 5 Accuracy)
  • Number of Parameters: ~23M
  • Depth: 81
  • Size on Disk: 88MB

Implementation:

  • Instantiate the Xception model with the following code:
import tensorflow as tf

# Load Xception with ImageNet weights and its 1000-class classification head
model = tf.keras.applications.Xception(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

The code above is for Xception. For more details, refer to this documentation.

3) ResNet:

Earlier CNN architectures were not designed to scale to many convolutional layers: stacking more layers led to the vanishing gradient problem, and performance stopped improving as layers were added.

The ResNet architecture introduces skip (shortcut) connections to address the vanishing gradient problem.
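The idea of a skip connection fits in a few lines: the block learns a residual F(x) and adds the input back, y = relu(F(x) + x), so the addition gives gradients a direct path around the weight layers. A NumPy sketch of the forward pass, simplified to dense layers without batch normalization (real ResNet blocks use convolutions and batch norm); `residual_block` is a hypothetical name:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """Forward pass of a simplified residual block, y = relu(F(x) + x).

    F(x) is two weight layers with a ReLU in between; w2 must map back to
    x's width so the shortcut can be added element-wise.
    """
    residual = relu(x @ w1) @ w2   # F(x): the residual the block has to learn
    return relu(residual + x)      # shortcut: add the unchanged input back

rng = np.random.default_rng(0)
x = relu(rng.normal(size=(4, 8)))  # non-negative activations, as after a ReLU
w1 = rng.normal(size=(8, 16))
w2 = rng.normal(size=(16, 8))
y = residual_block(x, w1, w2)
```

If the weights are all zero, F(x) = 0 and the block passes its non-negative input through unchanged, so in principle an added residual block can always fall back to the identity instead of degrading the network.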

Architecture:

(Source, Free-to-use license under CC BY-SA 4.0), ResNet architecture

The figure shows the 34-layer ResNet: a plain network inspired by the VGG-19 design, to which shortcut connections are added. These shortcut connections turn the plain network into a residual network.

There are several versions of the ResNet architecture:

  • ResNet50
  • ResNet50V2
  • ResNet101
  • ResNet101V2
  • ResNet152
  • ResNet152V2

Input: Image of dimensions (224, 224, 3)

Output: 1000-dimensional vector of ImageNet class probabilities (with include_top=True)

Other Details for ResNet models:

  • Paper Link: https://arxiv.org/pdf/1512.03385.pdf
  • GitHub: ResNet
  • Published On: Dec 2015
  • Performance on ImageNet Dataset: 75–78% (Top 1 Accuracy), 92–93% (Top 5 Accuracy)
  • Number of Parameters: 25–60M
  • Depth: 107–307
  • Size on Disk: ~100–230MB

Implementation:

  • Instantiate the ResNet50 model with the following code:
import tensorflow as tf

# Load ResNet50 with ImageNet weights and its 1000-class classification head
model = tf.keras.applications.ResNet50(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
)

The code above is for ResNet50; Keras offers a similar API for the other ResNet architectures listed above. For more details, refer to this documentation.

4) Inception:

Stacking many deep convolutional layers tends to overfit the data. To avoid this, the Inception model uses parallel layers, i.e. multiple filters of different sizes at the same level, making the model wider rather than deeper. The Inception V1 module is made of 4 parallel branches: 1×1, 3×3, and 5×5 convolutions, and 3×3 max pooling.
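A minimal Keras sketch of the naive Inception V1 module described above: four parallel branches applied to the same input and concatenated along the channel axis. It omits the 1×1 "bottleneck" convolutions that the real GoogLeNet inserts before its 3×3 and 5×5 branches; the filter counts and the 28×28×192 input are example values, and `naive_inception_module` is a hypothetical name.

```python
import tensorflow as tf

def naive_inception_module(x, f1, f3, f5):
    """Parallel 1x1, 3x3, 5x5 convolutions and 3x3 max pooling on the same input."""
    layers = tf.keras.layers
    # padding="same" keeps every branch at the input's height and width
    b1 = layers.Conv2D(f1, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f3, 3, padding="same", activation="relu")(x)
    b5 = layers.Conv2D(f5, 5, padding="same", activation="relu")(x)
    bp = layers.MaxPooling2D(pool_size=3, strides=1, padding="same")(x)
    # The module grows wider, not deeper: branch outputs are stacked along channels
    return layers.Concatenate(axis=-1)([b1, b3, b5, bp])

inputs = tf.keras.Input(shape=(28, 28, 192))
outputs = naive_inception_module(inputs, f1=64, f3=128, f5=32)
model = tf.keras.Model(inputs, outputs)
print(outputs.shape)  # (None, 28, 28, 416): 64 + 128 + 32 + 192 channels
```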

Inception (V1/V2/V3) is a family of CNN architectures developed by a team at Google. InceptionV3 is an improved and optimized version of the InceptionV1 and V2 models.

Architecture:

The InceptionV3 model is 42 layers deep. Its architecture is built up step by step from the following techniques:

  • Factorized Convolutions
  • Smaller Convolutions
  • Asymmetric Convolutions
  • Auxiliary Classifiers
  • Grid Size Reduction
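Factorized and asymmetric convolutions are mostly parameter arithmetic. Replacing an n×n convolution with a 1×n convolution followed by an n×1 convolution keeps the same receptive field with 2n instead of n² weights per channel pair. A sketch with biases ignored and equal input and output channels (192 is an arbitrary example value):

```python
def conv_weights(kh, kw, c_in, c_out):
    return kh * kw * c_in * c_out

def asymmetric_saving(n, channels=192):
    full = conv_weights(n, n, channels, channels)            # one n x n layer
    factorized = (conv_weights(1, n, channels, channels)     # 1 x n layer ...
                  + conv_weights(n, 1, channels, channels))  # ... followed by n x 1
    return 1 - factorized / full  # fraction of weights saved, equal to 1 - 2/n

for n in (3, 7):
    print(f"{n}x{n}: {asymmetric_saving(n):.0%} fewer weights")
```

For n = 3 this gives the 33% saving quoted in the InceptionV3 paper, and the saving grows with n, which is why the paper applies the trick with n = 7 on its 17×17 feature maps.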

All these concepts are consolidated into the final architecture mentioned below:

(Source, Free-to-use license under CC BY-SA 4.0), InceptionV3 architecture

Input: Image of dimensions (299, 299, 3)

Output: 1000-dimensional vector of ImageNet class probabilities (with include_top=True)

Other Details for InceptionV3 models:

Implementation:

  • Instantiate the InceptionV3 model with the following code:
import tensorflow as tf

# Load InceptionV3 with ImageNet weights and its 1000-class classification head
model = tf.keras.applications.InceptionV3(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

The code above is for InceptionV3. For more details, refer to this documentation.

5) InceptionResNet:

Inception-ResNet-V2 is a CNN model developed by researchers at Google. Its goal was to reduce the complexity of InceptionV3 and to explore combining residual connections with the Inception architecture.

Architecture:

(Source, Free-to-use license under CC BY-SA 4.0), Inception-ResNet-V2 architecture

Input: Image of dimensions (299, 299, 3)

Output: 1000-dimensional vector of ImageNet class probabilities (with include_top=True)

Other Details for Inception-ResNet-V2 models:

Implementation:

  • Instantiate the Inception-ResNet-V2 model with the following code:
import tensorflow as tf

# Load Inception-ResNet-V2 with ImageNet weights and its 1000-class classification head
model = tf.keras.applications.InceptionResNetV2(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

The code above is for Inception-ResNet-V2. For more details, refer to this documentation.

6) MobileNet:

MobileNet is a streamlined architecture that uses depthwise separable convolutions to construct deep convolutional neural networks and provides an efficient model for mobile and embedded vision applications.
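The MobileNet paper measures the saving in multiply-adds: for a k×k kernel, M input channels, N output channels and an F×F feature map, a depthwise separable layer costs k²·M·F² + M·N·F² instead of k²·M·N·F², a ratio of 1/N + 1/k². The `alpha` argument of `tf.keras.applications.MobileNet` is the width multiplier, which thins every layer's channels. A sketch with example sizes (3×3 kernel, 512 channels, 14×14 feature map):

```python
def standard_mult_adds(k, m, n, f):
    # k x k kernel, m input channels, n output channels, f x f output feature map
    return k * k * m * n * f * f

def separable_mult_adds(k, m, n, f, alpha=1.0):
    # The width multiplier alpha thins both the input and output channels
    m, n = int(alpha * m), int(alpha * n)
    depthwise = k * k * m * f * f   # one k x k filter per input channel
    pointwise = m * n * f * f       # 1 x 1 convolution across channels
    return depthwise + pointwise

k, m, n, f = 3, 512, 512, 14
ratio = separable_mult_adds(k, m, n, f) / standard_mult_adds(k, m, n, f)
print(f"separable / standard cost: {ratio:.3f}")  # equals 1/n + 1/k**2
half_width = separable_mult_adds(k, m, n, f, alpha=0.5) / separable_mult_adds(k, m, n, f)
print(f"alpha=0.5 keeps {half_width:.0%} of the cost")
```

Because the pointwise term dominates, the cost falls roughly with the square of `alpha`.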

Architecture:

(Source, Free-to-use license under CC BY-SA 4.0), MobileNet architecture

Input: Image of dimensions (224, 224, 3)

Output: 1000-dimensional vector of ImageNet class probabilities (with include_top=True)

Other Details for MobileNet models:

Implementation:

  • Instantiate the MobileNet model with the following code:
import tensorflow as tf

# Load MobileNet with ImageNet weights; alpha is the width multiplier (1.0 = full width)
model = tf.keras.applications.MobileNet(
    input_shape=None,
    alpha=1.0,
    depth_multiplier=1,
    dropout=0.001,
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

The code above is for MobileNet; Keras offers a similar API for the other MobileNet architectures (MobileNet-V2, MobileNet-V3). For more details, refer to this documentation.

7) DenseNet:

DenseNet is a CNN model designed to counter the vanishing gradient problem in very deep networks: because of the long path between the input and output layers, information can vanish before it reaches its destination. DenseNet shortens this path by connecting each layer to every subsequent layer within a dense block.

Architecture:

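A DenseNet is divided into dense blocks; the network shown below has 3 of them (the ImageNet variants available in Keras, DenseNet-121/169/201, use 4). The layers between two adjacent blocks are called transition layers, and they change feature-map sizes via convolution and pooling.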
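Inside a dense block, every layer receives the concatenated feature maps of all earlier layers and adds `growth_rate` new ones, and each transition layer compresses the channel count. A sketch that tracks channel counts, assuming the Keras DenseNet-121 settings (64 initial channels, growth rate 32, four blocks of 6, 12, 24 and 16 layers, transition compression 0.5); `densenet_channels` is a hypothetical helper:

```python
def densenet_channels(block_sizes=(6, 12, 24, 16), growth_rate=32,
                      initial_channels=64, compression=0.5):
    """Return the final channel count and the count after each dense block."""
    channels = initial_channels
    after_each_block = []
    for i, num_layers in enumerate(block_sizes):
        # Each layer concatenates growth_rate new feature maps onto its input
        channels += num_layers * growth_rate
        after_each_block.append(channels)
        if i < len(block_sizes) - 1:
            # Transition layer (1x1 conv + pooling) shrinks the channel count
            channels = int(channels * compression)
    return channels, after_each_block

print(densenet_channels())                 # (1024, [256, 512, 1024, 1024])
print(densenet_channels((6, 12, 32, 32)))  # (1664, [256, 512, 1280, 1664]), DenseNet-169
```

The final counts (1024 for DenseNet-121, 1664 for DenseNet-169) are the embedding sizes these models produce with `include_top=False` and `pooling="avg"`.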

(Source, Free-to-use license under CC BY-SA 4.0), DenseNet architecture

Input: Image of dimensions (224, 224, 3)

Output: 1000-dimensional vector of ImageNet class probabilities (with include_top=True)

Other Details for DenseNet models:

Implementation:

  • Instantiate the DenseNet121 model with the following code:
import tensorflow as tf

# Load DenseNet-121 with ImageNet weights and its 1000-class classification head
model = tf.keras.applications.DenseNet121(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

The code above is for DenseNet-121; Keras offers a similar API for the other DenseNet architectures (DenseNet-169, DenseNet-201). For more details, refer to this documentation.

8) NasNet:

Google researchers designed NASNet by framing the search for the best CNN architecture as a reinforcement learning problem. The idea is to search a given space of design choices, such as the number of layers, filter sizes, strides, and output channels, for the best-performing combination.
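Counting the options shows why the search is framed as a learning problem rather than brute force. A sketch over a hypothetical search space (the option lists are made up for illustration). It draws architectures by plain random sampling, a much simpler stand-in for the reinforcement-learning controller NASNet actually uses; `SEARCH_SPACE`, `search_space_size` and `sample_architecture` are hypothetical names.

```python
import random

# Hypothetical per-layer choices, for illustration only
SEARCH_SPACE = {
    "filter_size": [1, 3, 5, 7],
    "stride": [1, 2],
    "out_channels": [24, 36, 48, 64],
}

def search_space_size(num_layers):
    """Number of distinct architectures with num_layers independent layers."""
    per_layer = 1
    for options in SEARCH_SPACE.values():
        per_layer *= len(options)
    return per_layer ** num_layers

def sample_architecture(num_layers, rng):
    """Random search: pick every design choice uniformly at random."""
    return [{name: rng.choice(options) for name, options in SEARCH_SPACE.items()}
            for _ in range(num_layers)]

print(f"{search_space_size(10):,} candidate 10-layer networks")
print(sample_architecture(2, random.Random(0)))
```

Even this toy space has 32^10 (about 10^15) ten-layer candidates, each of which would need training before it could be scored, which is why a learned search strategy is used instead of exhaustive search.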

Input: Image of dimensions (331, 331, 3) for NASNetLarge, (224, 224, 3) for NASNetMobile

Other Details for NasNet models:

  • Paper Link: https://arxiv.org/pdf/1707.07012.pdf
  • Published On: Apr 2018
  • Performance on ImageNet Dataset: 75–83% (Top 1 Accuracy), 92–96% (Top 5 Accuracy)
  • Number of Parameters: 5–90M
  • Depth: 389–533
  • Size on Disk: 23–343MB

Implementation:

  • Instantiate the NASNetLarge model with the following code:
import tensorflow as tf

# Load NASNetLarge with ImageNet weights and its 1000-class classification head
model = tf.keras.applications.NASNetLarge(
    input_shape=None,
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

The code above is for NASNetLarge; Keras offers a similar API for NASNetMobile. For more details, refer to this documentation.

9) EfficientNet:

EfficientNet is a CNN architecture from Google researchers that achieves better performance through a scaling method called compound scaling. Instead of scaling depth, width, or resolution independently, this method scales all three together by fixed ratios controlled by a single compound coefficient.
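The EfficientNet paper writes compound scaling as depth d = α^φ, width w = β^φ and resolution r = γ^φ, constrained so that α·β²·γ² ≈ 2; each increment of the compound coefficient φ then roughly doubles the FLOPS. A sketch using the constants reported in the paper for the B0 baseline (α = 1.2, β = 1.1, γ = 1.15). The published B1 to B7 settings differ from these raw values, so treat the output as illustrative:

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution constants from the paper

def compound_scale(phi, base_resolution=224):
    """Scaling factors for compound coefficient phi (phi = 0 is the baseline)."""
    depth = ALPHA ** phi                                # multiplier on number of layers
    width = BETA ** phi                                 # multiplier on channels
    resolution = round(base_resolution * GAMMA ** phi)  # input image size in pixels
    # FLOPS scale linearly with depth and quadratically with width and resolution
    flops = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi
    return depth, width, resolution, flops

for phi in range(4):
    d, w, r, f = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, {r}px, FLOPS x{f:.2f}")
```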

Architecture:

(Source, Free-to-use license under CC BY-SA 4.0), EfficientNet-B0 architecture

Other Details for EfficientNet Models:

  • Paper Link: https://arxiv.org/pdf/1905.11946v5.pdf
  • GitHub: EfficientNet
  • Published On: Sep 2020
  • Performance on ImageNet Dataset: 77–84% (Top 1 Accuracy), 93–97% (Top 5 Accuracy)
  • Number of Parameters: 5–67M
  • Depth: 132–438
  • Size on Disk: 29–256MB

Implementation:

  • Instantiate the EfficientNet-B0 model with the following code:
import tensorflow as tf

# Load EfficientNet-B0 with ImageNet weights and its 1000-class classification head
model = tf.keras.applications.EfficientNetB0(
    include_top=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

The code above is for EfficientNet-B0; Keras offers a similar API for the rest of the EfficientNet family (EfficientNet-B1 to B7, EfficientNetV2-B0 to B3). For more details, refer to this documentation and this documentation.

10) ConvNeXt:

ConvNeXt was proposed as a pure convolutional model (ConvNet) whose design is inspired by Vision Transformers; its authors report that it can outperform them.

Architecture:

(Source, Free-to-use license under CC BY-SA 4.0), ConvNeXt architecture

Other Details for ConvNeXt models:

Implementation:

  • Instantiate the ConvNeXt-Tiny model with the following code:
import tensorflow as tf

# Load ConvNeXt-Tiny with ImageNet weights and its 1000-class classification head
model = tf.keras.applications.ConvNeXtTiny(
    model_name="convnext_tiny",
    include_top=True,
    include_preprocessing=True,
    weights="imagenet",
    input_tensor=None,
    input_shape=None,
    pooling=None,
    classes=1000,
    classifier_activation="softmax",
)

The code above is for ConvNeXt-Tiny; Keras offers a similar API for the other ConvNeXt architectures (ConvNeXt-Small, ConvNeXt-Base, ConvNeXt-Large, ConvNeXt-XLarge). For more details, refer to this documentation.

Summary:

I have discussed 10 popular CNN architectures that can generate image embeddings through transfer learning. These pre-trained CNN models have achieved strong results on the ImageNet benchmark, and the Keras library offers APIs to load both their architectures and their pre-trained weights. The image embeddings generated by these models can be used for many downstream tasks, such as image similarity search, clustering, or as input features for a custom classifier.
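As one example of such a use case, embeddings from any of the models above can drive a simple image similarity search. A NumPy sketch, assuming the embeddings are already computed (for instance with `include_top=False` and `pooling="avg"`); the toy 2-D vectors stand in for real embeddings, and `cosine_similarity_matrix` and `nearest_images` are hypothetical names:

```python
import numpy as np

def cosine_similarity_matrix(queries, gallery):
    """Pairwise cosine similarity between two sets of row vectors."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return q @ g.T

def nearest_images(query_embeddings, gallery_embeddings, k=3):
    """Indices of the k most similar gallery images for each query, best first."""
    sims = cosine_similarity_matrix(query_embeddings, gallery_embeddings)
    return np.argsort(-sims, axis=1)[:, :k]

# Toy 2-D "embeddings" standing in for real model outputs
gallery = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
query = np.array([[1.0, 0.1]])
print(nearest_images(query, gallery, k=2))  # [[0 2]]
```

Cosine similarity ignores vector length and compares only direction, which is a common default for comparing CNN embeddings.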

However, this is a continuously growing domain and there is always a new CNN architecture to look forward to.

Article originally posted here by Suraj Gurav.
