Revolutionizing Visual Commerce with Computer Vision Models

Companies like eBay and Amazon store millions of product images, and each image contains a wealth of information that can be leveraged to help consumers find the right product or to advertise similar products. Thanks to effective and widely available computer vision models, namely convolutional neural networks, that information is now far easier to extract. At ODSC West 2018, Robinson Piramuthu of eBay presented key techniques for navigating the challenges of applying computer vision models to visual commerce.


Robinson Piramuthu discussed three avenues for using deep learning to innovate in visual commerce: aspect prediction, leaf category prediction, and signature identification for visual ranking. Each approach shares a common neural network, with the final layer engineered for the specific task.


General Approach

Making predictions from product images requires training a model on labeled images. Robinson recommended training the deep learning model on images with simple backgrounds in the early stages, so that the model learns from easy examples first. He also emphasized the value of images taken from a variety of angles, which offer a richer representation of the features in the photo. Finally, it is key to sample images from a variety of brands, sellers, conditions, and types so that the model generalizes well to images in the wild.
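
One way to picture the sampling advice is a balanced draw across listing attributes. The sketch below is only an illustration, not the sampling procedure used at eBay: the `balanced_sample` helper, the record fields, and the `per_group` cap are all hypothetical names chosen for this example.

```python
import random
from collections import defaultdict

def balanced_sample(records, key, per_group, seed=0):
    """Draw up to `per_group` records from every group defined by `key`."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    sample = []
    for group_key in sorted(groups):
        items = groups[group_key]
        # Small groups contribute everything they have.
        sample.extend(rng.sample(items, min(per_group, len(items))))
    return sample

# Hypothetical listings; a real catalog would also carry seller, angle, etc.
listings = [
    {"id": 1, "brand": "A", "condition": "new"},
    {"id": 2, "brand": "A", "condition": "new"},
    {"id": 3, "brand": "A", "condition": "used"},
    {"id": 4, "brand": "B", "condition": "new"},
    {"id": 5, "brand": "B", "condition": "new"},
    {"id": 6, "brand": "B", "condition": "new"},
]
sample = balanced_sample(
    listings, key=lambda r: (r["brand"], r["condition"]), per_group=1
)
```

Capping each (brand, condition) group keeps a dominant seller or brand from crowding out rarer combinations in the training set.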

Visual Search

In visual commerce, it is advantageous to recommend products that are similar to those a consumer has previously purchased. This requires grouping similar products together using a measure of similarity. Instead of purely unsupervised techniques like PCA or k-means clustering, Robinson Piramuthu recommends a semi-supervised approach: train a neural network on a fixed set of classes (eBay uses 16,000), then feed unlabeled images into the network. Finally, one chooses the classes that the target image is most similar to. The criterion for choosing similar classes is typically based on softmax activation values, which indicate the model-predicted probability of each class. Robinson’s team found that setting a threshold on the cumulative probability performed better than thresholding individual softmax probabilities. In the accompanying example, with a cumulative threshold of 0.9, categories C1 through C3 would be identified as suitably similar to the target image.
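
The cumulative threshold can be sketched in a few lines. This is a minimal illustration that assumes classes are ranked from most to least probable and accumulated until the threshold is reached; the function name and the probability values are made up for the example and do not come from the talk.

```python
import math

def softmax(logits):
    """Convert raw network outputs into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def select_by_cumulative_probability(probs, threshold=0.9):
    """Return class indices, most probable first, until their summed
    probability reaches `threshold`."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    selected, cumulative = [], 0.0
    for index in ranked:
        selected.append(index)
        cumulative += probs[index]
        if cumulative >= threshold:
            break
    return selected

# Hypothetical softmax output for five classes C1..C5 (indices 0..4).
probs = [0.50, 0.30, 0.15, 0.03, 0.02]
top_classes = select_by_cumulative_probability(probs, threshold=0.9)
```

With these illustrative values, the first two classes sum to 0.80, so the third is needed to pass 0.9 and `top_classes` contains indices 0, 1, and 2. Unlike a fixed per-class cutoff, the number of classes kept adapts to how confident the network is.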


Aspect Prediction

Key attributes are often missing from an item's description, and one needs a way to fill in the missing characteristics quickly. By re-engineering the final fully connected layer of a convolutional neural network, one can separate products by pattern, brand, or fashion, as shown in the accompanying figure.


Attributes should be identified ahead of time: they must correspond to characteristics that are key to customer selection and that can be identified from images. For example, brand can be identified with deep learning, but something like size cannot. More specific product descriptions then allow one’s web interface to query products with greater precision.
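
The idea of one shared network with a task-specific final layer can be sketched as follows. This is a toy illustration under loud assumptions: `shared_backbone` is a placeholder rather than a real CNN, the `AspectHead` class and the label names are hypothetical, and the weights are random and untrained, so the predictions carry no meaning.

```python
import math
import random

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def shared_backbone(image):
    # Placeholder for the shared convolutional layers: a real system would
    # run a trained CNN here. This stand-in just averages each pixel row.
    return [sum(row) / len(row) for row in image]

class AspectHead:
    """One final fully connected layer (plus softmax) for a single aspect."""

    def __init__(self, feature_dim, labels, rng):
        self.labels = labels
        # Random, untrained weights: one row of weights per label.
        self.weights = [[rng.uniform(-1, 1) for _ in range(feature_dim)]
                        for _ in labels]
        self.biases = [0.0] * len(labels)

    def predict(self, features):
        logits = [sum(w * f for w, f in zip(row, features)) + b
                  for row, b in zip(self.weights, self.biases)]
        probs = softmax(logits)
        best = max(range(len(probs)), key=lambda i: probs[i])
        return self.labels[best], probs

rng = random.Random(0)
image = [[0.1, 0.4, 0.3], [0.9, 0.8, 0.7], [0.2, 0.2, 0.5], [0.6, 0.1, 0.0]]
features = shared_backbone(image)  # computed once, reused by every head

heads = {
    "pattern": AspectHead(len(features), ["striped", "floral", "solid"], rng),
    "brand": AspectHead(len(features), ["brand_a", "brand_b"], rng),
}
predictions = {aspect: head.predict(features)[0]
               for aspect, head in heads.items()}
```

The design point is that the expensive feature extraction runs once per image, while each aspect only adds a small final layer of its own.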

Iterative Fine Tuning

Robinson mentioned an intriguing technique for fine-tuning neural networks that involves changing the learning rate across repeated rounds of training. First, one trains a network with an initial learning rate of 0.01 until the model converges (i.e., validation accuracy plateaus). The grey dashed line at the bottom of the figure below represents the first training iteration. The same model is then retrained with a higher learning rate, which initially causes a decrease in accuracy but eventually converges to a higher accuracy than the previous iteration. This process is repeated until increasing the learning rate no longer results in improvement.
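
The control flow of this procedure can be sketched as a simple loop. The sketch assumes a user-supplied `train_until_plateau(lr)` callable that continues training the same model at the given learning rate and returns its validation accuracy at convergence. That callable, the `lr_factor` multiplier, and the simulated accuracies are all hypothetical; the talk recap does not say by how much the learning rate is raised each round.

```python
def iterative_fine_tune(train_until_plateau, initial_lr=0.01,
                        lr_factor=2.0, max_rounds=10):
    """Retrain at successively higher learning rates until validation
    accuracy stops improving. Returns the best accuracy and the history
    of (learning_rate, accuracy) pairs."""
    lr = initial_lr
    best_accuracy = train_until_plateau(lr)
    history = [(lr, best_accuracy)]
    for _ in range(max_rounds - 1):
        lr *= lr_factor  # assumed schedule: multiply the rate each round
        accuracy = train_until_plateau(lr)
        history.append((lr, accuracy))
        if accuracy <= best_accuracy:
            break  # raising the learning rate no longer helps
        best_accuracy = accuracy
    return best_accuracy, history

# Simulated plateau accuracies standing in for real training runs.
simulated = iter([0.70, 0.74, 0.76, 0.76, 0.80])

def fake_train(lr):
    return next(simulated)

best, history = iterative_fine_tune(fake_train)
```

In practice one would also keep a checkpoint of the best round, since the final round is by construction no better than the one before it.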


Key Takeaways:

  • A common neural network can be used to address multiple challenges in visual commerce by engineering the last layer for the specific task. 
  • Semi-supervised learning offers an effective means of visually searching for similar items to advertise. 
  • Sampling design is critical for building effective models; samples should cover a balanced diversity of characteristics within each class. 
  • Innovations in model training such as iteratively fine-tuning the learning rate can offer substantial gains in model performance. 


Nathaniel Jermain

Nathaniel is a senior data scientist in the marketing industry, located in Saint Petersburg, FL. His work focuses on machine learning and statistical analysis, with a particular interest in causal inference. Feel free to connect with Nathaniel on LinkedIn: https://www.linkedin.com/in/njermain/