Using Object Detection for a Smarter Retail Checkout Experience

I have been playing around with the Tensorflow Object Detection API and have been amazed by how powerful these models are. I want to share the performance of the API for some practical use cases.

The first use case is a smarter retail checkout experience. This is a hot field right now after the announcement of Amazon Go stores.

Stores can be designed with smart shelves that track what a customer picks from them. I did this by building two object detection models: one that tracks the hand and captures what it has picked, and a second, independent model that monitors shelf space. See the GIF below. Using two models reduces the error you would get from a single approach.

Hand Tracking and Inventory Monitoring
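
One way the two models could cross-check each other is sketched below. This is only an illustration, not the code behind the GIF: the function name `reconcile`, the item labels, and the idea of comparing shelf counts before and after a visit are all assumptions made for the example.

```python
from collections import Counter


def reconcile(hand_picks, shelf_before, shelf_after):
    """Compare picks seen by the hand model with stock changes seen by the shelf model.

    hand_picks: list of item labels the hand model saw being picked (hypothetical format).
    shelf_before / shelf_after: dicts of item label -> count from the shelf model.
    Returns (confirmed, needs_review), both dicts of item label -> count.
    """
    picked = Counter(hand_picks)

    # Items that disappeared from the shelf according to the shelf model.
    removed = Counter()
    for item, before in shelf_before.items():
        delta = before - shelf_after.get(item, 0)
        if delta > 0:
            removed[item] = delta

    confirmed, needs_review = {}, {}
    for item in set(picked) | set(removed):
        agreed = min(picked[item], removed[item])       # both models agree on these
        disputed = abs(picked[item] - removed[item])    # only one model saw these
        if agreed:
            confirmed[item] = agreed
        if disputed:
            needs_review[item] = disputed
    return confirmed, needs_review


confirmed, needs_review = reconcile(
    hand_picks=["soda", "chips", "chips"],
    shelf_before={"soda": 5, "chips": 4, "gum": 9},
    shelf_after={"soda": 4, "chips": 3, "gum": 9},
)
print(confirmed, needs_review)
```

Here the hand model saw two bags of chips picked but the shelf model saw only one leave the shelf, so one bag is confirmed and one is flagged for review rather than billed blindly.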

Another application of computer vision for retail checkout: instead of scanning items one by one at a checkout system, everything is placed together and cameras detect and log all of it. Maybe we don’t even need a checkout lane. Shopping carts can be equipped with cameras, and you can simply walk out with your cart, which bills you as you step out of the store. Won’t this be cool! I used the API to design a “mini” model with 3 random items, and the model could easily detect what was placed and in what quantity. See the GIF below. Through various experiments, I found that the API performs very well even on items that are only partially visible.

Detection of items with high accuracy
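
Turning raw detections into "what was placed and in what quantity" can be as simple as counting the confident detections per class. The sketch below assumes the detector returns parallel lists of class ids and scores, which is how the Object Detection API reports them; the label map, the sample values, and the 0.5 threshold are made up for illustration.

```python
from collections import Counter


def count_items(classes, scores, label_map, min_score=0.5):
    """Count detections per item label, ignoring low-confidence ones."""
    counts = Counter()
    for class_id, score in zip(classes, scores):
        if score >= min_score:
            counts[label_map[class_id]] += 1
    return dict(counts)


# Hypothetical label map and detector output for a single frame.
label_map = {1: "soda", 2: "chips", 3: "gum"}
classes = [1, 2, 2, 3, 1]
scores = [0.98, 0.91, 0.88, 0.32, 0.75]

cart = count_items(classes, scores, label_map)
print(cart)
```

The low-scoring "gum" detection is dropped, so the cart holds two sodas and two bags of chips. The threshold is worth tuning on your own data.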

So how do we build this?

1. Collecting data

Images can be collected by looking through publicly available online data sets or by creating your own data. Each approach has its pros and cons. I generally use a mix of the two. For example, the hand detector can be built using publicly available datasets like the Ego Hand data set. This dataset has a lot of variability in hand shapes, colors, and poses, which is useful when the model is applied in the real world. On the other hand, for items on a shelf or in a cart, it is best to collect your own data, since we don’t expect much variability and we want to ensure we capture the items from all sides. Before you build your model, it is always a good idea to augment your data using image processing libraries like PIL and OpenCV to create additional images with random variations in brightness, zoom, rotation, etc. This process can create a lot of additional samples and make the model more robust.
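
One detail to keep in mind when augmenting detection data is that geometric changes such as zoom move the bounding boxes as well as the pixels, while brightness changes do not. The sketch below uses plain Python and leaves the actual image operation to PIL or OpenCV. The `Box` type, the parameter ranges, and the assumption that zoom means "crop around the image center and resize back to the original size" are all choices made for this example.

```python
import random
from typing import NamedTuple, Optional


class Box(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def sample_params(rng):
    """Draw one random variation; the ranges are illustrative assumptions."""
    return {
        "brightness": rng.uniform(0.7, 1.3),  # multiply pixel values by this
        "zoom": rng.uniform(1.0, 1.3),        # 1.0 means no zoom
    }


def zoom_box(box, width, height, zoom) -> Optional[Box]:
    """Map a box into the image produced by zooming in around the center.

    Zooming by `zoom` scales every coordinate's offset from the image
    center by that factor; anything pushed outside the frame is clipped.
    Returns None if the box ends up entirely outside the frame.
    """
    cx, cy = width / 2, height / 2

    def clip(value, upper):
        return max(0.0, min(float(upper), value))

    zoomed = Box(
        clip(cx + (box.xmin - cx) * zoom, width),
        clip(cy + (box.ymin - cy) * zoom, height),
        clip(cx + (box.xmax - cx) * zoom, width),
        clip(cy + (box.ymax - cy) * zoom, height),
    )
    if zoomed.xmax <= zoomed.xmin or zoomed.ymax <= zoomed.ymin:
        return None
    return zoomed


params = sample_params(random.Random(0))
center_box = zoom_box(Box(40, 40, 60, 60), width=100, height=100, zoom=2.0)
corner_box = zoom_box(Box(0, 0, 10, 10), width=100, height=100, zoom=2.0)
print(params, center_box, corner_box)
```

A box in the middle of the image grows with the zoom, while a box in the corner falls out of the frame and should be dropped from the annotation for that augmented image.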

For object detection models, we need annotations: bounding boxes around objects of interest. I use labelimg for this purpose. It is written in Python and uses Qt for the interface. This is a very handy tool, and annotations are created in the Pascal VOC format, which makes it easy to create TFRecord files using the scripts shared in the Tensorflow Github.
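
To give a feel for what a Pascal VOC annotation contains, here is a minimal sketch that reads one with the standard library. The sample XML is hand-written for illustration (file name, labels, and coordinates are invented), and `parse_voc` is a hypothetical helper, not one of the TFRecord scripts.

```python
import xml.etree.ElementTree as ET

# Hand-written example in the Pascal VOC layout; all values are invented.
SAMPLE_XML = """<annotation>
  <filename>shelf_001.jpg</filename>
  <size><width>640</width><height>480</height><depth>3</depth></size>
  <object>
    <name>soda</name>
    <bndbox><xmin>48</xmin><ymin>240</ymin><xmax>195</xmax><ymax>371</ymax></bndbox>
  </object>
  <object>
    <name>chips</name>
    <bndbox><xmin>300</xmin><ymin>100</ymin><xmax>420</xmax><ymax>260</ymax></bndbox>
  </object>
</annotation>"""


def parse_voc(xml_text):
    """Return (filename, (width, height), objects) from one VOC annotation."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    image_size = (int(size.findtext("width")), int(size.findtext("height")))

    objects = []
    for obj in root.iter("object"):
        bndbox = obj.find("bndbox")
        # Boxes are stored as pixel coordinates: xmin, ymin, xmax, ymax.
        box = tuple(int(bndbox.findtext(tag)) for tag in ("xmin", "ymin", "xmax", "ymax"))
        objects.append({"label": obj.findtext("name"), "box": box})
    return root.findtext("filename"), image_size, objects


filename, image_size, objects = parse_voc(SAMPLE_XML)
print(filename, image_size, objects)
```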

2. Building the model

I have written a very detailed tutorial on training the Tensorflow Object Detection API on your custom data set, Building a Toy Detector with Tensorflow Object Detection API, along with an associated Github repository. Please use this to get started.

One of the big decisions you have to make when building the model is which object detection model to use as the fine-tune checkpoint. The latest list of available models that have been trained on the COCO dataset is:

Tensorflow COCO Trained Models

There is a direct tradeoff between speed and accuracy. For real-time hand detection, it is best to use either the SSD models or the Faster RCNN Inception, which I personally prefer. For item detection on a shelf or in a shopping cart, I would prefer a slower but higher-accuracy model like the Faster RCNN Resnet or the Faster RCNN Inception Resnet.
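
Whether a model is fast enough for real time is easiest to settle by timing it on your own hardware. Below is a rough timing harness. `fake_detector` is a stand-in that just sleeps, so the number it prints says nothing about any real model; swap in a function that runs your own detector on one frame.

```python
import time


def measure_fps(detect_fn, frames, warmup=1):
    """Return frames processed per second by detect_fn over the given frames."""
    # The first call is often slower (model loading, caches), so run a few untimed.
    for frame in frames[:warmup]:
        detect_fn(frame)

    start = time.perf_counter()
    for frame in frames:
        detect_fn(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")


def fake_detector(frame):
    """Placeholder for a real model call; pretends inference takes about 1 ms."""
    time.sleep(0.001)
    return []


fps = measure_fps(fake_detector, frames=[None] * 20)
print(f"{fps:.1f} frames per second")
```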

3. Testing and improving the model

I personally think the real work starts after you build the first version of the model! Since no model is perfect, you will notice gaps in its performance once you start using it. Then you will need to use your intuition to decide whether these gaps can be plugged and the model refined, or whether the situation needs another model or a non-model hack to get to the accuracy you desire. If you are lucky, all you need is additional data to improve the performance.
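
A simple way to find those gaps is to list the labelled objects the model failed to detect and look at what they have in common. The sketch below matches detections to ground truth by label and intersection over union (IoU). The 0.5 threshold and the sample boxes are assumptions, and the matching is deliberately naive (it does not enforce one detection per ground-truth box).

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def find_misses(ground_truth, detections, min_iou=0.5):
    """Return ground-truth (label, box) pairs with no matching detection."""
    misses = []
    for label, gt_box in ground_truth:
        matched = any(
            det_label == label and iou(gt_box, det_box) >= min_iou
            for det_label, det_box in detections
        )
        if not matched:
            misses.append((label, gt_box))
    return misses


# Invented example: the soda is found, the chips box is badly localized.
ground_truth = [("soda", (0, 0, 10, 10)), ("chips", (20, 20, 40, 40))]
detections = [("soda", (1, 1, 10, 10)), ("chips", (30, 30, 50, 50))]

misses = find_misses(ground_truth, detections)
print(misses)
```

If the misses cluster around one item, one angle, or one lighting condition, that is usually a hint about which additional data to collect.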

If you want to know more about Object Detection and the Tensorflow Object Detection API, please check out my article — Is Google Tensorflow Object Detection API the easiest way to implement image recognition?

Give me a ❤️ if you liked this post:) Hope you pull the code and try it yourself. If you have other ideas on this topic please comment on this post or mail me at

Other writings

PS: I have my own deep learning consultancy and love to work on interesting problems. I have helped several startups deploy innovative AI-based solutions. Check us out at —

If you have a project that we can collaborate on, then please contact me through my website or at



Original article here.

Priya Dwivedi


Priya Dwivedi has 10+ years experience as a data scientist. She now runs her own data analytics consultancy that builds deep learning models for Computer Vision and NLP problems. She has helped many startups deploy innovative AI based solutions. For more info please see the link — If you are interested in collaborating with her then please contact her at