

Tutorial: Accelerate and Productionize ML Model Inferencing Using Open-Source Tools
Posted by ODSC Community, March 6, 2020

You’ve finally got that perfect trained model for your data set. Now what? Running and deploying it to production raises a host of issues: performance latency, environments, framework compatibility, security, deployment targets…there’s a lot to consider! This tutorial will show you how to accelerate and productionize ML model inferencing, looking at solutions for common challenges using ONNX and related tooling.
ONNX (Open Neural Network Exchange), an open-source graduate project under the Linux Foundation’s LF AI, defines a standard format for machine learning models that enables AI developers to use the frameworks and tools of their choice to train, infer, and deploy on a variety of hardware targets. Models trained with PyTorch, TensorFlow, scikit-learn, and others can be converted to the ONNX format and benefit from acceleration on cloud and edge devices via supported hardware optimizations.
One mainstream way to run inference on ONNX models is with the open-source, high-performance ONNX Runtime inference engine. For complex DNNs, ONNX Runtime can provide significant gains in performance, as demonstrated by this 17x inference acceleration of a BERT model used by Microsoft Bing. For traditional ML, ONNX Runtime offers a more secure and straightforward deployment story, minimizing the security vulnerabilities exposed by .pkl files and messy versioning (ONNX Runtime is fully backward compatible with older versions of ONNX models). With APIs for C++, C#, C, Python, and Java, ONNX Runtime removes the need for a Python environment at inference time. For systems looking to experiment with or support models from different frameworks, using ONNX models with ONNX Runtime provides a single integration point without needing to maintain custom code or multiple runtimes.
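As a minimal illustration of that single integration point (the file names here are hypothetical), the same few lines of Python can load and inspect any ONNX model, regardless of the framework that produced it:

import onnxruntime as ort

for model_path in ['pytorch_model.onnx', 'sklearn_model.onnx']:
    session = ort.InferenceSession(model_path)
    # Every ONNX model exposes its expected inputs through the same API
    print([(i.name, i.shape) for i in session.get_inputs()])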
Related reading: Interoperable AI: High-Performance Inferencing of ML and DNN Models Using Open-Source Tools
Tutorial Overview
Let’s put this into action. In the examples below, we’ll demonstrate how to get started with ONNX by training an image classification model in PyTorch and a classification model in scikit-learn, converting them to the ONNX format, and running inference on the converted models using ONNX Runtime.
- Image classification using PyTorch:
  - Train a PyTorch model (resnet18) for a sports image classification problem.
  - Export the trained model as ONNX for high-performance inferencing.
- Classification task using scikit-learn:
  - Train a scikit-learn model on a dataset with 5 classes.
  - Convert the trained model to the ONNX format using skl2onnx.
Image Classification with PyTorch
Training a PyTorch model
The task at hand is to classify sports images into 8 classes. We partition the data into train, validation, and test sets after performing a series of transforms: cropping the images into 224 x 224 samples, randomly flipping them horizontally, converting them to tensors, and normalizing them.
data_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.47637546, 0.485785, 0.4522678],
                         [0.24692202, 0.24377407, 0.2667196])
])
data = datasets.ImageFolder(root=image_folder, transform=data_transforms)
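The partitioning step itself isn’t shown above; here is a minimal sketch using random_split (the 70/15/15 ratio, batch size, and loader names are illustrative assumptions):

from torch.utils.data import random_split, DataLoader

# Split the dataset into train/validation/test subsets
n = len(data)
n_train, n_val = int(0.7 * n), int(0.15 * n)
train_set, val_set, test_set = random_split(
    data, [n_train, n_val, n - n_train - n_val])

train_loader = DataLoader(train_set, batch_size=32, shuffle=True)
val_loader = DataLoader(val_set, batch_size=32)
test_loader = DataLoader(test_set, batch_size=32)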
We pick resnet18 from the list of pre-trained models provided by PyTorch and replace its final layer so that it aligns with our problem.
model = models.__dict__[model_name](pretrained=True)

# Alter the final layer so it outputs num_classes scores.
# nn.Linear applies a linear transformation to the incoming data: y = xA^T + b
final_layer_input = model.fc.in_features
model.fc = nn.Linear(final_layer_input, num_classes)
We train the model on the training set, also scoring on the validation set at each epoch. After training is complete, we run a prediction on the test set.
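Here is a minimal sketch of that loop, using the loaders from the split above (the optimizer, learning rate, and epoch count are illustrative assumptions):

import torch
import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
num_epochs = 10  # illustrative

for epoch in range(num_epochs):
    # Train for one pass over the training set
    model.train()
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()
        optimizer.step()

    # Score on the validation set at the end of each epoch
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for inputs, labels in val_loader:
            preds = model(inputs).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    print(f"epoch {epoch}: validation accuracy {correct / total:.3f}")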
Conversion to ONNX
Since PyTorch natively supports exporting models in the ONNX format, the trained model can be exported as shown below:
dummy_input = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy_input, "sports_classification.onnx")
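Note that torch.onnx.export auto-generates graph input names, which is why the inference code below feeds a tensor named 'input.1'. If you prefer stable names, they can be set explicitly; a minimal sketch:

# Optional: pin input/output names instead of relying on auto-generated ones.
# With this variant, the inference feed key below would be 'input' rather than 'input.1'.
torch.onnx.export(model, dummy_input, "sports_classification.onnx",
                  input_names=["input"], output_names=["output"])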
Inferencing with ONNX Runtime
Once the model is in the interoperable ONNX format, we can run inference on it using onnxruntime:
from onnxruntime import InferenceSession

ort_session = InferenceSession('sports_classification.onnx')
result = ort_session.run(None, {'input.1': test_sample.numpy()})
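run() returns a list with one entry per model output; here the first (and only) entry holds the class scores. A minimal sketch of recovering the predicted labels from those scores:

import numpy as np

scores = result[0]                        # shape: (batch_size, num_classes)
predicted_class = np.argmax(scores, axis=1)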
As shown in the table below, ONNX Runtime outperforms PyTorch in prediction latency across a range of batch sizes. Additional performance-tuning techniques can further improve latency depending on your hardware choices; a sketch of the session-level options follows the table.
Batch size | ONNX Runtime prediction time* | PyTorch prediction time* | % improvement
1          | 60.2 ms ± 6.53 ms             | 108 ms ± 1.75 ms         | 79%
2          | 129 ms ± 1.97 ms              | 175 ms ± 1.55 ms         | 36%
4          | 259 ms ± 3.23 ms              | 318 ms ± 5.18 ms         | 23%
8          | 510 ms ± 10.8 ms              | 537 ms ± 5.77 ms         | 5%
16         | 989 ms ± 7.84 ms              | 1.07 s ± 7.08 ms         | 8%
32         | 1.98 s ± 26.3 ms              | 2.03 s ± 19.7 ms         | 3%
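As one example of the tuning mentioned above, ONNX Runtime exposes session-level options such as the graph optimization level and thread counts. A minimal sketch (the thread count here is an illustrative assumption, not a tuned value):

import onnxruntime as ort

opts = ort.SessionOptions()
# Enable all graph-level optimizations (constant folding, node fusion, etc.)
opts.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
opts.intra_op_num_threads = 4  # illustrative; tune for your hardware

tuned_session = ort.InferenceSession('sports_classification.onnx', opts)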
The full detailed Jupyter notebook can be found here.
Classification task with scikit-learn
Next up, let’s look at a classical ML model trained using scikit-learn.
Training a Scikit-Learn model
We use make_classification() to generate a dataset with 10,000 samples, 10 features, and 5 classes. We standardize the data using StandardScaler() and then train an MLPClassifier model.
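A minimal sketch of the dataset generation and split described above (the split ratio, n_informative value, and random seeds are illustrative assumptions):

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# make_classification requires 2**n_informative >= n_classes * n_clusters_per_class
X, y = make_classification(n_samples=10000, n_features=10, n_classes=5,
                           n_informative=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)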
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier

model = Pipeline([
    ('scaler', StandardScaler()),
    ('predictor', MLPClassifier(random_state=42))
])
model.fit(X_train, y_train)
Conversion to ONNX
To convert the model to the ONNX format, we use convert_sklearn() from the skl2onnx library. convert_sklearn() takes the scikit-learn model, a model name, and the input type as parameters, as shown below:
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

model_onnx = convert_sklearn(
    model,
    'classification model',
    [('input', FloatTensorType([None, X_test.shape[1]]))]
)
Once the model is converted, we use save_model() from onnxmltools.utils to save the converted model.
from onnxmltools.utils import save_model

save_model(model_onnx, 'onnx_model.onnx')
Inferencing with ONNX Runtime
Similar to the previous example, for inferencing we use InferenceSession() from the onnxruntime library.
import numpy as np

sess = InferenceSession('onnx_model.onnx')
# The input was declared as FloatTensorType, so ONNX Runtime expects float32 here
res = sess.run(None, input_feed={'input': X_test.astype(np.float32)})
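As a quick sanity check (not part of the original walkthrough), the ONNX Runtime predictions can be compared against scikit-learn’s; the first output of the converted classifier holds the predicted labels. Because float32 rounding can occasionally flip a prediction near a decision boundary, this sketch reports an agreement rate rather than asserting exact equality:

# res[0]: predicted labels from the converted model
skl_preds = model.predict(X_test)
agreement = np.mean(skl_preds == res[0])
print(f"label agreement with scikit-learn: {agreement:.4f}")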
In this example, we also see notable performance improvements using ONNX Runtime.
Batch size | ONNX Runtime (ms)* | scikit-learn (ms)* | % improvement
1          | 0.053951           | 0.329427           | 511%
2          | 0.037674           | 0.267644           | 610%
4          | 0.039859           | 0.270791           | 579%
8          | 0.043920           | 0.275846           | 528%
16         | 0.055212           | 0.280654           | 408%
32         | 0.086143           | 0.320520           | 272%
64         | 0.113393           | 0.376533           | 232%
128        | 0.257239           | 0.481376           | 87%
256        | 0.495993           | 0.748602           | 51%
512        | 0.938491           | 1.144875           | 22%
1024       | 1.537558           | 1.905960           | 24%
2048       | 2.501111           | 3.454710           | 38%
The full detailed Jupyter notebook can be found here.
Conclusion
These examples are just the tip of the iceberg for the applications and value of ONNX. For further reading, check out the ONNX Tutorials and ONNX Runtime Tutorials for more samples. If you have any questions, please join the ONNX and ONNX Runtime communities on GitHub for active discussions.
Finally, we invite you to join us at ODSC East in Boston for our hands-on workshop to learn more about ONNX, ONNX Runtime, and how you can use these open-source tools to accelerate and deploy state-of-the-art ML and DNN models in production.
*Note: All benchmarking experiments were run on an Azure Notebook VM (Standard_D3_v2). Python’s timeit was used to measure prediction time, averaged over 100 executions.
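A minimal sketch of that timing methodology, reusing the session and inputs from the scikit-learn example above (the exact harness is an assumption):

import timeit
import numpy as np

# Average latency of a single run() call over 100 executions, in ms
total = timeit.timeit(
    lambda: sess.run(None, input_feed={'input': X_test.astype(np.float32)}),
    number=100)
print(f"mean prediction time: {total / 100 * 1000:.3f} ms")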
Faith and Prabhat are speakers for ODSC East 2020 this April. Be sure to check out their talk, “From Research to Production: Performant Cross-platform ML/DNN Model Inferencing on Cloud and Edge with ONNX Runtime,” there!
About the authors / ODSC East speakers:
Faith Xu is a Senior Program Manager at Microsoft on the Machine Learning Platform team, focusing on frameworks and tools. She leads efforts to enable efficient and performant productization of inferencing workflows for high-volume Microsoft products and services through the use of ONNX and ONNX Runtime. She is an evangelist for adoption of the open-source ONNX standard, working with community partners to promote an open ecosystem in AI.
Prabhat Roy is a Data and Applied Scientist at Microsoft, where he is one of the two main contributors to the sklearn-onnx converter project (https://github.com/onnx/sklearn-onnx). In the past, he contributed to ML.NET, an open-source ML library for .NET developers, and worked with customers on text and image classification problems.