In this post, we explain how we measured, on commodity cloud hardware, the throughput and latency of five ResNet-50 v1 models optimized for CPU inference. By the end of the post, you should be able to reproduce these benchmarks using tools available in the Neural Magic GitHub...
In this post, we explain how we used state-of-the-art pruning and quantization techniques to improve the performance of YOLOv3 on CPUs. We’ll show that by leveraging the robust YOLO training framework from Ultralytics with SparseML’s sparsification recipes, it is easy to create highly pruned and INT8 quantized...