What is MLPerf?

AI might be a buzzword, but the hype is outpacing the tools available to benchmark it. Up to this point, assessing the performance of ML software was difficult. You couldn’t just measure one framework objectively against another. Now, a collection of tech companies has released MLPerf, a consistent way to measure ML performance objectively.


What is MLPerf?

For companies like Google, Intel, and Baidu, an objective measurement of ML tools across the board is a necessary part of creating and maintaining new products. If you can’t figure out how well current solutions perform against other possibilities, you won’t know if your choices are the right ones. 

In 1988, the Standard Performance Evaluation Corporation’s (SPEC) benchmark debuted for general computing, and in the following years, general computing performance improved by a factor of about 1.6 per year. It was the launch of a standardized protocol that spurred those improvements, and the AI industry is hoping for the same results for machine learning operations.
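To get a feel for what a 1.6 times yearly improvement means once it compounds, here is a minimal Python sketch. The 1.6 figure comes from the paragraph above; the function name and the 1-, 5-, and 10-year horizons are illustrative assumptions, not numbers from SPEC or MLPerf.

```python
def cumulative_improvement(rate_per_year: float, years: int) -> float:
    """Return the total performance multiple after compounding for `years`."""
    return rate_per_year ** years


# 1.6x per year is the figure quoted in the article; the horizons below
# are illustrative assumptions chosen only to show the compounding effect.
for years in (1, 5, 10):
    multiple = cumulative_improvement(1.6, years)
    print(f"{years:>2} years: {multiple:.1f}x")
```

At that rate, performance grows roughly tenfold in five years, which is why a shared benchmark that sustains such a pace matters so much.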

MLPerf aims to be a broad machine learning benchmark supported by both industry and academic research. It’s primarily used for assessing workloads, and more than 40 organizations have come together to decide on a consistent set of benchmarks for ML workflows.

How Does It Work?

As a business, you want to know how quickly you can train that shiny new AI model or deploy from different environments. Will your performance measure up on a smart device, for example? How will an autonomous vehicle perform within a multi-stream system?

Until now, no such benchmarks existed, which made consistent measurement difficult. MLPerf has five benchmarks created by leading institutions with stakes in the AI space:

one machine translation benchmark: WMT English-German data set

two object detection benchmarks: COCO data set

two image classification benchmarks: ImageNet data set
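To make the idea of a training benchmark more concrete, the sketch below times how long a model takes to reach a target quality level. It assumes the benchmark is scored by wall-clock time to a quality target. The function names, the toy trainer, and the 0.75 target are hypothetical stand-ins for illustration and are not part of MLPerf's actual tooling.

```python
import time
from typing import Callable, Tuple


def time_to_target(train_one_epoch: Callable[[], float],
                   target_quality: float,
                   max_epochs: int = 100) -> Tuple[int, float]:
    """Run epochs until quality >= target; return (epochs_used, seconds)."""
    start = time.perf_counter()
    for epoch in range(1, max_epochs + 1):
        quality = train_one_epoch()
        if quality >= target_quality:
            return epoch, time.perf_counter() - start
    raise RuntimeError("target quality not reached within max_epochs")


def make_toy_trainer(gain_per_epoch: float) -> Callable[[], float]:
    """Hypothetical stand-in for real training: quality rises on each call."""
    state = {"quality": 0.0}

    def step() -> float:
        state["quality"] = min(1.0, state["quality"] + gain_per_epoch)
        return state["quality"]

    return step


# Toy run: quality climbs by 0.25 per epoch until it reaches the 0.75 target.
epochs, seconds = time_to_target(make_toy_trainer(0.25), target_quality=0.75)
print(f"reached target in {epochs} epochs ({seconds:.6f} s)")
```

In a real setting, the toy trainer would be replaced by an actual training epoch on one of the data sets listed above, and the same harness could be run on different hardware or software stacks to compare them.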

There are plans to add a benchmark for energy efficiency, because running AI training on some of the most sophisticated equipment has a massive carbon footprint, possibly larger than that of a standard car.

As the system is tested and becomes more sophisticated, other benchmarks could help create a well-rounded picture of how well a particular AI system works. It allows engineers to develop and tweak aspects that matter instead of taking shots in the dark during development. It can also help decision-makers understand the benefits of specific systems over others when choosing where and how to deploy.

What Will MLPerf Accomplish?

If you’re a business leader or decision-maker, MLPerf helps establish the groundwork for consistent benchmarks. You’ve let your boardroom know about these new initiatives, and maybe board members were breathing down your neck to start an AI initiative in the first place.

When you report back, you could have real data to support the frameworks you’ve chosen and allow the board to see where improvements can be made. It levels the playing field and creates salient measures for performance.

If you’re an engineer, it helps you see how your particular choices affect performance. You find out how fast and how efficiently training can happen, regardless of the industry you work in. It helps you understand the performance of your machine learning software and hardware, and it also shows how changes, large and small, affect that performance.

With these indications, you can shift your focus before, during, and after deployment to achieve the most significant results. If your CTO wants something faster, you have real results supporting your decision. If your board wants energy efficiency, you’ll soon have reliable benchmarks for that. 

Achieving Consistency with ML Products

The group is hoping consistent benchmarks spur the development of faster, more efficient AI systems, in both software and hardware. If it can accomplish what SPEC did for general computing, we should be able to see improvements in the performance of our AI products.

These workloads require so much computing power that it’s vital for organizations to measure just how efficient systems really are. The first round of results came in towards the end of last year, giving us insight into processors in ways that could help us leverage the power of AI in the future. 


Companies trying to be judicious about their AI initiatives could get real insights into how training dollars should be spent and what the return is for that funding. In some cases, different benchmark results could mean the difference between weeks and days, giving companies the information they need to make data-driven decisions not just about business initiatives, but also about the software and hardware that process the data itself.
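As a rough illustration of the "weeks versus days" comparison, the Python sketch below turns two benchmark results into a speedup and a time saving. The training times are hypothetical figures chosen for the example, not published MLPerf results, and the function name is an assumption.

```python
def speedup(baseline_days: float, candidate_days: float) -> float:
    """Return how many times faster the candidate system trains."""
    if candidate_days <= 0:
        raise ValueError("candidate_days must be positive")
    return baseline_days / candidate_days


# Hypothetical results: two weeks on the current system versus
# three and a half days on a system under evaluation.
baseline_days = 14.0
candidate_days = 3.5

factor = speedup(baseline_days, candidate_days)
saved = baseline_days - candidate_days
print(f"speedup: {factor:.1f}x, time saved per training run: {saved:.1f} days")
```

Multiplying the days saved by the number of training runs planned in a year gives a simple first estimate of what a faster system is worth.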

Elizabeth Wallace, ODSC

Elizabeth is a Nashville-based freelance writer with a soft spot for startups. She spent 13 years teaching language in higher ed and now helps startups and other organizations explain - clearly - what it is they do. Connect with her on LinkedIn here: https://www.linkedin.com/in/elizabethawallace/