Lots of businesses want to use machine learning, but few are ready to integrate it into their day-to-day operations. Dr. Mufajjul Ali, Data Solutions Architect for Microsoft, outlines how Microsoft is addressing these needs and offers some advice for businesses looking to operationalize ML models and build a culture of machine learning.
Challenges to Operationalization
Data scientists are often confined to working in isolation. They create beautiful models that no one can understand, and the models don’t usually translate to real business value. This silo is a direct challenge to a business realizing the true value of what data teams are doing.
This fuzzy view keeps businesses from understanding the end-to-end journey of their data. The typical data science lifecycle is hidden and isolated from the real business value, slowing down the process and keeping the entire team from fully benefiting from the insights being created. It’s wasteful and erratic.
If a process is isolated from the enterprise, the insights won’t feed into the overall process. Businesses are looking for analytics platforms that can use real-time data, both structured and unstructured, along with a batch layer that’s scheduled and continually monitored. With this platform in place, data scientists move beyond creating models to an integrated, continuous pipeline of data analysis.
Real Life Spaces
One example of this end-to-end pipeline works as follows.
The data sources are structured, semi-structured, and unstructured data. To handle that volume, the data factory ingests the data and stores it in block storage. Once it’s cataloged and stored, you can deploy the training process.
These processes are consistent and automated, allowing your data scientists to model efficiently and serve insight. The pipeline is first built to answer a business issue. From there, the model is serialized so it can be deployed to a target context. Once it’s serialized, it can run in that target context consistently.
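As a rough illustration of the serialize-then-deploy step, the sketch below fits a tiny linear model in plain Python, writes it to a framework-neutral JSON document, and reloads it in a separate scoring function that stands in for the target context. The function names (`fit_line`, `serialize`, `load_and_score`) and the JSON layout are assumptions made for this example, not part of any Microsoft tooling.

```python
import json


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return {"slope": slope, "intercept": mean_y - slope * mean_x}


def serialize(params):
    """Turn trained parameters into a portable, versioned JSON string."""
    return json.dumps({"format_version": 1, "kind": "linear", "params": params})


def load_and_score(payload, x):
    """Target context: needs only the payload, not the training code."""
    doc = json.loads(payload)
    if doc["format_version"] != 1 or doc["kind"] != "linear":
        raise ValueError("unsupported model payload")
    p = doc["params"]
    return p["slope"] * x + p["intercept"]


# Training side: fit on toy data that follows y = 2x + 1, then serialize.
payload = serialize(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))

# Serving side: score a new input from the serialized form alone.
prediction = load_and_score(payload, 10)
```

The point of the version and kind fields is that the serving side can reject a payload it does not understand instead of scoring it incorrectly.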
Optimizing the frameworks for different platforms can be a challenge, as Microsoft found out when evolving these ecosystems for operationalization. It can cause fragmentation across your teams, the dreaded silos that prevent true collaboration within ML.
One way Microsoft addressed this issue was to build an intermediate representation. Models in this representation can talk to other runtime environments and CPUs, allowing teams to work better together without encountering common communication issues.
Goals of Operationalization
The end goal of operationalization is standardization. Serialized models should be interoperable so that you aren’t tied to specific frameworks. Hardware vendors can optimize the model for the target rather than the specific environment. Microsoft is working on this standardization through ONNX (Open Neural Network Exchange).
Once you’ve mastered interoperability, the entire ecosystem becomes highly scalable. One design principle of ONNX is to support deep neural networks (DNNs) in addition to traditional ML. The format is flexible enough to evolve and allows for compact, cross-platform representation. As more programs move to this type of interoperability, businesses will be better able to build their pipelines without worrying about environmental challenges.
Building Through ONNX
ONNX provides a high-level IR built from three elements: the model, the graph, and the computation node. It supports multiple data types, including tensor types and, through ONNX-ML, non-tensor types. Built-in operators are defined by name, domain, and version, which allows custom and experimental operators in addition to your core group.
The system standardizes the functions defined by the ONNX spec, but you can also perform in-model customization. Big frameworks using ONNX include PyTorch and PaddlePaddle, among others, while converters exist for MathWorks and scikit-learn.
AWS is a partner, and supporting companies include Microsoft, NVIDIA, and Intel. You can even use ONNX to convert a model from Google’s TensorFlow, giving you support for things that don’t belong to the standard.
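To make the model, graph, and node layering more concrete, here is a toy intermediate representation in plain Python. It is only a sketch of the idea and does not use the real `onnx` package or its API: the `Model`, `Graph`, and `Node` classes, the `OPERATORS` registry, and the `com.example` domain are all hypothetical names chosen for this example. Operators are looked up by domain, name, and version, as the paragraph above describes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Operators are keyed by (domain, name, version). This registry is a
# hypothetical stand-in, not the real ONNX operator set.
OpKey = Tuple[str, str, int]
OPERATORS: Dict[OpKey, Callable[..., float]] = {
    ("", "Add", 1): lambda a, b: a + b,
    ("", "Mul", 1): lambda a, b: a * b,
    # A custom operator lives in its own domain so it cannot clash with core ops.
    ("com.example", "Square", 1): lambda a: a * a,
}


@dataclass
class Node:
    op: OpKey            # which operator to apply
    inputs: List[str]    # names of the values it reads
    output: str          # name of the value it writes


@dataclass
class Graph:
    inputs: List[str]
    output: str
    nodes: List[Node] = field(default_factory=list)


@dataclass
class Model:
    ir_version: int
    graph: Graph


def run(model: Model, feeds: Dict[str, float]) -> float:
    """Evaluate nodes in listed order (assumed to be topologically sorted)."""
    values = dict(feeds)
    for node in model.graph.nodes:
        fn = OPERATORS[node.op]
        values[node.output] = fn(*(values[name] for name in node.inputs))
    return values[model.graph.output]


# y = (x * w + b) squared, expressed as three computation nodes.
model = Model(
    ir_version=1,
    graph=Graph(
        inputs=["x", "w", "b"],
        output="y",
        nodes=[
            Node(("", "Mul", 1), ["x", "w"], "xw"),
            Node(("", "Add", 1), ["xw", "b"], "z"),
            Node(("com.example", "Square", 1), ["z"], "y"),
        ],
    ),
)

result = run(model, {"x": 2.0, "w": 3.0, "b": 1.0})
```

Because the graph only names operators and values, any runtime that implements the same operator keys could evaluate it, which is the interoperability argument in miniature.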
What This Means for Business Value
The goal of operationalization is bigger than ONNX. What Microsoft is after is serverless architecture and Docker-based containerization. With serverless architecture, you don’t have to worry about the back-end infrastructure. Instead, you focus on what your business is really interested in, whether it’s scale or a specific task. The back end is handled on your behalf.
With Docker-based containerization, you never worry that your systems will be irreproducible. Instead, every service is built as a self-contained unit, so you can deploy whatever microservice you need without worrying about communication.
So what does this mean for your business? It removes the pain of manually composing Docker containers and handling back-end infrastructure, and it puts your data team back where they need to be: running programs for business insight. It’s scalable and allows different members of the team to fit into the pipeline, removing silos and effectively creating an operational culture for your data needs.
As more companies, frameworks, and environments move toward standardization, it gets easier for your organization to build its pipeline and truly operationalize. Even when some players, such as Google, decide to go their own way, a standard allows you to convert their models into something that communicates with your programs and doesn’t stand in the way of your business building that pipeline. The result? Better, faster insight that translates to business value.