Observability, the ability to understand a system based on its outputs, is critical to maintaining the performance of machine learning models and systems in production. Without AI observability, data scientists and machine learning engineers cannot debug problems with their ML systems in production, so model performance degradation and failures go undiagnosed. The business value of these models is therefore in jeopardy unless MLEs and MLOps engineers put in place systems that let them understand what’s happening.
Monitoring machine learning models in production goes part of the way towards solving this problem, but it covers only part of what observability offers. Monitoring is a subset of observability, and it depends on observability to identify the right metrics and performance indicators to track. Monitoring is necessary for ensuring the performance of an ML system, but it is not sufficient.
But how does one get observability? In classic software applications, metrics, logs, and traces are the pillars of observability. Without these artifacts, it’s impossible to know what’s happening with a software application. But what is the equivalent of metrics, logs, and traces for machine learning systems? Since machine learning models and the systems that host them act on and produce data (input data, predictions, and ground truths), the artifacts required for AI observability must be artifacts which describe the data in the machine learning system.
The open-source library whylogs provides a great option for creating artifacts which enable AI observability. With whylogs, users can generate data profiles, which are statistical summaries of their data. These statistical summaries capture key information about the data, such as the distribution, the number of missing values, the data type, etc. With these profiles, users are able to understand the data being fed to their model, the predictions that it’s making, and the model’s performance. This enables them to quickly diagnose problems such as data drift, data quality issues, and more.
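To make the idea of a data profile concrete, here is a minimal sketch in plain Python. It does not use the whylogs API, and the helper names `profile_column` and `profile_dataset` are hypothetical, chosen only for illustration. It assumes each record is a dictionary and that a missing value is represented as `None`; real profiles capture far more, such as full distributions.

```python
from collections import Counter
from statistics import mean


def profile_column(values):
    """Summarize one column: row count, missing count, inferred type, basic stats."""
    present = [v for v in values if v is not None]
    type_counts = Counter(type(v).__name__ for v in present)
    summary = {
        "count": len(values),
        "missing": len(values) - len(present),
        # Most common Python type among the non-missing values.
        "inferred_type": type_counts.most_common(1)[0][0] if present else "unknown",
    }
    # Numeric statistics only make sense for numeric columns (bools excluded).
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if numeric:
        summary.update(min=min(numeric), max=max(numeric), mean=mean(numeric))
    return summary


def profile_dataset(rows):
    """Build a per-column profile from a list of dict records."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


# Illustrative data only, not taken from any real system.
rows = [
    {"age": 34, "country": "US"},
    {"age": None, "country": "DE"},
    {"age": 51, "country": None},
    {"age": 29, "country": "US"},
]
profile = profile_dataset(rows)
for column, summary in profile.items():
    print(column, summary)
```

The point of the sketch is that the profile is much smaller than the data it describes, which is what makes it practical to store one per batch and compare them over time.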
Human in the loop vs AutoMLOps
With whylogs profiles, or other types of AI observability artifacts, users are able to gain visibility into their ML system. This enables them to understand whether a problem was introduced in data pipelines upstream from the model, caused by changes in the data coming into the system, or caused by something else entirely.
In addition to this manual “human in the loop” observability, it’s also possible to automate many MLOps best practices using an AI observability platform. By automating these processes, machine learning engineers can step out of the day-to-day loop and focus on getting more models into production, rather than constantly having to diagnose and repair issues with their deployed ML systems.
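As a rough illustration of what such automation might look like, the sketch below compares a current summary of one feature against a reference summary and returns alerts when the two diverge. Everything in it is an assumption made for the example: the function name `check_profile`, the dictionary layout (`count`, `missing`, `mean`), and the threshold values. It is not the API of whylogs or of any particular observability platform.

```python
def check_profile(reference, current, max_missing_increase=0.05, max_mean_shift=0.25):
    """Return a list of alert messages when `current` diverges from `reference`.

    Both arguments are hypothetical summary dicts with "count", "missing",
    and optionally "mean". Thresholds are illustrative defaults.
    """
    alerts = []

    # Data quality check: has the share of missing values grown too much?
    ref_missing = reference["missing"] / reference["count"]
    cur_missing = current["missing"] / current["count"]
    if cur_missing - ref_missing > max_missing_increase:
        alerts.append(
            f"missing ratio rose from {ref_missing:.2%} to {cur_missing:.2%}"
        )

    # Crude drift check: has the mean moved by more than the allowed fraction?
    if "mean" in reference and "mean" in current and reference["mean"] != 0:
        shift = abs(current["mean"] - reference["mean"]) / abs(reference["mean"])
        if shift > max_mean_shift:
            alerts.append(f"mean shifted by {shift:.1%} relative to reference")

    return alerts


# Illustrative numbers only.
reference = {"count": 1000, "missing": 10, "mean": 40.0}
current = {"count": 1000, "missing": 120, "mean": 55.0}

alerts = check_profile(reference, current)
for message in alerts:
    print("ALERT:", message)
```

A check like this could run on every batch without a person looking at the data. A relative shift in the mean is a deliberately simple stand-in for drift detection; a production system would more likely compare full distributions.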
Conclusion on AI Observability
Whether with bleeding-edge AutoMLOps observability processes or industry-standard human-in-the-loop processes, machine learning engineers and data scientists need to be able to get visibility into the performance of their system. If they don’t, they will never be able to identify and diagnose issues with ML systems, let alone fix those issues. That’s why we built the open-source library whylogs and the SaaS AI observability platform WhyLabs, which enable machine learning practitioners to run AI with certainty.
To learn more about AI Observability and deep dive into the topics presented in this overview blog post, check out my upcoming talk at ODSC East, “AI Observability: How To Fix Issues With Your ML Model.”
About the Author:
Danny D. Leybzon has worn many hats, all of them related to data. He studied computational statistics at UCLA before becoming first an analyst and then a product manager at a big data platform named Qubole. He went on to be the primary field engineer for data science and machine learning at Imply, before taking on his current role as MLOps Architect at WhyLabs. He has worked to evangelize machine learning best practices, speaking on subjects such as distributed deep learning, productionizing machine learning models, and automated machine learning, and more recently on AI observability and data logging. When Danny’s not researching, practicing, or talking about data science, he’s usually enjoying one of his numerous outdoor hobbies: rock climbing, backcountry backpacking, skiing, etc.