Trustworthy AI: Operationalizing AI Models with Governance – Part 2
Posted by ODSC Community, September 13, 2021
Editor’s note: Sourav Mazumder is a speaker for ODSC West 2021. Be sure to check out his talk, “Operationalization of Models Developed and Deployed in Heterogeneous Platforms,” for more info on trustworthy AI there. Read PART 1 of the series here.
AI solutions are composed of many components: data, data products (e.g., KPIs, features for machine learning models, insights derived from data), and models (e.g., machine learning models, decision optimization models). In the rest of this article, we focus on the governed operationalization of AI models created with machine learning (ML) algorithms; the same principles can also be applied to the other components of AI solutions. For brevity, we shall refer to ML models for AI solutions as AI models.
Governed Operationalization of AI Models
Governed operationalization of AI models is a framework of process, people, and technology that helps ensure the trustworthiness of AI solutions used for business. The approach uses data and AI technologies integrated with an open and diverse ecosystem, and is rooted in the principles of trustworthy AI ethics. It encompasses the entire lifecycle of ML models, from inception to decommissioning. The diagram below captures the process and people aspects of this framework.
Model Inception and Candidacy Establishment – Before starting development of a trustworthy AI model, it is important to think about why it is needed, what should be used to develop it, and how it will be used. Questions that need to be asked and answered include: which business process(es) will use it (or what problem it will solve); what kind of data will be used; what algorithm (or family of algorithms) would be most suitable for building the model; what software tools/libraries/packages should be used; what accuracy, fairness, and drift thresholds need to be maintained for the model to remain in production (otherwise re-training would be needed); and what kind of interpretability the model must demonstrate before it is put in production.
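These inception-time decisions can be captured as structured data so that later lifecycle phases can reference them. Below is a minimal sketch; the field names and threshold values are illustrative assumptions, not a prescribed schema.

```python
# A minimal sketch of a model candidacy record capturing inception-time
# decisions; all field names and threshold values are illustrative.
from dataclasses import dataclass, field

@dataclass
class ModelCandidacy:
    business_process: str               # which process will consume the model
    data_sources: list                  # data planned for training
    algorithm_family: str               # algorithm family considered suitable
    thresholds: dict = field(default_factory=dict)  # production gates

candidacy = ModelCandidacy(
    business_process="online transaction fraud detection",
    data_sources=["transactions", "customer_profile"],
    algorithm_family="gradient-boosted trees",
    thresholds={"accuracy": 0.90, "fairness_disparity": 0.10, "drift": 0.05},
)
```

Keeping the thresholds in one record like this lets the validation and monitoring phases compare against the exact values agreed at inception.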
Data and Package Sourcing – Model development needs data and appropriate software tools/libraries/packages. Data sourcing needs governance as appropriate for the organization/use case (such as data masking, subsetting of datasets, impersonation, and algorithm-specific anonymization) so that no sensitive customer or internal data is used for model development. Along with that, understanding the quality of the data used is important too, which can be ensured through data profiling and lineage. Similarly, the packages/libraries used for model development should also go through some level of governance so that only trustworthy (stable, proven, audited) software libraries are used to build the model.
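As one concrete illustration of masking before data reaches model development, a sensitive identifier can be replaced with a one-way hashed token. This is only a sketch of the idea; the helper name and salt handling are assumptions, and a real deployment would use an organization-approved anonymization service.

```python
# Hypothetical masking helper: replace a sensitive identifier with a
# one-way hashed token before the record is handed to data scientists.
import hashlib

def mask_value(value: str, salt: str = "org-secret") -> str:
    """Return a deterministic, non-reversible token for a sensitive value."""
    return hashlib.sha256((salt + value).encode()).hexdigest()[:12]

record = {"customer_id": "C-10042", "balance": 1520.50}
masked = {**record, "customer_id": mask_value(record["customer_id"])}
```

Determinism matters here: the same identifier always masks to the same token, so joins across masked datasets still work while the raw value stays hidden.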
Model Development – The model development phase typically requires data scientists to try out multiple algorithms, features, and hyperparameters to eventually arrive at the model with the right balance of performance criteria. They have to perform multiple experiments to get there. From a governance perspective, what is important is to record all of those experiments, their results, and the reasoning for choosing the final model. Apart from that, reusing features from a feature store while developing models can also ensure better control over a model's performance. So the use of pre-created (and previously used) features can also be mandated at model development time, where possible.
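The experiment-recording requirement above can be sketched as an append-only log where each run stores its algorithm, hyperparameters, metrics, and the reasoning notes. The structure is an illustrative assumption; in practice a tracking tool would play this role.

```python
# Sketch of governed experiment recording: every run is logged with its
# configuration, metrics, and reasoning for model selection.
experiment_log = []

def record_experiment(algorithm, hyperparams, metrics, notes=""):
    entry = {"algorithm": algorithm, "hyperparams": hyperparams,
             "metrics": metrics, "notes": notes}
    experiment_log.append(entry)
    return entry

record_experiment("logistic_regression", {"C": 1.0}, {"auc": 0.86})
record_experiment("xgboost", {"max_depth": 4}, {"auc": 0.91},
                  notes="chosen: best AUC with acceptable latency")

# The selection reasoning is itself recorded, not just the winner.
best = max(experiment_log, key=lambda e: e["metrics"]["auc"])
```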
Independent Validation of Models – Just as with quality control of traditional software, AI models also need validation by an independent team, i.e., one that was not part of the team that developed the model. The independent validation team should create a separate validation dataset (a blind dataset) through careful test design. They should test the model for aspects such as accuracy, fairness, robustness (using drift), interpretability, and throughput. These results should be compared with the thresholds for the related metrics established during the model inception phase. The model owner should not approve the model for production unless the test results fall within those thresholds.
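The approval gate described above reduces to a simple comparison of measured metrics against the inception-time thresholds. A minimal sketch, assuming accuracy has a floor while fairness disparity and drift have ceilings:

```python
# Sketch of an independent validation gate: the model is approvable only
# if every measured metric satisfies its inception-time threshold.
thresholds = {"accuracy": 0.90, "fairness_disparity": 0.10, "drift": 0.05}
measured   = {"accuracy": 0.93, "fairness_disparity": 0.07, "drift": 0.03}

def passes_validation(measured, thresholds):
    # accuracy must meet or exceed its floor; disparity and drift must
    # stay at or below their ceilings
    return (measured["accuracy"] >= thresholds["accuracy"]
            and measured["fairness_disparity"] <= thresholds["fairness_disparity"]
            and measured["drift"] <= thresholds["drift"])

approved = passes_validation(measured, thresholds)
```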
Model Deployment in Production – Validated models are deployed in production. Various applications can then call the model to get predictions by sending scoring requests over technology-independent standard protocols like REST/HTTP. There are two types of deployments – online (synchronous access) and batch (asynchronous access). Online deployment of a model needs a model execution runtime that runs continuously, so the model can be accessed synchronously for a single prediction request (or a small set of prediction requests, i.e., a micro-batch). This is typically used in use cases that need real-time predictions; for example, online transaction fraud prediction, intent identification for chatbots, etc. Batch deployment of models needs an infrastructure that can spawn the runtime on demand and stop it once predictions for all batch scoring requests are generated. This mode is typically used to get predictions for a large volume of scoring requests where the use case can wait for a stipulated period before the predictions are generated; for example, daily identification of customers with a high churn probability, or risk factor predictions for loans applied for each day. In both cases, it is important to monitor access to the model execution environment for governance purposes. Monitoring (or stopping) unauthorized access to the environment, errors in execution, throughput and latency of responses, and model performance (accuracy, drift, fairness, etc.) are must-haves. The model's execution environment needs to be configured for all of these monitoring needs before making it accessible to business applications/clients. There is also a need for strong security around the model serving environment (not everyone should be able to spawn or delete it), the model binary, and the scoring data.
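An online scoring call over REST/HTTP might look like the sketch below. The endpoint URL and payload schema are illustrative assumptions; each serving platform defines its own scoring contract, and the actual HTTP call is left commented out to keep the sketch self-contained.

```python
# Sketch of building an online scoring request for a deployed model.
# The URL and payload shape are hypothetical, not a real platform API.
import json

SCORING_URL = "https://models.example.com/v1/fraud-model/score"  # hypothetical

def build_scoring_request(fields, rows):
    """Package feature rows into a JSON scoring payload."""
    return json.dumps({"input_data": [{"fields": fields, "values": rows}]})

payload = build_scoring_request(
    fields=["amount", "country", "txn_count_24h"],
    rows=[[1520.50, "US", 3]],
)
# In a real client, something like:
#   requests.post(SCORING_URL, data=payload,
#                 headers={"Authorization": "Bearer <token>"})
```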
Model Monitoring in Production – The model's execution environment should be continuously monitored for the aspects discussed above. In case of any violation (with respect to the thresholds decided at the model inception stage), the people responsible for the model need to be alerted. For some models, it may be necessary to stop execution when a metric violates its threshold. Based on the change in a metric's value, the model may need re-training or, in some cases, decommissioning.
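A monitoring check of this kind can be sketched as a function that returns the list of violated metrics, leaving the alerting transport (email, pager, workflow ticket) abstract. Metric names and directions mirror the earlier validation sketch and are assumptions.

```python
# Sketch of a production monitoring check: return the metrics that
# violate their inception-time thresholds so the owners can be alerted.
def check_metrics(measured, thresholds):
    violations = []
    if measured["accuracy"] < thresholds["accuracy"]:
        violations.append("accuracy")   # accuracy fell below its floor
    if measured["drift"] > thresholds["drift"]:
        violations.append("drift")      # drift rose above its ceiling
    return violations

thresholds = {"accuracy": 0.90, "drift": 0.05}
violations = check_metrics({"accuracy": 0.88, "drift": 0.02}, thresholds)
```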
Model Decommissioning – Most AI models are created to support a business use case or to solve a business problem. So every model should have a tentative lifespan defined at inception. Either based on that, or in the case of a sudden change in the business environment and/or business strategy, there may be a need to decommission the model. From a governance standpoint, the following should be ensured: the ability to automatically decommission model execution based on the model's expiration date or other facts; archiving of model artifacts for future reference and replay; and fast access to model facts when needed.
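The automated, expiration-based part of that decommissioning can be sketched in a few lines; the function name and policy are assumptions, and real systems would also honor manual overrides.

```python
# Sketch of automated decommissioning driven by the expiration date
# recorded at model inception.
from datetime import date

def should_decommission(expiry: date, today: date) -> bool:
    """True once the model's inception-time lifespan has elapsed."""
    return today >= expiry

decommission_now = should_decommission(date(2021, 12, 31), date(2022, 1, 15))
```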
Model Lifecycle Workflow for Risk and Compliance – All of the above phases of a model's lifecycle can be overlaid with a governance workflow for model risk management, with the necessary approval/gating steps. These steps can be manual or automated. A change management workflow can also be created for changing a model's version, whether for re-training or any other purpose.
Model Lifecycle Automation – Like a CI/CD need for DevOps of software, model operationalization also needs automation for scaling. However, in this case, apart from Continuous Integration (CI) and Continuous Deployment (CD) one also needs Continuous Training (CT), Continuous Validation (CV), and Continuous Monitoring (CM). We call these the 5 Cs of MLOps. The 5 Cs are typically executed across different environments – model development, model UAT/pre-production, and model production environments.
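The 5 Cs can be thought of as ordered stages of an automated pipeline spanning the development, UAT/pre-production, and production environments. The sketch below is purely structural; the stage names are from the text, but each stage body is a placeholder for a real MLOps toolchain step.

```python
# Structural sketch of the 5 Cs of MLOps as an ordered pipeline.
STAGES = ["continuous_integration", "continuous_training",
          "continuous_validation", "continuous_deployment",
          "continuous_monitoring"]

def run_pipeline(stages):
    completed = []
    for stage in stages:
        # a real pipeline would invoke the toolchain step here
        completed.append(stage)
    return completed

completed = run_pipeline(STAGES)
```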
Model Factsheet – Given the implications of the probabilistic nature of AI models, it is important for the users of AI models to know everything that has gone into a model's development, such as the data used, the process, the metrics, etc. It is also important to present model facts in a manner that is relevant and contextual for the various stakeholders. Hence, the recording of important facts about the model in all phases of the model lifecycle should be standardized and mandated. This should be practiced irrespective of the tools used to develop, deploy, or monitor models in an organization.
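A factsheet of this kind can be sketched as a record that accumulates facts keyed by lifecycle phase, so each stakeholder can pull the slice relevant to them. The phase and key names are illustrative assumptions, not a standard schema.

```python
# Sketch of a model factsheet that accumulates facts per lifecycle phase.
factsheet = {}

def record_fact(phase, key, value):
    """Attach one fact to the given lifecycle phase."""
    factsheet.setdefault(phase, {})[key] = value

record_fact("inception", "business_process", "churn prediction")
record_fact("development", "algorithm", "random forest")
record_fact("validation", "accuracy", 0.92)
```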
IBM Technology to help Operationalize AI Models with Governance
IBM Cloud Pak for Data is a platform that can help organizations operationalize AI models with governance. It supports the governed operationalization of AI models even when the models are developed and/or deployed on heterogeneous platforms (IBM, non-IBM, open source, etc.). IBM Cloud Pak for Data can run on any cloud platform, such as IBM Cloud, AWS, Azure, or Google Cloud, or on any on-premises infrastructure.
The diagram below showcases various components (in Dark Blue boxes) of IBM Cloud Pak for Data that can be used for the Governed Operationalization of AI Models.
You can try various components of IBM Cloud Pak for Data for Governed Operationalization of AI Models for free here.
Try various Opensource frameworks from IBM Research for Trustworthy AI (Fairness, Explainability, and Robustness)
- https://github.com/Trusted-AI/AIX360 – Explainability
- https://github.com/Trusted-AI/AIF360 – Fairness
- https://art360.mybluemix.net/ – Robustness
Sign up for the AI Governance Beta program – https://survey.alchemer.com/s3/6411187/IBM-AI-Governance-Beta-Program
Disclaimer: All opinions expressed here on trustworthy AI are my own and not of my employer.
About the author/ODSC West 2021 Speaker on Trustworthy AI:
Sourav Mazumder is an IBM Data Scientist Thought Leader in IBM Expert Labs and a Distinguished Data Scientist with The Open Group. Sourav has consistently driven business innovation and value through methodologies and technologies related to Artificial Intelligence, Data Science, and Big Data, drawing on his knowledge, insights, experience, and influencing skills across multiple industries, including Manufacturing, Insurance, Telecom, Banking, Media, Health Care, and Retail, in the USA, Europe, Australia, Japan, and India.