Responsible AI: Interpret-Text
Posted by ODSC Community, June 2, 2020
Artificial intelligence (AI) systems have a growing impact on people’s everyday lives, so it is fundamental to protect people, understand models, and control AI systems. While machine learning (ML) services are constantly developing, Microsoft emphasizes ethical principles that put people first, meaning that its employees work to ensure that AI develops in a way that benefits society while warranting people’s trust. Interpret-Text helps in this effort by allowing data scientists to explain their models both globally and locally.
The objective of the project is to bring explainability to AI systems that are often confusing and complex, since these can sometimes behave unexpectedly for reasons that are not fully understood. The tools discussed in this article can help developers debug and better understand their models.
Furthermore, it aims to increase end users’ understanding of intelligent systems, building trust and empowering users to make better decisions and accept AI solutions.
Interpretability helps data scientists explain, debug, and validate their models, which in turn builds trust in the model. InterpretML is an open-source Microsoft package that incorporates state-of-the-art machine learning interpretability techniques and can be used to explain both blackbox systems and glassbox models. The azureml.interpret package supports dataset formats such as numpy.array, pandas.DataFrame, iml.datatypes.DenseData, and scipy.sparse.csr_matrix. It also leverages libraries like LIME, SHAP, SALib, and Plotly, and offers new interpretability algorithms such as the Explainable Boosting Machine (EBM).
The interpretability package can be useful for any data scientist, and especially for start-ups and companies, as an essential tool for debugging models, detecting fairness issues, meeting regulatory compliance requirements, and understanding the model’s decisions in order to build trust among stakeholders and executives.
Interpret-Text, an innovative interpretability technique for Natural Language Processing (NLP) models developed by the community, was announced at Microsoft Build 2020.
This open-source tool allows developers and data scientists to explain their models globally (each label) or locally (each document), to build a visualization dashboard that provides insights into their data, and to perform a comparative analysis on their experiments while running them on different state-of-the-art explainers.
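To make the difference between a local and a global explanation concrete, here is a minimal sketch in plain Python. It does not use the Interpret-Text API: the word weights, the example documents, and the function names local_explanation and global_explanation are all made up for illustration, and a simple per-word weight stands in for whatever a real explainer would compute.

```python
from collections import defaultdict

# Hypothetical per-label word weights, standing in for a trained model.
WEIGHTS = {
    "positive": {"great": 2.0, "love": 1.5, "boring": -1.8},
    "negative": {"great": -2.0, "love": -1.5, "boring": 1.8},
}


def local_explanation(document, label):
    """Importance of each word in ONE document for the given label."""
    scores = defaultdict(float)
    for word in document.lower().split():
        scores[word] += WEIGHTS[label].get(word, 0.0)
    return dict(scores)


def global_explanation(documents, label):
    """Mean absolute importance of each word across ALL documents."""
    totals = defaultdict(float)
    for doc in documents:
        for word, score in local_explanation(doc, label).items():
            totals[word] += abs(score)
    return {word: total / len(documents) for word, total in totals.items()}


docs = ["great film love it", "boring plot", "great great cast"]
doc_scores = local_explanation(docs[2], "positive")   # one document
label_scores = global_explanation(docs, "positive")   # whole label
```

The local result answers "why did this document get this label?", while the global result ranks words by how much they matter for the label across the whole dataset.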
Use Interpret-Text with the Classical Text Explainer
The Classical Text Explainer is an interpretability technique used on classical machine learning models and covers the whole pipeline including text preprocessing, encoding, training, and hyperparameter tuning, all behind the scenes.
By default, it uses a bag-of-words encoder for pre-processing and logistic regression for training. Both can be changed in the utils_classical.py file.
(Find it in your folder you just created in the previous step:
As an input model, the Classical Text Explainer supports two model families: scikit-learn linear models (via the coef_ attribute) and tree-based models (via the feature_importances_ attribute). Additionally, support for any model with a similar layout that is suitable for sparse representations is expected soon.
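The idea behind this default configuration can be sketched without any library. The code below is a toy illustration, not the package's implementation: it builds a bag-of-words encoding, trains a logistic regression with plain gradient descent, and then reads word importances off the learned weights, which is the role a linear model's coefficients play. The training texts, the hyperparameters, and all function names are assumptions chosen for the example.

```python
import math
from collections import Counter


def fit_vocabulary(documents):
    """Map each distinct lowercase token to a column index."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def encode(document, vocab):
    """Bag-of-words encoding: one count per vocabulary word."""
    counts = Counter(document.lower().split())
    return [counts.get(tok, 0) for tok in vocab]


def train_logistic_regression(X, y, epochs=500, lr=0.5):
    """Batch gradient descent on the logistic loss; returns (weights, bias)."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for row, target in zip(X, y):
            z = b + sum(wi * xi for wi, xi in zip(w, row))
            error = 1.0 / (1.0 + math.exp(-z)) - target
            grad_b += error
            for j, xj in enumerate(row):
                grad_w[j] += error * xj
        w = [wi - lr * g / len(X) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def explain(document, vocab, weights):
    """Local importance: learned weight times count for each known word."""
    counts = Counter(document.lower().split())
    return {tok: weights[vocab[tok]] * n for tok, n in counts.items() if tok in vocab}


# Made-up training data: 1 = positive review, 0 = negative review.
texts = ["good great film", "great acting", "bad boring film", "boring plot"]
labels = [1, 1, 0, 0]

vocab = fit_vocabulary(texts)
X = [encode(text, vocab) for text in texts]
weights, bias = train_logistic_regression(X, labels)
explanation = explain("great film but boring plot", vocab, weights)
```

Words with a positive weight push a document towards the positive label and words with a negative weight push it away. Words the encoder has never seen, such as "but" here, get no score at all.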
The API lets developers extend or swap out individual modules such as the pre-processor, the tokenizer, or the model, while the explainer can still pull in and use the tools implemented in the package.
If you want to understand how this explainer works, follow this link to the implementation: Classical Text Explainer implementation
Use Interpret-Text with the Unified Information Explainer
Unified Information Explainer can be used when a unified and intelligible explanation is needed about the transformer, pooler, and classification layers of a particular deep NLP model.
Text pre-processing is handled by the explainer, and sentences are tokenized by the BERT tokenizer. At the time of writing, the developer must provide the Unified Information Explainer with a trained or fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model and samples of the training data. Support for Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) models is planned for the future.
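For readers unfamiliar with BERT tokenization, the sketch below shows the greedy longest-match-first WordPiece splitting that BERT tokenizers are based on. It is a simplified illustration under stated assumptions: the tiny vocabulary is invented, the function names are hypothetical, and a real BERT tokenizer also handles punctuation, casing rules, and a vocabulary of tens of thousands of entries.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]"):
    """Greedy longest-match-first split of one word into sub-word pieces."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # marks a continuation piece
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece fits: the whole word is unknown
        pieces.append(match)
        start = end
    return pieces


def tokenize(sentence, vocab):
    """Lowercase, split on whitespace, and wrap in BERT's special tokens."""
    tokens = ["[CLS]"]
    for word in sentence.lower().split():
        tokens.extend(wordpiece_tokenize(word, vocab))
    tokens.append("[SEP]")
    return tokens


# Invented toy vocabulary, for illustration only.
TOY_VOCAB = {"the", "model", "explain", "##s", "##able", "un", "##explain", "text"}

tokens = tokenize("The model explains text", TOY_VOCAB)
pieces = wordpiece_tokenize("unexplainable", TOY_VOCAB)
```

Because explanations are reported per token, a word that is split into several pieces receives one importance score per piece rather than one score for the whole word.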
Find out how to use the explainer by visiting the link below: Unified Information Explainer implementation
Use Interpret-Text with the Introspective Rationale Explainer
To produce a salient text fragment containing the features that matter most for training a classification model, the Introspective Rationale Explainer uses a generator-predictor framework. The tool predicts the labels and sorts the words into those that are useful (rationales) and those that should not be used for training (anti-rationales).
The API is designed to be modular and extensible, and it can be used when a BERT or an RNN model needs to be explained. A developer who wants to use a custom model must provide the pre-processor, predictor, and generator modules.
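The division of labour between the three modules can be sketched as follows. This is a toy illustration and not the Interpret-Text API: the RationalePipeline class, the lexicon, and the hand-written generator and predictor are all hypothetical. In the real explainer the generator and predictor are trained models, whereas here they are simple rules so that the data flow is easy to follow.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical sentiment lexicon used by the toy generator and predictor.
LEXICON = {"wonderful": 1.0, "great": 1.0, "dull": -1.0}


@dataclass
class RationalePipeline:
    """Wires together three pluggable modules (all names are illustrative)."""
    preprocess: Callable[[str], List[str]]       # text -> tokens
    generate: Callable[[List[str]], List[bool]]  # tokens -> keep/drop mask
    predict: Callable[[List[str]], str]          # kept tokens -> label

    def explain(self, text: str) -> Tuple[str, List[str], List[str]]:
        tokens = self.preprocess(text)
        mask = self.generate(tokens)
        rationales = [t for t, keep in zip(tokens, mask) if keep]
        anti_rationales = [t for t, keep in zip(tokens, mask) if not keep]
        # The predictor only ever sees the tokens the generator selected.
        return self.predict(rationales), rationales, anti_rationales


def toy_preprocess(text: str) -> List[str]:
    return text.lower().split()


def toy_generate(tokens: List[str]) -> List[bool]:
    # Keep a token as a rationale if the lexicon knows it.
    return [token in LEXICON for token in tokens]


def toy_predict(tokens: List[str]) -> str:
    score = sum(LEXICON.get(token, 0.0) for token in tokens)
    return "positive" if score >= 0 else "negative"


pipeline = RationalePipeline(toy_preprocess, toy_generate, toy_predict)
label, rationales, anti_rationales = pipeline.explain(
    "A wonderful but slightly dull film with a great cast"
)
```

Swapping in a different model then amounts to passing different callables for the three modules, which is the kind of modularity the paragraph above describes.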
Learn more about the usage of this explainer: Introspective Rationale Explainer
Author: Eve Pardi – Microsoft AI MVP, Software Developer, Data Scientist, Speaker, Blogger