

How to Explain Your ML Models?
Posted by ODSC Community, August 11, 2020

Explainability in machine learning (ML) and artificial intelligence (AI) is becoming increasingly important. With the growing demand for explanations and the number of new approaches out there, it can be difficult to know where to start. In this post, we will get hands-on experience in explaining an ML model using a couple of approaches. By the end of it, you will have a good grasp of some fundamental ways in which you can explain the decisions or behavior of most machine learning models.
Types of explainability approaches
When it comes to explaining models and/or their decisions, multiple approaches exist. One may want to explain the global (overall) model behavior or provide a local explanation (i.e. explain the model's decision for each individual instance in the data). Some approaches are applied before the model is built, others after training (post-hoc). Some approaches explain the data, others the model. Some are purely visual, others are not.
What will you need?
To follow this tutorial, you will need Python 3 and some ML knowledge, as I will not explain how the model I train works. My advice is to create and work in a virtual environment, because you will need to install a few packages and you may not want them to disrupt your local conda or pip setup.
Import data and packages
We will use the diabetes dataset from sklearn and train a standard random forest regressor. Refer to the sklearn documentation to learn more about the data (https://scikit-learn.org/stable/datasets/index.html).
from sklearn.datasets import load_diabetes
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor
import pycebox.ice as icebox
from sklearn.tree import DecisionTreeRegressor, plot_tree
Import data and train a model
We import the diabetes data set, assign the target variable to a vector y and the remaining features to a matrix X, and train a standard random forest model.
In this tutorial, I am skipping some of the typical data science steps, such as cleaning and exploring the data and performing the conventional train/test split, but feel free to perform those on your own.
raw_data = load_diabetes()
df = pd.DataFrame(np.c_[raw_data['data'], raw_data['target']],
                  columns=np.append(raw_data['feature_names'], ['target']))
y = df.target
X = df.drop('target', axis=1)

# Train a model
clf = RandomForestRegressor(random_state=42, n_estimators=50, n_jobs=-1)
clf.fit(X, y)
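If you do want the conventional train/test split mentioned above, a minimal sketch could look like the following (the 20% test size and the variable names here are my own choices, not part of the original tutorial):

from sklearn.model_selection import train_test_split

# Hold out 20% of the data for evaluation (the split size is an arbitrary choice here)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

clf_holdout = RandomForestRegressor(random_state=42, n_estimators=50, n_jobs=-1)
clf_holdout.fit(X_train, y_train)
print(clf_holdout.score(X_test, y_test))  # R-squared on the held-out data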
Calculate the feature importance
We can also easily calculate and print out the feature importances from the random forest model. We see that the most important feature is 's5', one of the blood serum measurements, followed by 'bmi' and 'bp'.
# Calculate the feature importances
feat_importances = pd.Series(clf.feature_importances_, index=X.columns)
feat_importances.sort_values(ascending=False).head()

s5     0.306499
bmi    0.276130
bp     0.086604
s6     0.074551
age    0.058708
dtype: float64
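If you prefer a visual overview of all the importances rather than just the top five numbers, one quick option (the plotting choices here are mine) is a horizontal bar chart:

# Plot all impurity-based feature importances, sorted
feat_importances.sort_values().plot(kind='barh', figsize=(8, 5))
plt.xlabel('Feature importance')
plt.tight_layout()
plt.show()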
Visual explanations
The first approach I will apply is called individual conditional expectation (ICE) plots. They are very intuitive: they show you how the prediction changes as you vary the feature values. They are similar to partial dependency plots, but ICE plots go one step further and reveal heterogeneous effects, since they display one line per instance. The code below displays an ICE plot for the feature 'bmi' after we have trained the random forest.
# We feed in the X-matrix, the model, and one feature at a time
bmi_ice_df = icebox.ice(data=X, column='bmi', predict=clf.predict)

# Plot the figure
fig, ax = plt.subplots(figsize=(10, 10))
icebox.ice_plot(bmi_ice_df, linewidth=.5, plot_pdp=True,
                pdp_kwargs={'c': 'red', 'linewidth': 5}, ax=ax)
ax.set_ylabel('Predicted diabetes')
ax.set_xlabel('BMI')
Figure 1. ICE plot for the ‘bmi’ and predicted diabetes
We see from the figure that there is a positive relationship between 'bmi' and our target (a quantitative measure of disease progression one year after baseline). The thick red line in the middle is the partial dependency plot, which shows the change in the average prediction as we vary the 'bmi' feature.
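To get a feel for what icebox.ice computes, here is a minimal hand-rolled sketch of the same idea; the 50-point grid and the plotting details are my own choices rather than pycebox's internals:

# For each instance, vary only 'bmi' over a grid while keeping the other features fixed,
# and record the model's prediction at every grid point: one curve per instance.
bmi_grid = np.linspace(X['bmi'].min(), X['bmi'].max(), 50)
fig, ax = plt.subplots(figsize=(10, 10))
for i in range(len(X)):
    row = X.iloc[[i]].copy()
    predictions_for_row = []
    for value in bmi_grid:
        row['bmi'] = value
        predictions_for_row.append(clf.predict(row)[0])
    ax.plot(bmi_grid, predictions_for_row, linewidth=.5, color='grey', alpha=.5)
ax.set_xlabel('BMI')
ax.set_ylabel('Predicted diabetes')
plt.show()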
We can also center the ICE plot, ensuring that the lines for all instances in the data start from the same point. This removes the level effects and makes the plot easier to read. We only need to change one argument in the code.
fig1, ax1 = plt.subplots(figsize=(10, 10))
icebox.ice_plot(bmi_ice_df, linewidth=.5, plot_pdp=True,
                pdp_kwargs={'c': 'blue', 'linewidth': 5}, centered=True, ax=ax1)
ax1.set_ylabel('Predicted diabetes')
ax1.set_xlabel('BMI')
Figure 2. Centered ICE plot for the bmi and predicted diabetes
The result is a plot that is much easier to read!
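For intuition, centering anchors every curve at its prediction for the smallest 'bmi' value, so all lines start at zero and only the change along each curve remains. Assuming the DataFrame returned by icebox.ice is indexed by the grid of 'bmi' values with one column per instance, a rough manual equivalent would be:

# Subtract each curve's first value (prediction at the smallest 'bmi') from the whole curve
centered_manual = bmi_ice_df - bmi_ice_df.iloc[0]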
Global explanations
One popular way to explain the global behavior of a black-box model is to apply a so-called global surrogate model. The idea is that we take our black-box model and create predictions with it. Then we train a transparent model (think a shallow decision tree or a linear/logistic regression) on the original features and the predictions produced by the black-box model. We need to keep track of how well the surrogate model approximates the black-box model, but that is often not straightforward to determine.
To keep things simple, I create predictions from our random forest regressor, train a relatively shallow decision tree on them, and visualize it. That's it! Even if we cannot easily comprehend how the hundreds of trees in the forest look (or we don't want to retrieve them), we can build a shallow tree after it and hopefully get an idea of how the forest works.
We start by getting the predictions from the random forest and building a shallow decision tree.
predictions = clf.predict(X)
dt = DecisionTreeRegressor(random_state=100, max_depth=3)
dt.fit(X, predictions)

Now we can plot and see what the tree looks like.

fig, ax = plt.subplots(figsize=(20, 10))
plot_tree(dt, feature_names=list(X.columns), precision=3,
          filled=True, fontsize=12, impurity=True)
Figure 3. Surrogate model (in this case: decision tree)
We see that the first split is on the feature 's5', followed by 'bmi'. If you recall, these were also the two most important features picked by the random forest model.
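If you would rather read the surrogate tree as text than as a figure, sklearn's export_text prints the same splits as a set of rules:

from sklearn.tree import export_text

# Print the surrogate tree's splits and leaf values as indented text rules
print(export_text(dt, feature_names=list(X.columns)))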
Lastly, make sure to calculate the R-squared so that we can tell how good an approximation the surrogate model is.
We can do that with the code below:
dt.score(X, predictions)

0.6705488147404473
In this case, the R-squared is 0.67. Whether we deem this high or low is very context-dependent.
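Note that dt.score here measures the surrogate's fit to the black-box predictions, not to the true targets; you can compute the same R-squared explicitly if you prefer:

from sklearn.metrics import r2_score

# R-squared of the surrogate tree's output against the random forest's predictions
print(r2_score(predictions, dt.predict(X)))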
Next steps
Now you have gained some momentum and applied explainability techniques to an ML model. You can take another dataset or apply them to a real use case.
You can also join my upcoming workshop titled "Explainable ML: Application of Different Approaches" during ODSC Europe 2020. In the workshop, we will walk through these approaches in more detail and apply other equally fascinating explainability techniques.
About the Author:
Violeta Misheva works as a data scientist and holds a PhD degree in applied econometrics from the Erasmus School of Economics. She is passionate about AI for good and is currently interested in fairness and explainability in machine learning. She enjoys sharing her data science knowledge with others, which is why she conducts part-time workshops with students, has designed a course for DataCamp, and regularly attends and presents at conferences and other events.