

Visualizing Decision Trees with Pybaobabdt
Data Visualization, Modeling | Posted by ODSC Community, September 2, 2022

Decision trees can be visualized in multiple ways. Take, for instance, the indentation diagram, where every internal and leaf node is depicted as text, while the parent-child relationship is shown by indenting the child with respect to the parent.
Indentation diagram | Image by Author
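For a scikit-learn tree, an indentation diagram of this kind can be printed with sklearn.tree.export_text. The following is only a minimal sketch: it uses scikit-learn's bundled iris data as a stand-in (an assumption made so the snippet runs on its own), and the variable names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Stand-in data (assumption): iris ships with scikit-learn.
iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Each extra level of indentation marks a child of the line above it.
text_diagram = export_text(clf, feature_names=list(iris.feature_names))
print(text_diagram)
```

Every split and leaf becomes one line of text, so the output can be pasted into logs or reports without any plotting backend.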
Then there is the node-link diagram. It is one of the most commonly used ways to visualize decision trees: the nodes are represented by glyphs, and parent and child nodes are connected through links.
Node-link diagram | Image by Author
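One way to obtain a node-link description of a scikit-learn tree is sklearn.tree.export_graphviz, which emits Graphviz DOT source. This is a hedged sketch rather than the method used for the figure above; the iris data is again a stand-in, and rendering the DOT text to an image is left to Graphviz.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Stand-in data (assumption): iris ships with scikit-learn.
iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# With out_file=None the DOT source is returned as a string
# instead of being written to disk.
dot_source = export_graphviz(
    clf,
    out_file=None,
    feature_names=list(iris.feature_names),
    class_names=list(iris.target_names),
    filled=True,
)
print(dot_source[:200])
```

In the DOT text, each node is a numbered statement and each parent-child link is an edge written with "->".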
Icicle plots are another option. In addition to displaying the parent-child relationship, these plots also depict node size. They derive their name from the fact that the resulting visualization looks like icicles.
An icicle plot by https://www.cs.middlebury.edu/~candrews/showcase/infovis_techniques_s16/icicle_plots/icicleplots.html | CC-BY license
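To make the idea of node size concrete, here is a rough sketch of how an icicle layout could be computed from a fitted scikit-learn tree: each node gets a horizontal slot whose width is proportional to the number of training samples that reach it. The function name icicle_layout and the iris stand-in data are assumptions for illustration; this is not how any particular plotting library does it.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): iris ships with scikit-learn.
iris = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)
tree = clf.tree_


def icicle_layout(tree):
    """Return {node_id: (x, depth, width)}, widths proportional to sample counts."""
    total = tree.n_node_samples[0]
    layout = {}
    stack = [(0, 0.0, 0)]  # (node id, left edge, depth); node 0 is the root
    while stack:
        node, x, depth = stack.pop()
        width = tree.n_node_samples[node] / total
        layout[node] = (x, depth, width)
        left = int(tree.children_left[node])
        right = int(tree.children_right[node])
        if left != -1:  # -1 marks a leaf in scikit-learn's tree arrays
            left_width = tree.n_node_samples[left] / total
            stack.append((left, x, depth + 1))
            stack.append((right, x + left_width, depth + 1))
    return layout


layout = icicle_layout(tree)
for node, (x, depth, width) in sorted(layout.items()):
    print(f"node {node}: depth={depth} x={x:.2f} width={width:.2f}")
```

Drawing one rectangle per entry, stacked by depth, gives the familiar icicle shape: the two children of a node share exactly the width of their parent.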
While these techniques are helpful, they do not scale well, especially as the size of the data increases. In such situations, it becomes difficult not only to visualize the tree but also to interpret and understand it. BaobabView is a technique created to overcome such problems, and in this article, we’ll look in detail at its Python implementation, pybaobabdt, along with examples.
A few packages to plot Decision Trees
We began the article by discussing the multiple ways of visualizing decision trees. It’s also worthwhile to look at the various libraries that help plot them.
Dataset
We’ll use the Palmer Penguins dataset as a common dataset here. It is a well-known dataset that is often used as a drop-in replacement for the iris dataset. The goal is to predict the penguin species from the given features.

First five rows of the dataset | Image by Author
1. Visualization using sklearn.tree.plot_tree
This is the most commonly used method, and it ships with scikit-learn.
Visualization using sklearn.tree.plot_tree | Image by Author
The max_depth of the tree has been limited to 3 for this example.
Visualization using sklearn.tree.plot_tree | Image by Author
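A figure like the ones above can be produced with a few lines. This is a minimal sketch, assuming the iris data as a stand-in for the penguins DataFrame and a headless matplotlib backend so that it runs without a display; the figure size is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

# Stand-in data (assumption): iris ships with scikit-learn.
iris = load_iris()

# max_depth=3 limits the fitted tree itself, as in the second figure.
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

fig, ax = plt.subplots(figsize=(12, 6))
annotations = plot_tree(
    clf,
    feature_names=list(iris.feature_names),
    class_names=list(iris.target_names),
    filled=True,  # color each node by its majority class
    ax=ax,
)
print(len(annotations), "nodes drawn")
```

plot_tree returns the matplotlib annotations it created, one per drawn node, which is handy if you want to restyle individual boxes afterwards.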
2. Visualization using dtreeviz
The dtreeviz library renders better-looking, more intuitive visualizations while offering better interpretability options. The library derives its inspiration from R2D3’s educational animation, A visual introduction to machine learning.
Link to article: A better way to visualize Decision Trees with the dtreeviz library
Visualization using dtreeviz | Image by Author
Visualization using dtreeviz | Image by Author
3. Visualization using TensorFlow Decision Forests (TF-DF)
TensorFlow Decision Forests is a library for training, serving, inferencing, and interpreting decision forest models. It provides a unified API for both tree-based models and neural networks. It also has built-in interactive plotting methods that help plot and understand the tree structure.
Link to article: Reviewing the TensorFlow Decision Forests library
Visualization using TensorFlow Decision Forests (TF-DF) | Image by Author
Visualization using TensorFlow Decision Forests (TF-DF) | Image by Author
Pybaobabdt: A new member in town
A paper titled BaobabView: Interactive construction and analysis of decision trees showcases a unique technique for visualizing decision trees. This technique is not only scalable but also enables experts to inject their domain knowledge into the construction of decision trees. The method is called BaobabView and relies on the three critical aspects of visualization, interaction, and algorithmic support.
BaobabView’s three critical aspects of visualization, interaction, and algorithmic support | Image by Author
Here is an excerpt from the paper which highlights this point concretely:
We think our tool provides a double example of a visual analytics approach. We show how a machine learning method can be enhanced using interaction and visualization; we also show how manual construction and analysis can be supported by algorithmic and automated support.
What’s in the name?
Are you wondering about the strange name? Well, the term has its roots (pun intended) in Adansonia digitata, the African baobab, because the visualization bears an uncanny resemblance to the tree.
Ferdinand Reus from Arnhem, Holland, CC BY-SA 2.0, via Wikimedia Commons
The pybaobabdt package is a Python implementation of BaobabView. Let’s now dig a little deeper into the specifics of this library, starting with its installation.
Installation
The package can be installed as follows:
pip install pybaobabdt
However, there are a few requirements that need to be fulfilled:
- Python version ≥ 3.6
- PyGraphviz
- Popular Python packages such as sklearn, numpy, matplotlib, scipy, and pandas should also be installed.
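Before installing, it can help to check which of these requirements are already present. The sketch below uses only the standard library; the module list mirrors the requirements above, and the helper name missing_modules is made up for illustration.

```python
import sys
from importlib.util import find_spec

# Import names of the packages listed above
# (scikit-learn is imported as "sklearn").
REQUIRED_MODULES = ["sklearn", "numpy", "pygraphviz", "matplotlib", "scipy", "pandas"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


python_ok = sys.version_info >= (3, 6)
missing = missing_modules(REQUIRED_MODULES)
print("Python >= 3.6:", python_ok)
print("Missing modules:", missing or "none")
```

Anything reported as missing can then be installed with pip before installing pybaobabdt itself. PyGraphviz typically also needs the Graphviz system libraries.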
Pybaobabdt in action
We’ll continue with our penguins dataset and build a decision tree to predict the penguin species from the given features.
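from sklearn.tree import DecisionTreeClassifier

# df is the penguins DataFrame shown earlier
features = list(df.columns)
features.remove('Species')  # every column except the target is a feature
X = df.loc[:, features]
y = list(df['Species'])  # target: the penguin species

# Initialize and train the classification tree
clf = DecisionTreeClassifier().fit(X, y)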
The code above initializes and trains a classification tree. Once that is done, the next task is to visualize the tree using the pybaobabdt package, which can be accomplished in just a single line of code.
import pybaobabdt

ax = pybaobabdt.drawTree(clf, size=10, dpi=300, features=features, ratio=0.8, colormap='Set1')
Visualizing decision tree classifier using Pybaobabdt package | Image by Author
There you go! You have a decision tree classifier, where every class of species is represented with a different color. In the case of a Random Forest, it is also possible to visualize individual trees. These trees can then be saved to higher resolution images for in-depth inspection.
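For the Random Forest case, the individual trees of a fitted scikit-learn forest live in its estimators_ attribute. Below is a sketch under a few stated assumptions: the iris data stands in for the penguins DataFrame, draw_forest is a hypothetical helper, and the model= and ax= keyword arguments of drawTree are taken from the package README as I recall it, so check them against your installed version.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# Stand-in data (assumption): iris ships with scikit-learn.
iris = load_iris()
features = list(iris.feature_names)
forest = RandomForestClassifier(n_estimators=4, random_state=0).fit(iris.data, iris.target)

# Every member of a fitted forest is an ordinary decision tree.
trees = forest.estimators_
print(len(trees), "trees, first one has depth", trees[0].get_depth())


def draw_forest(forest, features, out_path, rows=2, cols=2):
    """Draw each tree of `forest` into one grid figure and save it (hypothetical helper)."""
    import matplotlib.pyplot as plt
    import pybaobabdt  # imported lazily because it needs PyGraphviz

    fig = plt.figure(figsize=(10, 10))
    for idx, tree in enumerate(forest.estimators_[: rows * cols]):
        ax = fig.add_subplot(rows, cols, idx + 1)
        # model= and ax= are assumed from the package README.
        pybaobabdt.drawTree(tree, model=forest, size=10, dpi=300, features=features, ax=ax)
    fig.savefig(out_path, format="png", dpi=300)
    return fig
```

With pybaobabdt installed, calling draw_forest(forest, features, "random_forest_trees.png") should write a 2 x 2 grid of trees to disk at 300 dpi.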
Customizations
The pybaobabdt library also offers a bunch of customizations. I’ll showcase a few of them here:
Colormaps
pybaobabdt supports all matplotlib colormaps. We have seen what the Set1 colormap looks like, but you can choose from many different options. Here is how a few of them appear when used:
Decision tree visualization with Pybaobabdt with different colormaps | Image by Author
But you are not limited to the available colormaps. You can even define one of your own. Let’s say we want to highlight just one specific class in our dataset while keeping all the others in the background. Here’s what we can do:
from matplotlib.colors import ListedColormap

# One color per class: green for the class to highlight, gray for the others
colors = ["green", "gray", "gray"]
colorMap = ListedColormap(colors)
ax = pybaobabdt.drawTree(clf, size=10, features=features, ratio=0.8, colormap=colorMap)
Highlighting only a specific class in the decision tree | Image by Author
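If you would rather pick the highlighted class by name than by position, a small helper can build the color list for you. This is only a sketch: highlight_colormap is a made-up name, the species labels are illustrative (use list(clf.classes_) for your own model), and it assumes the colormap entries are matched to classes in the order scikit-learn stores them.

```python
from matplotlib.colors import ListedColormap


def highlight_colormap(classes, target, highlight="green", background="gray"):
    """Build a colormap with one color per class, emphasizing only `target`."""
    if target not in classes:
        raise ValueError(f"{target!r} is not one of {classes}")
    return ListedColormap([highlight if c == target else background for c in classes])


# Illustrative labels in alphabetical order, which is how
# scikit-learn orders clf.classes_ for string targets.
species = ["Adelie", "Chinstrap", "Gentoo"]
color_map = highlight_colormap(species, "Gentoo")
print(color_map.colors)
```

The resulting object can be passed as the colormap argument in the same way as colorMap above.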
Ratio
The ratio option sets the ratio of the drawn tree, which affects the proportions of the figure. The default value is 1. Here’s how two different ratios compare on screen.
ax = pybaobabdt.drawTree(clf, size=10, dpi=300, features=features, ratio=0.5,colormap='viridis')
How different ratios affect figure size | Image by Author
Maxdepth
The parameter maxdepth controls the depth of the tree that is drawn. A lower number limits the splits shown, so only the top splits appear. If the maxdepth of the above tree is set to 3, we’ll get a stunted tree:
ax = pybaobabdt.drawTree(clf, size=10, maxdepth=3, features=features, ratio=1, colormap='plasma')
Adjusting the maximum depth of the tree to control the tree size | Image by Author
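Note that drawTree receives an already fitted classifier, so its maxdepth presumably only trims what is drawn, whereas scikit-learn's own max_depth changes the model itself. To choose a sensible cut-off, it helps to know how deep the fitted tree really is. A small sketch, again using iris as a stand-in dataset:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): iris ships with scikit-learn.
iris = load_iris()

# An unconstrained tree versus one whose growth is capped at fit time.
full_tree = DecisionTreeClassifier(random_state=0).fit(iris.data, iris.target)
capped_tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(iris.data, iris.target)

print("full depth:", full_tree.get_depth(), "leaves:", full_tree.get_n_leaves())
print("capped depth:", capped_tree.get_depth(), "leaves:", capped_tree.get_n_leaves())
```

Any maxdepth at or above the value reported by get_depth() would be expected to show the whole tree.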
Saving the image
The output graph can be saved as follows:
ax.get_figure().savefig('classifier_tree.png', format='png', dpi=300, transparent=True)
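The same call works for any matplotlib Axes, because get_figure() simply returns the figure that owns it. Here is a self-contained sketch that swaps the tree for a placeholder line plot and writes to a temporary directory; both are assumptions made so the snippet runs without pybaobabdt.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])  # placeholder drawing standing in for the tree

# Save through the Axes, exactly as in the call above.
out_path = os.path.join(tempfile.mkdtemp(), "classifier_tree.png")
ax.get_figure().savefig(out_path, format="png", dpi=300, transparent=True)
plt.close(fig)

print(out_path, os.path.getsize(out_path), "bytes")
```

A higher dpi gives a larger image for close inspection, and transparent=True drops the white background so the tree can be placed on slides or colored pages.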
Conclusion
The pybaobabdt package offers a fresh perspective on visualizations. It includes features that have not been seen in its counterparts. The main idea is to help users understand and interpret the tree through meaningful visualizations. This article used a straightforward example to demonstrate the library. To see its real strength, try it on much larger and more complex datasets. I’ll leave that as an exercise for the readers.
Article originally posted here. Reposted with permission.