The primary data visualization library in Python is matplotlib, a project begun in the early 2000s that was built to mimic the plotting capabilities of Matlab. Matplotlib can plot most things you can imagine, and it gives its users tremendous power to control every aspect of the plotting surface.
This article is an excerpt from the book Pandas 1.x Cookbook, Second Edition by Matt Harrison and Theodore Petrou. This new updated and revised edition provides you with unique, idiomatic, and fun recipes for both fundamental and advanced data manipulation tasks with pandas. This article offers a short introduction to the most crucial aspects of matplotlib.
For many data scientists, the vast majority of their plotting commands will use pandas or seaborn, both of which rely on matplotlib to do the plotting. However, neither pandas nor seaborn offers a complete replacement for matplotlib, and occasionally you will need to use matplotlib directly.
One thing to be aware of if you are a Jupyter user: you will want to include the following directive in your notebook:
>>> %matplotlib inline
This tells matplotlib to render plots in the notebook.
Let’s begin our introduction with a look at the anatomy of a matplotlib plot in the following figure:
Matplotlib uses a hierarchy of objects to display all of its plotting items in the output. This hierarchy is key to understanding everything about matplotlib. Note that these terms refer to matplotlib objects and not to pandas objects with the same (perhaps confusing) names. The Figure and Axes objects are the two main components of the hierarchy. The Figure object is at the top of the hierarchy. It is the container for everything that will be plotted. The Figure contains one or more Axes objects. The Axes is the primary object that you will interact with when using matplotlib and can be thought of as the plotting surface. The Axes contains an x-axis, a y-axis, points, lines, markers, labels, legends, and any other useful item that is plotted.
A distinction needs to be made between an Axes and an axis. They are completely separate objects. An Axes object, using matplotlib terminology, is not the plural of axis but instead, as mentioned earlier, the object that creates and controls most of the useful plotting elements. An axis refers to the x or y (or even z) axis of a plot.
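The hierarchy described above can be inspected directly. The following is a minimal sketch, assuming a non-interactive Agg backend so that it runs without a display; the variable names (axes_grid, first_ax) are illustrative and not from the book:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so no display is needed
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from matplotlib.axis import XAxis, YAxis
from matplotlib.figure import Figure

# One Figure that contains two Axes (a 1 x 2 grid of plotting surfaces)
fig, axes_grid = plt.subplots(nrows=1, ncols=2)

# The Figure sits at the top and keeps a list of the Axes it contains
print(isinstance(fig, Figure))
print(len(fig.axes))

# Each Axes (capital A, ends in "es") is one plotting surface ...
first_ax = fig.axes[0]
print(isinstance(first_ax, Axes))

# ... and each Axes owns its own x-axis and y-axis objects
print(isinstance(first_ax.xaxis, XAxis))
print(isinstance(first_ax.yaxis, YAxis))

# The link also works upward: an Axes knows which Figure it belongs to
print(first_ax.figure is fig)
```

Every line printed here should be True except the Axes count, which is 2 for this grid.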
All of these useful plotting elements created by an Axes object are called artists. Even the Figure and the Axes objects themselves are artists. This distinction for artists won’t be critical to this recipe but will be useful when doing more advanced matplotlib plotting and especially when reading through the documentation.
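As a rough illustration of the artist concept, the sketch below checks a handful of plot objects against matplotlib's Artist base class. It assumes the Agg backend for off-screen use, and the objects dictionary is only a convenience for this example:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so no display is needed
import matplotlib.pyplot as plt
from matplotlib.artist import Artist

fig, ax = plt.subplots()
(line,) = ax.plot([1, 2, 3], [4, 5, 6])  # ax.plot returns a list of Line2D objects
title = ax.set_title("Line Plot")        # set_title returns the Text object it created

# Collect a few pieces of the plot, including the containers themselves
objects = {
    "figure": fig,
    "axes": ax,
    "x-axis": ax.xaxis,
    "line": line,
    "title": title,
}

# Every one of them derives from the Artist base class
is_artist = {name: isinstance(obj, Artist) for name, obj in objects.items()}
for name, flag in is_artist.items():
    print(f"{name}: {flag}")
```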
Object-oriented guide to matplotlib
Matplotlib provides two distinct interfaces for users. The stateful interface makes all of its calls with the pyplot module. This interface is called stateful because matplotlib keeps track internally of the current state of the plotting environment. Whenever a plot is created in the stateful interface, matplotlib finds the current figure or current axes and makes changes to it. This approach is fine to plot a few things quickly but can become unwieldy when dealing with multiple figures and axes.
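To make the idea of "current" state concrete, here is a small sketch using pyplot's gcf and gca functions, which return the current figure and current axes. It assumes the Agg backend, and the names fig_a and fig_b are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so no display is needed
import matplotlib.pyplot as plt

plt.close("all")  # start from a clean pyplot state

fig_a = plt.figure()
fig_b = plt.figure()  # the most recently created figure becomes the current one
print(plt.gcf() is fig_b)

# plt.plot has no figure argument: it draws on whatever is current,
# creating an Axes on fig_b because none exists yet
plt.plot([1, 2, 3], [4, 5, 6])
current_ax = plt.gca()
print(current_ax.figure is fig_b)
print(len(fig_a.axes), len(fig_b.axes))

# To draw on fig_a instead, the state has to be switched first
plt.figure(fig_a.number)
print(plt.gcf() is fig_a)
```

With only two figures this bookkeeping is manageable, but each additional figure or axes is one more piece of hidden state to keep track of.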
Matplotlib also offers a stateless, or object-oriented, interface in which you explicitly use variables that reference specific plotting objects. Each variable can then be used to change some property of the plot. The object-oriented approach is explicit, and you are always aware of exactly what object is being modified.
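A short sketch of what explicit references buy you when there is more than one plotting surface. The data values and the names ax_left and ax_right are made up for illustration, and the Agg backend is assumed:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so no display is needed
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [4, 1, 5]

# Two Axes side by side, each bound to its own variable
fig, (ax_left, ax_right) = plt.subplots(nrows=1, ncols=2, figsize=(10, 3))

# Each call names the object it modifies, so the order of calls does not matter
ax_right.plot(x, y, color="red")
ax_left.plot(x, y)
ax_left.set_title("Left")
ax_right.set_title("Right")

print(ax_left.get_title(), len(ax_left.lines))
print(ax_right.get_title(), len(ax_right.lines))
```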
Unfortunately, having both options can lead to lots of confusion, and matplotlib has a reputation for being difficult to learn. The documentation has examples using both approaches. In practice, I find it most useful to combine them. I use the subplots function from pyplot to create a figure and axes, and then use the methods on those objects.
If you are new to matplotlib, you might not know how to tell the two approaches apart. With the stateful interface, all commands are functions called on the pyplot module, which is usually aliased as plt. Making a line plot and adding some labels to each axis would look like this:
>>> import matplotlib.pyplot as plt
>>> x = [-3, 5, 7]
>>> y = [10, 2, 5]
>>> fig = plt.figure(figsize=(15,3))
>>> plt.plot(x, y)
>>> plt.xlim(0, 10)
>>> plt.ylim(-3, 8)
>>> plt.xlabel('X Axis')
>>> plt.ylabel('Y axis')
>>> plt.title('Line Plot')
>>> plt.suptitle('Figure Title', size=20, y=1.03)
>>> fig.savefig('c13-fig1.png', dpi=300, bbox_inches='tight')
Basic plot using Matlab-like interface
The object-oriented approach is shown as follows:
>>> from matplotlib.figure import Figure
>>> from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
>>> from IPython.display import display
>>> fig = Figure(figsize=(15, 3))
>>> # Attach a canvas so the figure can be rendered without pyplot
>>> FigureCanvas(fig)
>>> ax = fig.add_subplot(111)
>>> ax.plot(x, y)
>>> ax.set_xlim(0, 10)
>>> ax.set_ylim(-3, 8)
>>> ax.set_xlabel('X axis')
>>> ax.set_ylabel('Y axis')
>>> ax.set_title('Line Plot')
>>> fig.suptitle('Figure Title', size=20, y=1.03)
>>> # pyplot is not managing this figure, so display it explicitly
>>> display(fig)
>>> fig.savefig('c13-fig2.png', dpi=300, bbox_inches='tight')
Basic plot created with object-oriented interface
In practice, I combine the two approaches and my code would look like this:
>>> fig, ax = plt.subplots(figsize=(15,3))
>>> ax.plot(x, y)
>>> ax.set(xlim=(0, 10), ylim=(-3, 8),
...     xlabel='X axis', ylabel='Y axis',
...     title='Line Plot')
>>> fig.suptitle('Figure Title', size=20, y=1.03)
>>> fig.savefig('c13-fig3.png', dpi=300, bbox_inches='tight')
Basic plot created using call to Matlab interface to create figure and axes, then using method calls
In this example, we use only two objects, the Figure and the Axes, but in general, plots can have many hundreds of objects. Each one can be used to make finely tuned modifications that are not easily doable with the stateful interface. For practical, easy-to-implement recipes that offer quick solutions to common data problems using pandas, please refer to the book Pandas 1.x Cookbook, Second Edition by Matt Harrison and Theodore Petrou.
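To give a feel for how many objects sit behind even a simple plot, and for how individual ones can be adjusted, here is a hedged sketch. It reuses the article's x and y values, assumes the Agg backend, and the particular tweaks (line width, hidden spine, rotated tick labels) are arbitrary choices for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so no display is needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(15, 3))
ax.plot([-3, 5, 7], [10, 2, 5])

# findobj() with no arguments walks the hierarchy and returns every artist,
# including the Figure itself; the exact count depends on the matplotlib version
all_artists = fig.findobj()
print(len(all_artists))

# Each object can be reached and changed on its own
line = ax.lines[0]
line.set_linewidth(3)                # thicken just this one line
ax.spines["top"].set_visible(False)  # hide only the top border of the Axes
for label in ax.get_xticklabels():
    label.set_rotation(45)           # rotate each x tick label individually
```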
About the authors
Matt Harrison has been using Python since 2000. He runs MetaSnake, which provides corporate training for Python and Data Science. He is the author of Machine Learning Pocket Reference, the bestselling Illustrated Guide to Python 3, and Learning the Pandas Library, among other books.
Theodore Petrou is the founder of Dunder Data, a training company dedicated to helping teach the Python data science ecosystem effectively to individuals and corporations. Read his tutorials and attempt his data science challenges at the Dunder Data website.