You generate a huge amount of data on a daily basis. A critical part of data analysis is visualization. A variety of graphing tools have been developed over the past few years. Given the popularity of Python as a language for data analysis, this tutorial focuses on creating graphs using a popular Python library: Matplotlib.
Matplotlib is a huge library, which can be a bit overwhelming for a beginner — even if one is fairly comfortable with Python. While it is easy to generate a plot using a few lines of code, it may be difficult to comprehend what actually goes on in the back-end of this library. This tutorial explains the core concepts of Matplotlib so that one can explore its full potential.
Let’s get started!
The library that we will use in this tutorial to create graphs is Python’s matplotlib. This post assumes you are using version 3.0.3. To install it, run the following pip command in the terminal.
pip install matplotlib==3.0.3
To verify the version of the library that you have installed, run the following commands in the Python interpreter.
>>> import matplotlib
>>> print(matplotlib.__version__)
3.0.3
If you are using Jupyter notebooks, you can display Matplotlib graphs inline using the following magic command.
%matplotlib inline
Pyplot and Pylab: A Note
John Hunter, the creator of Matplotlib, was influenced by MathWorks’ MATLAB during the initial phases of the library’s development. There is one key difference between the use of commands in MATLAB and Python. In MATLAB, all functions are available at the top level. Matplotlib mirrored this through its pylab module: if you imported everything from matplotlib.pylab, functions such as plot() would be available at the top level too.
This feature was convenient for those who were accustomed to MATLAB. In Python, though, this could potentially create a conflict with other functions.
Therefore, it is good practice to import the pyplot module under its own name instead:
from matplotlib import pyplot as plt
All functions such as plot() are available within pyplot. After the import above, you can call the same plot() function as plt.plot().
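To make the namespace point concrete, here is a minimal sketch. The helper named plot() is hypothetical and exists only to show that a name of your own does not clash with Matplotlib's function when the latter is reached through plt. The Agg backend line is an assumption added so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
from matplotlib import pyplot as plt


def plot(values):
    """Hypothetical user helper that happens to share its name with Matplotlib's plot()."""
    return [2 * v for v in values]


doubled = plot([0, 1, 2, 3, 4])  # calls our own helper
lines = plt.plot(doubled)        # calls Matplotlib's function through the plt namespace
plt.close()
```

Had everything been imported from matplotlib.pylab after the helper was defined, the name plot would have been rebound to Matplotlib's function and the helper would no longer be reachable under that name.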
Dissecting a Matplotlib Plot
The Matplotlib documentation describes the anatomy of a plot, which is essential in building an understanding of various features of the library.
The major parts of a Matplotlib plot are as follows:
- Figure: The container of the full plot and its parts
- Title: The title of the plot
- Axes: The plotting area, including the X and Y axis (some plots may have a third axis too!)
- Legend: Contains the labels of each plot
Each element of a plot can be manipulated in Matplotlib, as we will see later.
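The parts listed above can also be reached as Python objects. The sketch below is one possible illustration, not the approach used in the rest of this tutorial: it relies on plt.subplots(), which returns the Figure together with an Axes, and the title text is a made-up example. The Agg backend line is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
from matplotlib import pyplot as plt

fig, ax = plt.subplots()                 # Figure: the container; ax: the Axes inside it
ax.plot([0, 1, 2, 3, 4], label="y = x")  # the label is what the legend will show
ax.set_title("Anatomy of a Plot")        # Title (example text)
legend = ax.legend(loc="best")           # Legend

# Each part is an object that can be inspected or modified later
print(ax in fig.axes)
print(ax.get_title())
print([text.get_text() for text in legend.get_texts()])
```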
Without further delay, let’s create our first plot!
Create a Plot
Creating a plot is not a difficult task. First, import the pyplot module. Although it is not required, it is generally imported under the shorter name plt. Use the .plot() method and provide a list of numbers to create a plot. Then, use the .show() method to display the plot.
from matplotlib import pyplot as plt
plt.plot([0,1,2,3,4])
plt.show()
Notice that Matplotlib creates a line plot by default. The numbers provided to the .plot() method are interpreted as the y-values of the plot, and the x-values default to the positions of those numbers in the list (0, 1, 2, and so on). See the documentation of the .plot() method to explore further.
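The following sketch illustrates how .plot() treats its arguments, using made-up numbers. With a single list, the values are taken as y and the x-values are filled in automatically; with two lists, the first is x and the second is y. The Agg backend line is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
from matplotlib import pyplot as plt

fig = plt.figure()
(line_default,) = plt.plot([10, 20, 30])              # y-values only
(line_explicit,) = plt.plot([1, 2, 3], [10, 20, 30])  # x-values, then y-values

# Read back the x-values that Matplotlib stored for each line
x_default = [float(x) for x in line_default.get_xdata()]
x_explicit = [float(x) for x in line_explicit.get_xdata()]
print(x_default)
print(x_explicit)
plt.close(fig)
```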
Now that you have successfully created your first plot, let us explore various ways to customize your plots in Matplotlib.
Let us discuss the most popular customizations for a Matplotlib plot. Each of the options discussed here is a method of pyplot that you can invoke to set the corresponding parameter.
- title: Sets the title of the chart, which is passed as an argument.
- ylabel: Sets the label of the Y axis. xlabel can be used to set the label of the X axis.
- yticks: Sets which ticks to show on the Y axis. xticks is the corresponding option for showing ticks on the X axis.
- legend: Displays the legend on the plot. The loc argument of the .legend() method sets the position of the legend on the graph. The best option for the loc argument lets Matplotlib decide the least intrusive position of the legend on the figure.
Let us use these options in our plot.
plt.plot([0,1,2,3,4], label='y = x')
plt.title('Y = X Straight Line')
plt.ylabel('Y Axis')
plt.yticks([1,2,3,4])
plt.legend(loc='best')
plt.show()
Here is the output of the code above. Notice that a title has appeared in the figure, the Y axis is labelled, the Y axis shows fewer ticks than the X axis, and a legend is shown in the top left corner.
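The X axis counterparts work the same way. As a sketch, the variant below adds xlabel and xticks to the same line and pins the legend to a fixed corner instead of using best. The label text and tick positions are arbitrary choices for illustration, and the Agg backend line is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
from matplotlib import pyplot as plt

fig = plt.figure()
plt.plot([0, 1, 2, 3, 4], label="y = x")
plt.title("Y = X Straight Line")
plt.xlabel("X Axis")            # label of the X axis
plt.ylabel("Y Axis")            # label of the Y axis
plt.xticks([0, 2, 4])           # show only these ticks on the X axis
plt.yticks([1, 2, 3, 4])        # show only these ticks on the Y axis
plt.legend(loc="upper left")    # fixed position instead of 'best'

ax = plt.gca()                  # the Axes that the calls above acted on
print(ax.get_xlabel(), ax.get_ylabel())
```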
After tinkering with the basic options of a plot, let’s create multiple plots in the same figure. Let us try to create two straight lines in our plot.
To achieve this, use the .plot() method twice with different data sets. Setting the label for each line through the label argument of the .plot() method keeps the code short, because the .legend() method then picks up the labels automatically.
plt.plot([0,1,2,3,4], label='y = x')
plt.plot([0,2,4,6,8], label='y = 2x')
plt.title('Two Straight Lines')
plt.legend(loc='best')
plt.show()
Next, let’s try to create a different type of plot. To create a scatter plot of points on the XY plane, use the .scatter() method.
plt.scatter([1,2,3,4], [5,1,4,2])
plt.show()
Here is what the scatter plot looks like.
A number of other plots can be created with Matplotlib. You can use the .hist() method to create a histogram. You can add multiple plots to a figure using the .subplot() method. You can even create a vector path using the path module of Matplotlib (matplotlib.path).
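As a rough sketch of how .hist() and .subplot() can be combined, the code below places a histogram and a scatter plot side by side in one figure. The values list is made-up sample data and the choice of five bins is arbitrary. The Agg backend line is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
from matplotlib import pyplot as plt

values = [1, 2, 2, 3, 3, 3, 4, 4, 5]  # hypothetical sample data

fig = plt.figure()

plt.subplot(1, 2, 1)                   # 1 row, 2 columns, first cell
counts, bin_edges, patches = plt.hist(values, bins=5)
plt.title("Histogram")

plt.subplot(1, 2, 2)                   # 1 row, 2 columns, second cell
plt.scatter([1, 2, 3, 4], [5, 1, 4, 2])
plt.title("Scatter")

print(len(fig.axes))                   # number of subplots in the figure
```

The three arguments of .subplot() are the number of rows, the number of columns, and the position of the cell that subsequent calls draw into.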
Export Plots with Matplotlib
After exploring various options while creating plots with Matplotlib, the next step is to export the plots that you have created. To save a figure as an image, you can use the .savefig() method. The filename, including its path, should be provided as an argument to this method.
While the documentation for savefig lists various arguments, the two most important ones are listed below:
- dpi: Sets the resolution of the resulting image in DPI (dots per inch).
- transparent: If set to True, the background of the figure is transparent.
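A minimal sketch of saving a figure with both arguments might look as follows. The filename straight_line.png and the value of 150 DPI are arbitrary choices for illustration, and the Agg backend line is an assumption so that the code runs without a display.

```python
import os

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
from matplotlib import pyplot as plt

output_path = "straight_line.png"  # hypothetical filename, saved in the working directory

fig = plt.figure()
plt.plot([0, 1, 2, 3, 4], label="y = x")
plt.title("Y = X Straight Line")
plt.legend(loc="best")

# Write the current figure to disk at 150 DPI with a transparent background
plt.savefig(output_path, dpi=150, transparent=True)
plt.close(fig)

print(os.path.exists(output_path))
```

The image format is inferred from the file extension, so changing the filename to end in .pdf or .svg would produce that format instead.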
While the .savefig() method saves a single figure, you may need to save multiple figures in the same file. Matplotlib allows you to save multiple figures to a single PDF file using the PdfPages class. The steps to create a PDF file with multiple plots are listed below:
- First, import the PdfPages class from matplotlib.backends.backend_pdf and initialize it with the name of the PDF file to create.
- Initialize a figure object using the .figure() function and create the plot. Once the plot is created, use the .savefig() method of the PdfPages class to save the figure.
- Once all figures have been added, close the PDF file using the .close() method.
To summarize the process, the following code snippet creates a PDF with the two figures that we created above.
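from matplotlib import pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

# Open the PDF file that will collect the figures
pdf = PdfPages('multipage.pdf')

# First page: y = x (save the figure before closing it)
fig1 = plt.figure()
plt.plot([0,1,2,3,4])
pdf.savefig(fig1)
plt.close()

# Second page: y = 2x
fig2 = plt.figure()
plt.plot([0,2,4,6,8])
pdf.savefig(fig2)
plt.close()

# Finalize the PDF file
pdf.close()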
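When there are many figures, the same steps can be written as a loop. The sketch below is one possible variant, not the only way to do it: it uses PdfPages as a context manager so that the file is closed automatically, and it reads the page count back with get_pagecount(). The filename multipage_sketch.pdf and the datasets dictionary are made up for illustration, and the Agg backend line is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, so no window is needed
from matplotlib import pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

# Hypothetical data: one entry per page of the PDF
datasets = {
    "y = x": [0, 1, 2, 3, 4],
    "y = 2x": [0, 2, 4, 6, 8],
}

with PdfPages("multipage_sketch.pdf") as pdf:  # closed automatically on exit
    for label, y_values in datasets.items():
        fig = plt.figure()
        plt.plot(y_values, label=label)
        plt.title(label)
        pdf.savefig(fig)   # add the figure as a new page
        plt.close(fig)     # release the figure once it has been saved
    page_count = pdf.get_pagecount()  # must be read before the file is closed

print(page_count)
```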
In this tutorial, we created plots in Python with the matplotlib library. We discussed the concepts you need to know to understand how Matplotlib works, and set about creating and customizing real plots. And we showed you how to export your plots for use in real-world scenarios, like reports and presentations.
How do you create plots with Python? Let us know in the comments below.