Data visualization plays a central role in most AI and machine learning work, because graphical representations can reveal key insights into your data. In this tutorial, we'll go over some data visualization options in Python, using two of its plotting libraries, Matplotlib and Seaborn, to create several types of graphs.
After finishing this tutorial, you will understand:
- Matplotlib Image Visualization
- Making scatter plots, line plots, and bar plots in Matplotlib and Seaborn
- Making histograms in Matplotlib and Seaborn
What Is Data Visualization?
Data visualization is a branch of data analysis concerned with the visual representation of data. It graphically plots data and is an effective way of communicating data inferences.
We can get a visual summary of our data by using data visualization. The human mind processes and understands data more easily when it is presented with images, maps, and graphs. Data visualization is important in the representation of both small and large data sets, but it is especially useful when we have large data sets where it is impossible to see, let alone process and understand all of our data manually.
Matplotlib vs Seaborn
Matplotlib and Seaborn are two Python libraries for data visualization, and both provide functions for creating a wide range of graphs. Seaborn is primarily used for statistical graphics, whereas Matplotlib is a general-purpose plotting library that can also embed graphs into applications.
Matplotlib is the original Python data visualization library. Despite being over a decade old, it remains the most widely used plotting library in the Python community. Its interface was designed to resemble the plotting commands of MATLAB, a proprietary programming language developed in the 1980s.
To install it, type the following command in the terminal:
pip install matplotlib
With Matplotlib installed, let's look at its most commonly used plots.
Scatter plots are used to examine the relationship between two variables, with each dot representing one observation. To create a scatter plot, use Matplotlib's scatter() function.
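As a minimal sketch of scatter(), the snippet below plots a handful of points. The values are a small hand-made sample with tips-style names (total_bill, tip), an assumption standing in for the real dataset, and the variable names are illustrative only.

```python
import matplotlib.pyplot as plt

# Hypothetical sample values standing in for the tips dataset
total_bill = [16.99, 10.34, 21.01, 23.68, 24.59, 25.29]
tip = [1.01, 1.66, 3.50, 3.31, 3.61, 4.71]

fig, ax = plt.subplots()
points = ax.scatter(total_bill, tip)  # one dot per (total_bill, tip) pair
ax.set_title("Scatter plot")
ax.set_xlabel("total_bill")
ax.set_ylabel("tip")
```

Calling plt.show() afterwards opens the figure in a window, and fig.savefig() writes it to a file instead. The pyplot call plt.scatter() draws the same plot on the current axes.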
We can make this graph more meaningful by adding colors and changing the size of the points. We can accomplish this by modifying the scatter function's c and s parameters. The colorbar() method can also be used to display the color bar.
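The c and s parameters can be sketched as follows. The data is again a hypothetical sample, and using the party size for both color and marker size is just one possible choice.

```python
import matplotlib.pyplot as plt

# Hypothetical sample values standing in for the tips dataset
total_bill = [16.99, 10.34, 21.01, 23.68, 24.59, 25.29]
tip = [1.01, 1.66, 3.50, 3.31, 3.61, 4.71]
size = [2, 3, 3, 2, 4, 4]  # party size for each bill

fig, ax = plt.subplots()
points = ax.scatter(
    total_bill,
    tip,
    c=size,                    # color each point by party size
    s=[20 * n for n in size],  # marker area grows with party size
)
colorbar = fig.colorbar(points, ax=ax)  # shows how colors map to values
colorbar.set_label("size")
ax.set_xlabel("total_bill")
ax.set_ylabel("tip")
```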
A line chart depicts the relationship between two variables, X and Y, plotted on different axes, by connecting the data points with line segments. The plot() function is used to draw it.
A bar plot, also known as a bar chart, is a graph that depicts categorical data using rectangular bars whose lengths or heights are proportional to the values they represent. The bar() method can be used to create it.
A histogram represents the distribution of data by grouping values into bins. It is a type of bar plot in which the X-axis represents bin ranges and the Y-axis represents frequency. The hist() function computes and draws a histogram. If we pass categorical data to the histogram, it will automatically compute the frequency of that data, i.e., how frequently each value occurred.
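A minimal sketch of plot() might look like this. The x and y values are made up for illustration and do not come from the tutorial's dataset.

```python
import matplotlib.pyplot as plt

# Hypothetical values chosen only to illustrate the call
x = [1, 2, 3, 4, 5]
y = [2.0, 3.5, 3.0, 4.5, 5.0]

fig, ax = plt.subplots()
(line,) = ax.plot(x, y)  # plot() returns a list of Line2D objects
ax.set_title("Line chart")
ax.set_xlabel("x")
ax.set_ylabel("y")
```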
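The bar() call can be sketched as below. The day labels and totals are hypothetical numbers, assumed here only so that the example is self-contained.

```python
import matplotlib.pyplot as plt

# Hypothetical categories and values
days = ["Thur", "Fri", "Sat", "Sun"]
total_tips = [12.5, 8.0, 20.25, 17.75]

fig, ax = plt.subplots()
bars = ax.bar(days, total_tips)  # one bar per category, height = value
ax.set_title("Bar chart")
ax.set_xlabel("day")
ax.set_ylabel("total tips")
```

For horizontal bars, ax.barh() takes the same two arguments and makes the bar lengths proportional to the values.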
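A short sketch of hist() follows, assuming a small made-up list of bill amounts. The choice of four bins is arbitrary.

```python
import matplotlib.pyplot as plt

# Hypothetical bill amounts standing in for a real column
total_bill = [10, 12, 14, 18, 20, 22, 24, 26, 30]

fig, ax = plt.subplots()
# hist() returns the count per bin, the bin edges, and the drawn bars
counts, bin_edges, patches = ax.hist(total_bill, bins=4)
ax.set_title("Histogram")
ax.set_xlabel("total_bill")
ax.set_ylabel("frequency")
```

The returned counts always add up to the number of values passed in, and there is one more bin edge than there are bins.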
Seaborn is a high-level interface built on top of the Matplotlib library. It includes beautiful design styles and color palettes to help you create more visually appealing graphs.
In the terminal, type the following command to install seaborn.
pip install seaborn
Because Seaborn is built on top of Matplotlib, the two can be used together with very little effort. Simply call the Seaborn plotting function as usual, and then use Matplotlib's customization functions on the result.
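One way this combination could look is sketched below: Seaborn draws onto a Matplotlib Axes, and ordinary Matplotlib calls then adjust it. The lists are hypothetical sample values, and the title and axis limits are arbitrary choices.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical sample values standing in for the tips dataset
total_bill = [16.99, 10.34, 21.01, 23.68, 24.59, 25.29]
tip = [1.01, 1.66, 3.50, 3.31, 3.61, 4.71]

fig, ax = plt.subplots()
sns.scatterplot(x=total_bill, y=tip, ax=ax)  # Seaborn draws the points

# Matplotlib calls then customize the same Axes
ax.set_title("Tip vs. total bill")
ax.set_xlim(0, 30)
ax.set_xlabel("total_bill")
ax.set_ylabel("tip")
```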
Note: Seaborn comes pre-loaded with datasets such as tips, iris, and so on, but for the purposes of this tutorial, we will load these datasets using Pandas.
The scatterplot() function is used to create a scatter plot. It is similar to Matplotlib's scatter(), but it takes an additional data argument, so that x and y can be given as column names.
If you want to color each point of this plot according to the sex column, Matplotlib requires you to map the categories to colors yourself. In Seaborn's scatterplot(), the hue argument does this for you.
The lineplot() function is used to create a line plot in Seaborn. Here we can pass just the data argument, in which case each numeric column is drawn as its own line.
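A minimal sketch of scatterplot() with the data argument is shown below. The DataFrame is a small hand-built stand-in with tips-style columns, an assumption made so the example runs without the CSV file that the tutorial loads with Pandas.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows with tips-style columns
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
})

fig, ax = plt.subplots()
# x and y name columns of the DataFrame passed as data
sns.scatterplot(data=tips, x="total_bill", y="tip", ax=ax)
```

Seaborn labels the axes with the column names, so no extra set_xlabel() or set_ylabel() calls are needed here.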
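The hue argument can be sketched like this, again on a hypothetical hand-built DataFrame with tips-style columns rather than the real dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows with tips-style columns
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
    "sex": ["Female", "Male", "Male", "Male", "Female", "Male"],
})

fig, ax = plt.subplots()
# hue assigns one color per distinct value in the "sex" column
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="sex", ax=ax)
```

Seaborn also adds a legend that maps each color back to its category.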
The barplot() method in Seaborn can be used to create a bar plot.
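A minimal barplot() sketch follows, using a hypothetical DataFrame with two rows per day. By default, barplot() shows the mean of y for each category, with error bars around it.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical rows with tips-style columns, two bills per day
tips = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat"],
    "tip": [1.0, 2.0, 3.0, 5.0, 2.0, 4.0],
})

fig, ax = plt.subplots()
# One bar per day; bar height is the mean tip for that day
sns.barplot(data=tips, x="day", y="tip", ax=ax)
```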
Seaborn's histogram can be plotted using the histplot() function.
After looking at all of these plots, you've probably noticed that customizing plots with Seaborn is a lot easier than with Matplotlib. Because Seaborn is built on Matplotlib, we can still use Matplotlib functions while working with Seaborn.
In this tutorial, we plotted the tips dataset using two different Python plotting modules, Matplotlib and Seaborn. Each module displayed the plot in its own distinct manner, and each has its own set of features. Matplotlib provides more flexibility, but at the expense of writing more code, whereas Seaborn, as a high-level interface, allows one to achieve the same goal with less code. Either module can be used, depending on the task at hand.