Master Data Visualization using Python

Introduction

Data visualization plays a central role in AI and machine learning work, because graphical representations can reveal key insights into your data. In this tutorial, we'll go over some data visualization options in Python, using two of its plotting libraries, Matplotlib and Seaborn, to create several common types of graphs.

After finishing this tutorial, you will understand:

  • Visualizing data with Matplotlib
  • Making scatter plots, line plots, and bar plots in Matplotlib and Seaborn
  • Making histograms in Matplotlib and Seaborn

What Is Data Visualization?

Data visualization is the branch of data analysis concerned with representing data visually. Plotting data graphically is an effective way to communicate the conclusions drawn from it.
Data visualization gives us a visual summary of our data. The human mind processes and understands data more easily when it is presented as images, maps, and graphs. This matters for data sets of any size, but it is especially useful for large data sets, where it is impossible to inspect, let alone process and understand, every value manually.

Matplotlib vs Seaborn

Matplotlib and Seaborn are two Python libraries for data visualization, and both include modules for creating various graphs. Matplotlib is a general-purpose plotting library whose figures can also be embedded in applications, whereas Seaborn is primarily used for statistical graphs.

Matplotlib:

  • It generates line charts, bar graphs, and other simple plots.
  • It primarily operates on datasets and arrays.
  • It works well with data arrays and data frames, and it treats axes and figures as objects.
  • It is more customizable and works well with Pandas and NumPy for exploratory data analysis.

Seaborn:

  • It is primarily used for data visualization and can perform complex tasks.
  • It is applicable to entire datasets.
  • It is more organized and functional than Matplotlib, and it treats the entire dataset as a single entity.
  • It has more built-in themes and is primarily used for statistical analysis.
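
The comparison above notes that Matplotlib treats figures and axes as objects. As a minimal sketch of what that means in practice, the snippet below uses Matplotlib's object-oriented interface. The day names and tip values are made-up sample values for illustration, not figures taken from the tips dataset.

```python
import matplotlib.pyplot as plt

# Hypothetical sample values, used only for illustration
days = ["Thur", "Fri", "Sat", "Sun"]
avg_tip = [2.8, 2.7, 3.0, 3.3]

# plt.subplots() returns a Figure object and an Axes object
fig, ax = plt.subplots(figsize=(6, 4))

# Plotting and labelling are methods called on the Axes object
ax.plot(days, avg_tip, marker="o")
ax.set_title("Average tip by day (sample values)")
ax.set_xlabel("Day")
ax.set_ylabel("Tip")

plt.show()
```

Holding on to fig and ax makes it straightforward to adjust one plot among several, which is harder to do when every call goes through the global plt state.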

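The examples in this tutorial use the tips dataset, stored in a file named tips.csv. We start by loading it with Pandas and looking at the first rows.

import pandas as pd

# reading the dataset
data = pd.read_csv("tips.csv")

# printing the top 10 rows
print(data.head(10))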

Matplotlib

Matplotlib is the original Python data visualization library. First released in 2003, it remains the most widely used plotting library in the Python community. It was designed to resemble the plotting interface of MATLAB, a proprietary programming language developed in the 1980s.

To install, type the below command in the terminal:

pip install matplotlib

Let's look at the most commonly used plots with Matplotlib after installing it.

Scatter Plot

A scatter plot uses dots to show the relationship between two variables. To create one, use the scatter() function from matplotlib.pyplot.

import pandas as pd
import matplotlib.pyplot as plt


# reading the database
data = pd.read_csv("tips.csv")

# Scatter plot with day against tip
plt.scatter(data['day'], data['tip'])

# Adding Title to the Plot
plt.title("Scatter Plot")

# Setting the X and Y labels
plt.xlabel('Day')
plt.ylabel('Tip')

plt.show()

We can make this graph more meaningful by coloring the points and varying their size. The scatter() function's c parameter sets the color of each point and its s parameter sets the marker size, so passing a column to each encodes two more variables in the plot. The colorbar() function then displays the color scale.

import pandas as pd
import matplotlib.pyplot as plt


# reading the database
data = pd.read_csv("tips.csv")

# Scatter plot with day against tip, colored by size and sized by total_bill
plt.scatter(data['day'], data['tip'], c=data['size'],
            s=data['total_bill'])

# Adding Title to the Plot
plt.title("Scatter Plot")

# Setting the X and Y labels
plt.xlabel('Day')
plt.ylabel('Tip')

plt.colorbar()

plt.show()
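
As a hedged variation on the example above, the sketch below keeps the object returned by scatter() so that the color bar can be given a label. The DataFrame is a small hypothetical stand-in for tips.csv, and the factor of 3 applied to the marker sizes is an arbitrary choice made for readability.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical rows that mimic the tips.csv columns used above
data = pd.DataFrame({
    "day": ["Thur", "Fri", "Sat", "Sun", "Sat", "Sun"],
    "tip": [2.0, 3.5, 4.0, 5.0, 1.5, 3.0],
    "size": [2, 2, 3, 4, 2, 3],
    "total_bill": [15.0, 22.5, 30.0, 48.0, 12.0, 25.0],
})

fig, ax = plt.subplots()

# c maps party size to color; s sets each marker's area in points squared
points = ax.scatter(data["day"], data["tip"],
                    c=data["size"], s=data["total_bill"] * 3,
                    cmap="viridis", alpha=0.7)

# Passing the scatter result to colorbar() lets us label the color scale
colorbar = fig.colorbar(points, ax=ax)
colorbar.set_label("Party size")

ax.set_xlabel("Day")
ax.set_ylabel("Tip")
plt.show()
```

Because s is an area rather than a radius, doubling a value doubles the marker's area, not its diameter.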

Line Plot

A line chart depicts the relationship between two variables, X and Y, plotted on the two axes. The plot() function draws it. When only one series is passed, as in the following example, the values go on the Y-axis and the row index on the X-axis.
Example:

import pandas as pd
import matplotlib.pyplot as plt


# reading the database
data = pd.read_csv("tips.csv")

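# Line plot of tip and size against the row index
plt.plot(data['tip'], label='tip')
plt.plot(data['size'], label='size')

# Adding Title to the Plot
plt.title("Line Plot")

# Setting the X and Y labels
plt.xlabel('Row index')
plt.ylabel('Value')

# Adding a legend to tell the two lines apart
plt.legend()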

plt.show()
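
The example above plots each column against the row index. As a minimal sketch of the other common case, the snippet below passes both an X and a Y column to plot(). The rows are hypothetical stand-ins for tips.csv, and sorting by the X column first is a choice made here so that the line runs left to right.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical rows standing in for tips.csv
data = pd.DataFrame({
    "total_bill": [25.0, 12.0, 48.0, 15.0, 30.0],
    "tip": [3.0, 1.5, 5.0, 2.0, 4.0],
})

# Sort by the X variable so the line moves left to right
ordered = data.sort_values("total_bill")

fig, ax = plt.subplots()
ax.plot(ordered["total_bill"], ordered["tip"], marker="o", label="tip")
ax.set_xlabel("Total bill")
ax.set_ylabel("Tip")
ax.legend()
plt.show()
```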

Bar Graph

A bar plot, also known as a bar chart, represents categories of data using rectangular bars whose heights (or lengths, for horizontal bars) are proportional to the values they represent. The bar() function creates it.
Example:

import pandas as pd
import matplotlib.pyplot as plt


# reading the database
data = pd.read_csv("tips.csv")

# Bar chart with day against tip
plt.bar(data['day'], data['tip'])

plt.title("Bar Chart")

# Setting the X and Y labels
plt.xlabel('Day')
plt.ylabel('Tip')

# Displaying the plot
plt.show()
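
When bar() receives one row per observation, as above, the bars for the same day are drawn on top of each other, so only the tallest one remains visible. One possible alternative, sketched below under the assumption that the mean tip per day is the quantity of interest, is to aggregate with Pandas first. The rows are hypothetical sample values, not the real tips data.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical rows standing in for tips.csv
data = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Sat", "Sat", "Sun"],
    "tip": [2.0, 3.0, 3.5, 4.0, 2.0, 5.0],
})

# One value per category: the mean tip for each day,
# keeping the days in their order of first appearance
mean_tip = data.groupby("day", sort=False)["tip"].mean()

fig, ax = plt.subplots()
bars = ax.bar(mean_tip.index, mean_tip.values)
ax.set_title("Mean tip by day")
ax.set_xlabel("Day")
ax.set_ylabel("Mean tip")
plt.show()
```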

Histogram

A histogram shows how the values of a variable are distributed across groups called bins. It is a type of bar plot in which the X-axis represents bin ranges and the Y-axis represents frequency. The hist() function computes the bins and draws the histogram. If we pass categorical data to hist(), it counts how often each value occurs.
Example:

import pandas as pd
import matplotlib.pyplot as plt


# reading the database
data = pd.read_csv("tips.csv")

# histogram of total_bills
plt.hist(data['total_bill'])

plt.title("Histogram")

# Displaying the plot
plt.show()
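
To make the idea of bins more concrete, here is a small sketch that sets the bins explicitly and inspects what hist() returns. The bill amounts are hypothetical sample values, and the choice of four bins between 10 and 50 is only for illustration.

```python
import matplotlib.pyplot as plt

# Hypothetical bill amounts, used only for illustration
total_bill = [10.5, 12.0, 15.0, 18.5, 21.0, 22.5, 25.0, 30.0, 35.5, 48.0]

fig, ax = plt.subplots()

# bins sets the number of equal-width intervals; range fixes their limits
counts, bin_edges, patches = ax.hist(total_bill, bins=4, range=(10, 50),
                                     edgecolor="black")

# hist() returns the frequency in each bin and the edges between bins
print(counts)     # frequencies for the bins 10-20, 20-30, 30-40, 40-50
print(bin_edges)

ax.set_xlabel("Total bill")
ax.set_ylabel("Frequency")
plt.show()
```

For these sample values the four bins hold 4, 3, 2, and 1 observations. A value that falls exactly on an inner edge, such as 30.0, is counted in the bin that starts at that edge.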

Seaborn

Seaborn is a high-level interface built on top of the Matplotlib library. It includes built-in styles and color palettes that help you create more visually appealing graphs.
To install Seaborn, type the following command in the terminal:

pip install seaborn

Because Seaborn is built on top of Matplotlib, the two can be used together. Simply call the Seaborn plotting function as usual, and then use Matplotlib's customization functions.

Note: Seaborn provides sample datasets such as tips and iris through its load_dataset() function, but for the purposes of this tutorial, we will load the dataset using Pandas.
Example:

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd


# reading the database
data = pd.read_csv("tips.csv")

# draw line plot
sns.lineplot(x="sex", y="total_bill", data=data)

# setting the title using Matplotlib
plt.title('Title using Matplotlib Function')

plt.show()


Scatter Plot

The scatterplot() function creates a scatter plot. It works like Matplotlib's scatter(), but it takes column names for x and y together with a data argument that holds the DataFrame.
Example:

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# reading the database
data = pd.read_csv("tips.csv")

sns.scatterplot(x='day', y='tip', data=data)
plt.show()

If you want to color each point of this plot according to sex, Matplotlib requires extra work, such as plotting each group separately. In Seaborn's scatterplot(), the hue argument does this directly.
Example:

import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# reading the database
data = pd.read_csv("tips.csv")

sns.scatterplot(x='day', y='tip', data=data,
                hue='sex')
plt.show()
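
For comparison, here is a rough sketch of how the same grouping might be done in plain Matplotlib: one scatter() call per group inside a loop. It is one possible approach among several, and the DataFrame is a hypothetical stand-in for tips.csv.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical rows standing in for tips.csv
data = pd.DataFrame({
    "day": ["Thur", "Fri", "Sat", "Sun", "Sat", "Sun"],
    "tip": [2.0, 3.5, 4.0, 5.0, 1.5, 3.0],
    "sex": ["Female", "Male", "Male", "Female", "Male", "Female"],
})

fig, ax = plt.subplots()

# One scatter() call per group gives each group its own color
# and its own legend entry
for sex, group in data.groupby("sex"):
    ax.scatter(group["day"], group["tip"], label=sex)

ax.set_xlabel("day")
ax.set_ylabel("tip")
ax.legend(title="sex")
plt.show()
```

The loop, labels, and legend are all written by hand here, which is the work that hue='sex' takes care of in the Seaborn version.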

Line Plot

The lineplot() function creates a line plot in Seaborn. It can also be called with only the data argument.
Example:

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd


# reading the database
data = pd.read_csv("tips.csv")

sns.lineplot(x='day', y='tip', data=data)
plt.show()

Example using only the data argument:

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd


# reading the database
data = pd.read_csv("tips.csv")

# using only data attribute
sns.lineplot(data=data.drop(['total_bill'], axis=1))
plt.show()

Bar Plot

The barplot() function in Seaborn creates a bar plot. By default, each bar shows the mean of the y values in that category.
Example:

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd


# reading the database
data = pd.read_csv("tips.csv")

sns.barplot(x='day', y='tip', data=data,
            hue='sex')

plt.show()
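
Unlike Matplotlib's bar(), barplot() aggregates the rows in each category before drawing, using the mean by default. The sketch below checks this on a few hypothetical rows by comparing the bar heights with a Pandas groupby; the hue argument is left out to keep one bar per day.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical rows standing in for tips.csv
data = pd.DataFrame({
    "day": ["Thur", "Thur", "Sat", "Sat", "Sun", "Sun"],
    "tip": [2.0, 3.0, 4.0, 2.0, 5.0, 3.0],
})

# barplot() draws one bar per day at the mean tip for that day
ax = sns.barplot(x="day", y="tip", data=data)

# The bar heights should match the per-day means computed with Pandas
bar_heights = sorted(patch.get_height() for patch in ax.patches)
pandas_means = sorted(data.groupby("day")["tip"].mean())
print(bar_heights)
print(pandas_means)

plt.show()
```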

Histogram

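A histogram can be plotted in Seaborn using the histplot() function. In the example below, kde=True overlays a kernel density estimate and hue splits the data by sex.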
Example:

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd


# reading the database
data = pd.read_csv("tips.csv")

sns.histplot(x='total_bill', data=data, kde=True, hue='sex')

plt.show()

After looking at all of these plots, you've probably noticed that common customizations, such as coloring by category, take less code in Seaborn than in Matplotlib. Because Seaborn is built on Matplotlib, we can still use Matplotlib functions while using Seaborn.

Conclusion

In this tutorial, we plotted the tips dataset using two different Python plotting libraries, Matplotlib and Seaborn. Each library draws its plots in its own way and has its own set of features. Matplotlib provides more flexibility, but at the expense of writing more code, whereas Seaborn, as a high-level interface, achieves the same goal with less code. Which one to use depends on the task at hand.