Data visualization using seaborn – Part 1
Data visualization tools help us understand trends, outliers, and patterns in data. Graphs help us tell stories with data.
This tutorial shows how to create and interpret graphs with Python’s seaborn package, using a series of examples.
A picture is worth a thousand words
Seaborn package
Seaborn provides a high-level interface on top of Matplotlib. It offers functions for statistical plot types and integrates with pandas data structures, so attractive graphics can be produced with little code.
Functionality that seaborn offers:
- Examining relationships between features of a dataset.
- Showing observations or aggregate statistics for categorical variables.
- Visualizing univariate (one-variable) or bivariate (two-variable) distributions and comparing them between subsets of data.
- Estimating and plotting linear regression models for different kinds of dependent variables.
Dependencies
- Python 3.6+
Mandatory dependencies
- numpy ( version >= 1.13.3)
- scipy ( version >= 1.0.1)
- pandas ( version >= 0.22.0)
- matplotlib ( version >= 2.1.2)
Recommended dependencies
- statsmodels (version >= 0.8.0)
Installing and getting started
- To install seaborn, you can use pip
pip install seaborn
- To install using conda
conda install seaborn
- To install from within a Jupyter notebook or a Kaggle kernel
!pip install seaborn
Let’s discover what happens!
- To import seaborn –
import seaborn as sns
Seaborn relies on matplotlib to draw its graphs. Many tasks can be done with seaborn alone, but finer customization goes through matplotlib. When running as a script rather than in a notebook, call matplotlib.pyplot.show() to display the graph.
- To choose the default seaborn theme, scaling and color –
sns.set()
It uses the matplotlib customizing system and will affect how all matplotlib plots look, even if you don’t make them with seaborn library.
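The effect of sns.set() on ordinary matplotlib plots can be checked with a short sketch. The plotted numbers are made up, and the rcParams key inspected here (axes.facecolor) is just one of several settings the theme changes.

```python
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

# Start from matplotlib's built-in defaults so the change is easy to see.
matplotlib.rcdefaults()
before = matplotlib.rcParams["axes.facecolor"]

sns.set()  # apply seaborn's default theme, scaling, and color palette
after = matplotlib.rcParams["axes.facecolor"]
print("axes.facecolor:", before, "->", after)

# A plain matplotlib plot, drawn without any seaborn plotting function,
# now picks up the seaborn theme as well.
fig, ax = plt.subplots()
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])
ax.set_title("Plain matplotlib plot after sns.set()")
# plt.show()  # uncomment when running as a script rather than in a notebook
```

Because the theme is stored in matplotlib's global settings, it stays active for every later figure in the same session.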
- To list the available example datasets –
sns.get_dataset_names()
This returns the names of the example datasets that seaborn can load.
- To load one of the example datasets –
sns.load_dataset('datasetname')
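We will use these example datasets in the examples. load_dataset() downloads the data from seaborn's online repository, so it needs an internet connection the first time a dataset is requested.
Note – At the time of writing, the latest version of seaborn is v0.10.1, released in April 2020.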
Features of seaborn
- Relational
- Categorical
- Distribution
- Regression
- Multiples
- Style
- Color
We will look at relational and categorical plots in this part
Relational Plots – Visualizing statistical relationships
Relational plots help us understand how columns (features) in a dataset relate to each other and how those relationships depend on other variables.
There are two types of relational plot:
- Scatter plot ( when kind = “scatter” )
- Line plot ( when kind = “line” )
The relationship between two features can be shown for different subsets of the data using the hue, size, and style parameters. For example, a third variable can be added to a two-dimensional plot by mapping it to hue.
Note – Unlike the underlying plotting functions (scatterplot() and lineplot()), relplot() expects the data as a DataFrame, with variables specified by passing column-name strings to x, y, and other parameters.
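The difference described in the note can be sketched as follows. The measurements and the column names height_cm and weight_kg are invented for illustration only.

```python
import pandas as pd
import seaborn as sns

# Two plain Python lists with made-up measurements.
heights_cm = [150, 160, 165, 172, 180, 188]
weights_kg = [52, 58, 63, 70, 79, 90]

# Axes-level function: the vectors can be passed directly.
ax = sns.scatterplot(x=heights_cm, y=weights_kg)

# Figure-level relplot(): put the data in a DataFrame first,
# then refer to its columns by name.
df = pd.DataFrame({"height_cm": heights_cm, "weight_kg": weights_kg})
g = sns.relplot(data=df, x="height_cm", y="weight_kg")
```

A side effect of passing column names is that relplot() uses them as axis labels.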
The syntax of relplot() is given below. It draws the plot and returns a FacetGrid object –
seaborn.relplot(x=None, y=None, hue=None, size=None, style=None, data=None, row=None, col=None, col_wrap=None, row_order=None, col_order=None, palette=None, hue_order=None, hue_norm=None, sizes=None, size_order=None, size_norm=None, markers=None, dashes=None, style_order=None, legend='brief', kind='scatter', height=5, aspect=1, facet_kws=None, **kwargs)
The table below lists the main parameters, the input type each one accepts, and whether it is optional.
Parameter | Input type |
x, y | Column names in data (numeric variables) |
hue | Column name, optional |
size | Column name, optional |
style | Column name, optional |
data | DataFrame |
row, col | Column names, optional |
row_order, col_order | Lists of strings, optional |
palette | Palette name, list, or dict, optional |
hue_order | List, optional |
hue_norm | Tuple or Normalize object, optional |
sizes | List, dict, or tuple, optional |
legend | “brief”, “full”, or False, optional |
kind | String (“scatter” or “line”), optional |
height | Scalar, optional |
aspect | Scalar, optional |
Let’s see some examples for better understanding.
Example 1-
This example shows the relationship between two numerical features, with a categorical feature layered on top.
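The code for this example is not reproduced in the text, so the following is only a minimal sketch of one way to draw such a plot. It assumes a small synthetic DataFrame (columns total_bill, tip, and smoker are made up) and maps the categorical column to hue.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for a restaurant-bills dataset (all values are made up).
rng = np.random.default_rng(0)
n = 60
df = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n),
    "smoker": np.tile(["Yes", "No"], n // 2),
})
df["tip"] = 0.15 * df["total_bill"] + rng.normal(0, 1, n)

# Two numerical features on the axes, one categorical feature as color.
g = sns.relplot(data=df, x="total_bill", y="tip", hue="smoker")
```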
Output

Example 2-
This example shows how two numerical and two categorical variables are related, using facet columns for one of the categorical variables.
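As a hedged sketch of this idea (not the original code), one categorical variable can go to hue and the other to col, which gives one subplot per category. The DataFrame and its column names are synthetic assumptions.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data; every value and column name is made up for illustration.
rng = np.random.default_rng(1)
n = 60
df = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n),
    "smoker": np.tile(["Yes", "No"], n // 2),
    "time": np.tile(["Lunch", "Lunch", "Dinner", "Dinner"], n // 4),
})
df["tip"] = 0.15 * df["total_bill"] + rng.normal(0, 1, n)

# hue colors the points by one category; col splits the figure by the other.
g = sns.relplot(data=df, x="total_bill", y="tip", hue="smoker", col="time")
```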
Output

Example 3-
Here the plot is faceted on both columns and rows.
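A possible sketch of row and column faceting, again on assumed synthetic data: each combination of the two categorical columns gets its own subplot.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data; every value and column name is made up for illustration.
rng = np.random.default_rng(2)
n = 60
df = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n),
    "smoker": np.tile(["Yes", "No"], n // 2),
    "time": np.tile(["Lunch", "Lunch", "Dinner", "Dinner"], n // 4),
})
df["tip"] = 0.15 * df["total_bill"] + rng.normal(0, 1, n)

# One row of subplots per smoker value, one column per time value.
g = sns.relplot(data=df, x="total_bill", y="tip", row="smoker", col="time")
print(g.axes.shape)  # grid layout as (rows, columns)
```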
Output

Example 4-
This example uses multiple semantic variables (such as hue, size, and style) on each facet, each mapped to a specific feature.
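One way this could look is sketched below. The choice of which column goes to which semantic, the sizes range, and the column name party_size are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data; every value and column name is made up for illustration.
rng = np.random.default_rng(3)
n = 60
df = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n),
    "smoker": np.tile(["Yes", "No"], n // 2),
    "time": np.tile(["Lunch", "Lunch", "Dinner", "Dinner"], n // 4),
    "party_size": rng.integers(1, 7, n),
})
df["tip"] = 0.15 * df["total_bill"] + rng.normal(0, 1, n)

g = sns.relplot(
    data=df, x="total_bill", y="tip",
    hue="smoker",        # color encodes one category
    style="smoker",      # marker shape repeats it for readability
    size="party_size",   # marker area encodes a numeric column
    sizes=(20, 200),     # smallest and largest marker size
    col="time",          # one facet per time value
)
```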
Output

Example 5-
This time series plot is obtained by changing kind to “line”.
Note – You can change the height and aspect ratio of each facet by passing numerical values to height and aspect.
Output

Plotting with categorical data
Above we saw how to show relationships between numerical variables in a dataset. What about categorical variables? Seaborn offers several ways to visualize them. Let’s jump in.
Introduction
If a variable has a pandas categorical data type, its levels and their order are taken from that type. Otherwise, the order follows the data, and you can change it by sorting the DataFrame or by using function parameters (orient, order, hue_order, etc.) to set up the plot correctly. The catplot() function gives access to all of the plots below.
The different kinds of catplot() are:
Categorical scatter plots:
- stripplot() (with kind=“strip”; the default)
- swarmplot() (with kind=“swarm”)
Categorical distribution plots:
- boxplot() (with kind=“box”)
- violinplot() (with kind=“violin”)
- boxenplot() (with kind=“boxen”)
Categorical estimate plots:
- pointplot() (with kind=“point”)
- barplot() (with kind=“bar”)
- countplot() (with kind=“count”)
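To see how the kind argument selects among these plots, the sketch below draws several of them from the same data. The DataFrame is synthetic, the subset of kinds is arbitrary, and the grids dictionary is just a hypothetical container for the results.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data; every value and column name is made up for illustration.
rng = np.random.default_rng(5)
n = 60
df = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat", "Sun"], n // 4),
    "total_bill": rng.uniform(5, 50, n),
})

# Same data and same call; only the kind changes.
grids = {}
for kind in ["strip", "box", "violin", "point", "bar"]:
    g = sns.catplot(data=df, x="day", y="total_bill", kind=kind, height=3)
    g.ax.set_title(f"kind = {kind}")
    grids[kind] = g

# kind="count" counts rows per category, so it takes only one variable.
grids["count"] = sns.catplot(data=df, x="day", kind="count", height=3)
```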
The syntax of catplot() is given below. It draws the plot and returns a FacetGrid object –
seaborn.catplot(x=None, y=None, hue=None, data=None, row=None, col=None, col_wrap=None, estimator=numpy.mean, ci=95, n_boot=1000, units=None, seed=None, order=None, hue_order=None, row_order=None, col_order=None, kind='strip', height=5, aspect=1, orient=None, color=None, palette=None, legend=True, legend_out=True, sharex=True, sharey=True, margin_titles=False, facet_kws=None, **kwargs)
The table below lists the main parameters, the input type each one accepts, and whether it is optional.
Parameter | Input type |
x, y, hue | Names of variables in data |
data | DataFrame |
row, col | Names of variables in data, optional |
col_wrap | int, optional |
estimator | Callable that maps a vector to a scalar, optional |
ci | float, “sd”, or None, optional |
n_boot | int, optional |
units | Name of a variable in data, optional |
seed | int or numpy.random.Generator, optional |
kind | string, optional |
height | scalar, optional |
aspect | scalar, optional |
orient | “v” or “h”, optional |
color | matplotlib color, optional |
palette | palette name, list, or dict, optional |
legend | bool, optional |
margin_titles | bool, optional |
share{x, y} | bool, “col”, or “row”, optional |
Let’s discuss a few examples.
Example 1-
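The text does not say which plot this example used, so the sketch below simply assumes the default kind (“strip”) and also shows the order parameter mentioned in the introduction. The data and column names are synthetic.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data; every value and column name is made up for illustration.
rng = np.random.default_rng(6)
n = 60
df = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat", "Sun"], n // 4),
    "total_bill": rng.uniform(5, 50, n),
})

# Default kind is "strip"; order fixes the left-to-right category order
# instead of relying on the order in which values appear in the data.
g = sns.catplot(
    data=df, x="day", y="total_bill",
    order=["Sun", "Sat", "Fri", "Thur"],
)
```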
Output

Example 2-
Below is a violin plot, which visualizes the distribution of the data within each category.
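A violin plot of this kind could be produced roughly as follows. This is a sketch on assumed synthetic data, not the original code.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data; every value and column name is made up for illustration.
rng = np.random.default_rng(7)
n = 60
df = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat", "Sun"], n // 4),
    "total_bill": rng.uniform(5, 50, n),
})

# One violin per category: its width shows the estimated density
# of total_bill at each value.
g = sns.catplot(data=df, x="day", y="total_bill", kind="violin")
```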
Output

Example 3-
This example adds the hue parameter, which helps visualize a third variable.
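As an illustration of hue in a categorical plot, the sketch below splits each category by a second categorical column. The box kind and the synthetic columns are assumptions, since the original code is not shown.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data; every value and column name is made up for illustration.
rng = np.random.default_rng(8)
n = 60
df = pd.DataFrame({
    "day": np.repeat(["Thur", "Fri", "Sat", "Sun"], n // 4),
    "smoker": np.tile(["Yes", "No"], n // 2),
    "total_bill": rng.uniform(5, 50, n),
})

# x and y give the category and the value; hue adds a third variable
# by drawing one colored box per smoker value within each day.
g = sns.catplot(data=df, x="day", y="total_bill", hue="smoker", kind="box")
```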
Output

Conclusion
Seaborn offers many attractive visualizations, and the code needed to produce them is short and simple. In this article, we looked at how to draw relational and categorical plots with the seaborn library.
This is Part 1 of a series of articles on seaborn. In the second article, we will explore more types of graphs, including regression plots. Meanwhile, you can also check out this post: Getting started with numpy