Data visualization using seaborn – Part 1

Data visualization tools help us understand trends, outliers, and patterns in data. Graphs help us tell stories with data.
This tutorial discusses how to create, visualize, and interpret graphs generated with Python’s seaborn package, using a range of examples.

A picture is worth a thousand words

Seaborn package

Seaborn provides a high-level interface built on top of Matplotlib. It offers functions for common statistical plot types and integrates with pandas data structures; as a result, it produces attractive graphs.

Functionality that seaborn offers:

  • Visualizing relationships between the features of a dataset
  • Showing observations and aggregate statistics across categorical variables
  • Visualizing univariate (one-variable) or bivariate (two-variable) distributions and comparing them between subsets of data
  • Plotting linear regression models for different kinds of dependent variables

Dependencies

  • Python 3.6+

Mandatory dependencies

  • numpy (version >= 1.13.3)
  • scipy (version >= 1.0.1)
  • pandas (version >= 0.22.0)
  • matplotlib (version >= 2.1.2)

Recommended dependencies

  • statsmodels (version >= 0.8.0)

Installing and getting started

  • To install seaborn, you can use pip
pip install seaborn
  • To install using conda
conda install seaborn
  • To install from a Jupyter notebook or a Kaggle kernel
!pip install seaborn

Let's discover what happens!

  • To import seaborn –
import seaborn as sns

Seaborn relies on matplotlib to draw the graphs. Many tasks can be done with seaborn alone, but further customization is done through matplotlib. When running a script, call matplotlib.pyplot.show() to display the graph.
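A minimal sketch of this split, using a small made-up DataFrame: seaborn draws the points and returns the matplotlib Axes it drew on, and the title and axis label are then set with ordinary matplotlib methods before the figure is shown.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small made-up dataset, purely for illustration
df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 5, 4, 6]})

# seaborn draws the plot and returns the matplotlib Axes it drew on
ax = sns.scatterplot(x="x", y="y", data=df)

# Further customization goes through plain matplotlib methods
ax.set_title("Customized with matplotlib")
ax.set_xlabel("x value")

# Display the figure (needed in scripts; notebooks usually show it automatically)
plt.show()
```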

  • To apply the default seaborn theme, scaling, and color palette –
sns.set()

It works through matplotlib's customization system (rcParams), so it changes how all matplotlib plots look, even those not made with the seaborn library.
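A small sketch of this side effect: the axes background color stored in matplotlib's rcParams changes after calling sns.set(), and a figure drawn purely with matplotlib picks up the new look. (In seaborn 0.11 and later, sns.set() is kept as an alias for sns.set_theme().)

```python
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

# Axes background color before seaborn touches the settings
before = matplotlib.rcParams["axes.facecolor"]

# Apply seaborn's defaults: "darkgrid" style, "notebook" context, "deep" palette
sns.set()

after = matplotlib.rcParams["axes.facecolor"]
print(before, "->", after)

# A plain matplotlib plot, made without any seaborn function,
# is now drawn with the seaborn background color and grid
fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4])
```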

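  • To list the available example datasets –
sns.get_dataset_names() 

This returns a list with the names of the example datasets. They are not installed with seaborn but fetched from an online repository, so an internet connection is needed.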

  • To load one of the example datasets –
sns.load_dataset('datasetname')

We will use these example datasets in the examples that follow.
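A short sketch of both calls, assuming an internet connection: the dataset names are read from the online seaborn-data repository, and load_dataset() downloads the requested file (keeping a local cached copy) and returns it as a pandas DataFrame.

```python
import seaborn as sns

# Names of the example datasets (fetched online)
names = sns.get_dataset_names()
print(names)

# Download the "tips" dataset and load it into a pandas DataFrame
tips = sns.load_dataset("tips")
print(tips.head())
print(tips.shape)
```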

Note: At the time of writing, the latest version of seaborn was v0.10.1, released in April 2020.

Features of seaborn

  • Relational
  • Categorical
  • Distribution
  • Regression
  • Multiples
  • Style
  • Color

In this part, we will look at relational plots, and then at categorical plots.

Relational Plots – Visualizing statistical relationships

We use these types of graphs to understand how the columns (features) in a dataset relate to each other and how those relationships depend on other variables.

There are two kinds of relational plot:

  1. Scatter plot (kind="scatter", the default; drawn with scatterplot())
  2. Line plot (kind="line"; drawn with lineplot())

The relationship between two features can be shown separately for different subsets of the data using the hue, size, and style parameters. Each of these maps one more variable onto the plot, so adding hue alone already brings in a third dimension (shown as color).

Note – Unlike plain matplotlib plotting functions, which take arrays of values, relplot() takes the data as a DataFrame, and columns are selected by passing their names as strings to x, y, and the other parameters.

The signature of relplot() is shown below; it returns a FacetGrid object containing the plot –

seaborn.relplot(x=None, y=None, hue=None, size=None, style=None, data=None, row=None, col=None, col_wrap=None, row_order=None, col_order=None, palette=None, hue_order=None, hue_norm=None, sizes=None, size_order=None, size_norm=None, markers=None, dashes=None, style_order=None, legend='brief', kind='scatter', height=5, aspect=1, facet_kws=None, **kwargs)

The table below lists the main parameters and their input types.

Parameter              Input type
x, y                   Numeric column names in data
hue                    Column name, optional
size                   Column name, optional
style                  Column name, optional
data                   DataFrame
row, col               Column names, optional
row_order, col_order   Lists of strings, optional
palette                Palette name, list, or dict, optional
hue_order              List, optional
hue_norm               Tuple or matplotlib Normalize object, optional
sizes                  List, dict, or tuple, optional
legend                 "brief", "full", or False, optional
kind                   String, optional ("scatter" or "line")
height                 Scalar, optional
aspect                 Scalar, optional

Let's look at some examples.

Example 1-

This example shows the relationship between two numerical features, with the points grouped by a categorical feature.
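A sketch of such a plot, assuming a small made-up DataFrame that mimics the column names of seaborn's "tips" example dataset (so the same call should also work on sns.load_dataset("tips")): total_bill and tip go on the axes, and the categorical smoker column is mapped to hue.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in for seaborn's "tips" dataset (same column names, random values)
rng = np.random.default_rng(0)
n = 60
tips = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n).round(2),
    "smoker": rng.choice(["Yes", "No"], n),
})
tips["tip"] = (0.15 * tips["total_bill"] + rng.normal(0, 1, n)).round(2)

# Two numerical features on the axes, points colored by a categorical feature
g = sns.relplot(x="total_bill", y="tip", hue="smoker", data=tips)
```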


Example 2-

This example shows how two numerical and two categorical variables are related, with one of the categorical variables split into separate columns.
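A sketch of this layout on made-up tips-like data (column names borrowed from seaborn's "tips" dataset): smoker is mapped to hue, and the col parameter draws one facet per value of the categorical time column.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in for seaborn's "tips" dataset (same column names, random values)
rng = np.random.default_rng(0)
n = 60
tips = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n).round(2),
    "smoker": rng.choice(["Yes", "No"], n),
    "time": rng.choice(["Lunch", "Dinner"], n),
})
tips["tip"] = (0.15 * tips["total_bill"] + rng.normal(0, 1, n)).round(2)

# hue splits the points by smoker; col draws one facet per value of time
g = sns.relplot(x="total_bill", y="tip", hue="smoker", col="time", data=tips)
```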


Example 3-

Here, facets are added on both the columns and the rows.
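A sketch of a row-and-column facet grid on made-up tips-like data: time selects the column and sex selects the row, so the grid has one panel per combination of the two.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in for seaborn's "tips" dataset (same column names, random values)
rng = np.random.default_rng(0)
n = 80
tips = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n).round(2),
    "smoker": rng.choice(["Yes", "No"], n),
    "time": rng.choice(["Lunch", "Dinner"], n),
    "sex": rng.choice(["Male", "Female"], n),
})
tips["tip"] = (0.15 * tips["total_bill"] + rng.normal(0, 1, n)).round(2)

# col facets by time, row facets by sex: a 2 x 2 grid of scatter plots
g = sns.relplot(x="total_bill", y="tip", hue="smoker",
                col="time", row="sex", data=tips)
```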


Example 4-

This example uses multiple semantic variables (hue, size, and style) on each facet.
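A sketch combining all three semantic parameters on made-up tips-like data: hue colors by smoker, style changes the marker by sex, and size scales the markers by the numeric party size, inside one facet per value of time.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in for seaborn's "tips" dataset (same column names, random values)
rng = np.random.default_rng(0)
n = 80
tips = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n).round(2),
    "smoker": rng.choice(["Yes", "No"], n),
    "time": rng.choice(["Lunch", "Dinner"], n),
    "sex": rng.choice(["Male", "Female"], n),
    "size": rng.integers(1, 7, n),  # party size, 1 to 6
})
tips["tip"] = (0.15 * tips["total_bill"] + rng.normal(0, 1, n)).round(2)

# Three semantic mappings on every facet: color, marker style and marker size
g = sns.relplot(x="total_bill", y="tip", hue="smoker", style="sex",
                size="size", col="time", data=tips)
```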


Example 5-

This time-series plot is obtained by changing kind to "line".

Note – You can change the size of each facet by passing numeric values to height (the facet height in inches) and aspect (the ratio of facet width to height).


Plotting with categorical data

Above, we saw how to show the relationships between numerical variables in a dataset. What about categorical variables? Seaborn offers several different ways to visualize them. Let's jump in.

Introduction

If a variable has the pandas "category" dtype, seaborn uses its levels, and their order, when drawing the plot. Otherwise, you can control the order by altering the DataFrame (for example, by sorting it) or by using function parameters such as orient, order, and hue_order to set up the plot correctly. The figure-level catplot() function gives access to all of these plot types.

The different kinds of catplot are:

Categorical scatter plots:

  • stripplot() (with kind="strip"; the default)
  • swarmplot() (with kind="swarm")

Categorical distribution plots:

  • boxplot() (with kind="box")
  • violinplot() (with kind="violin")
  • boxenplot() (with kind="boxen")

Categorical estimate plots:

  • pointplot() (with kind="point")
  • barplot() (with kind="bar")
  • countplot() (with kind="count")

The signature of catplot() is shown below; it returns a FacetGrid object containing the plot –

seaborn.catplot(x=None, y=None, hue=None, data=None, row=None, col=None, col_wrap=None, estimator=<function mean>, ci=95, n_boot=1000, units=None, seed=None, order=None, hue_order=None, row_order=None, col_order=None, kind='strip', height=5, aspect=1, orient=None, color=None, palette=None, legend=True, legend_out=True, sharex=True, sharey=True, margin_titles=False, facet_kws=None, **kwargs)

The table below lists the main parameters and their input types.

Parameter              Input type
x, y, hue              Column names in data (hue and one of x/y are categorical), optional
data                   DataFrame
row, col               Column names in data, optional
col_wrap               int, optional
estimator              Callable that maps a vector to a scalar, optional
ci                     float, "sd", or None, optional
n_boot                 int, optional
units                  Column name in data, optional
seed                   int or numpy.random.Generator, optional
kind                   string, optional
height                 scalar, optional
aspect                 scalar, optional
orient                 "v" or "h", optional
color                  matplotlib color, optional
palette                palette name, list, or dict, optional
legend                 bool, optional
margin_titles          bool, optional
sharex, sharey         bool, "col", or "row", optional

Let's discuss a few examples.

Example 1-
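A sketch of the default categorical scatter plot (kind="strip") on a made-up tips-like DataFrame; the order parameter fixes the order of the day categories instead of relying on their order of appearance in the data.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in for seaborn's "tips" dataset (same column names, random values)
rng = np.random.default_rng(0)
n = 80
tips = pd.DataFrame({
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], n),
    "total_bill": rng.uniform(5, 50, n).round(2),
})

# Default kind="strip": one jittered point per row, grouped by day;
# order fixes the category order along the x axis
g = sns.catplot(x="day", y="total_bill",
                order=["Thur", "Fri", "Sat", "Sun"], data=tips)
```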


Example 2-

Below is a violin plot, which visualizes the distribution of the data within each category.
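A sketch of the violin plot on the same kind of made-up tips-like data; compared with the strip plot, only the kind argument changes.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in for seaborn's "tips" dataset (same column names, random values)
rng = np.random.default_rng(0)
n = 80
tips = pd.DataFrame({
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], n),
    "total_bill": rng.uniform(5, 50, n).round(2),
})

# kind="violin" draws a kernel density estimate of total_bill for each day
g = sns.catplot(x="day", y="total_bill", kind="violin", data=tips)
```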


Example 3-

Using the hue parameter to visualize a third variable
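A sketch with hue on made-up tips-like data: sex becomes the third variable, and split=True (a violinplot() option passed through catplot()) draws the two hue levels as the two halves of each violin.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up stand-in for seaborn's "tips" dataset (same column names, random values)
rng = np.random.default_rng(0)
n = 80
tips = pd.DataFrame({
    "day": rng.choice(["Thur", "Fri", "Sat", "Sun"], n),
    "total_bill": rng.uniform(5, 50, n).round(2),
    "sex": rng.choice(["Male", "Female"], n),
})

# hue adds sex as a third variable; split=True puts the two levels side by side
# as the left and right halves of each violin
g = sns.catplot(x="day", y="total_bill", hue="sex", kind="violin",
                split=True, data=tips)
```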


Conclusion

Seaborn offers a wide range of attractive visualizations, and the code to produce them is short and simple. In this article, we looked at how to create relational and categorical plots using the Seaborn library.

This is Part 1 of a series of articles on Seaborn. In the second article of the series, we will play around with other types of graphs, including regression plots. Meanwhile, you can also check out this post: Getting started with numpy


Aswath Rao

Currently pursuing an MSc in Data Science
