# Data visualization using seaborn – Part 1

Data visualization tools help us understand trends, outliers, and patterns in data. Graphs help us tell stories with data.

This tutorial shows how to create and interpret graphs with Python's seaborn package, using a range of examples.

A picture is worth a thousand words

## Seaborn package

Seaborn provides a high-level interface on top of Matplotlib. It offers functions for statistical plot types and integrates with pandas, so attractive graphs can be produced with little code.

**Functionality that seaborn offers:**

- Visualizing relationships between the features of a dataset.
- Showing observations and statistics for categorical variables.
- Plotting univariate (one-variable) or bivariate (two-variable) distributions and comparing them between subsets of data.
- Plotting linear regression models for different kinds of dependent variables.

**Dependencies**

- Python 3.6+

**Mandatory dependencies**

- numpy (version >= 1.13.3)
- scipy (version >= 1.0.1)
- pandas (version >= 0.22.0)
- matplotlib (version >= 2.1.2)

**Recommended dependencies**

- statsmodels (version >= 0.8.0)

**Installing and getting started**

- To install seaborn, you can use pip

`pip install seaborn`

- To install using conda

`conda install seaborn`

- To install from within a Jupyter notebook or a Kaggle kernel

`!pip install seaborn`

Let's get started!

- To import seaborn –

`import seaborn as sns`

Seaborn relies on matplotlib to draw the graphs. Many tasks can be done with seaborn alone, but for finer customization you work with matplotlib directly. When running a script, call `matplotlib.pyplot.show()` to display the graph.

- To choose the default seaborn theme, scaling and color –

`sns.set()`

It uses the matplotlib customization system and affects how all matplotlib plots look, even those you don't make with seaborn.
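As a minimal sketch of this behavior, the snippet below applies the default theme and then draws a plain matplotlib line plot, which picks up the seaborn styling as well. The plotted values are made up for illustration.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Apply seaborn's default theme, scaling, and color palette.
sns.set()

# A plain matplotlib plot (no seaborn plotting function involved);
# it is still drawn with the seaborn theme set above.
fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [1, 4, 9, 16])  # made-up values
ax.set_title("Plain matplotlib plot with the seaborn theme")

# When running as a script, call plt.show() here to display the figure.
```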

- To list the names of the preloaded example datasets –

`sns.get_dataset_names()`

This returns the names of the datasets that are available to load.

- To load one of the example datasets –

`sns.load_dataset('datasetname')`

We will use these example datasets in the examples.

*Note: At the time of writing, the latest version of seaborn is v0.10.1, released in April 2020.*

*Features of seaborn*

- Relational
- Categorical
- Distribution
- Regression
- Multiples
- Style
- Color

This part covers relational and categorical plots.

**Relational plots: visualizing statistical relationships**

Relational plots show how the columns (features) of a dataset relate to each other and how those relationships depend on other variables.

There are two types of relational plot:

- **Scatter plot** (when `kind="scatter"`)
- **Line plot** (when `kind="line"`)

The relationship between two features can be shown for different subsets of the data using the hue, size, and style parameters. For example, a third dimension can be added to a two-dimensional plot by mapping a variable to hue.

*Note: Unlike plotting functions that take arrays of values directly, the data is provided as a data frame, and the columns are specified by passing their names as strings to x, y, and the other parameters.*

The syntax of `relplot()` is given below. It returns a **FacetGrid** object with the plot drawn on it –

`seaborn.relplot(x=None, y=None, hue=None, size=None, style=None, data=None, row=None, col=None, col_wrap=None, row_order=None, col_order=None, palette=None, hue_order=None, hue_norm=None, sizes=None, size_order=None, size_norm=None, markers=None, dashes=None, style_order=None, legend='brief', kind='scatter', height=5, aspect=1, facet_kws=None, **kwargs)`

The table below lists the parameters, their input types, and whether they are optional.

| Parameters | Input type |
| --- | --- |
| x, y | Numeric |
| hue | Column name, optional |
| size | Column name, optional |
| style | Column name, optional |
| data | Data frame |
| row, col | Variable name, optional |
| row_order, col_order | Lists of strings, optional |
| palette | Palette name, list, or dict, optional |
| hue_order | List, optional |
| hue_norm | Tuple or Normalize object |
| sizes | List, dict, or tuple |
| legend | "brief", "full", or False, optional |
| kind | String, optional |
| height | Scalar, optional |
| aspect | Scalar, optional |

Let's look at some examples for a better understanding.

#### Example 1

This example shows the relationship between two numerical features, with a categorical feature distinguishing the points.
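One way such a plot could be written is sketched below. To keep the snippet runnable without a download, a small made-up data frame stands in for an example dataset; the column names `total_bill`, `tip`, and `smoker` are assumptions for illustration and are not taken from the original example.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data for illustration: two numeric columns and one categorical.
rng = np.random.default_rng(0)
n = 40
df = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n),
    "tip": rng.uniform(1, 10, n),
    "smoker": np.repeat(["Yes", "No"], n // 2),
})

sns.set()
# Scatter plot of two numeric features; hue colors the points by category.
g = sns.relplot(data=df, x="total_bill", y="tip", hue="smoker")
```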

**Output**

#### Example 2

This example shows how two numerical and two categorical variables are related, with the second categorical variable split across columns.
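A sketch of this kind of plot follows, assuming `hue` carries one categorical variable and `col` carries the other. The data frame and its column names (`total_bill`, `tip`, `smoker`, `time`) are made up for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data: two numeric and two categorical columns.
rng = np.random.default_rng(0)
n = 40
df = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n),
    "tip": rng.uniform(1, 10, n),
    "smoker": np.repeat(["Yes", "No"], n // 2),
    "time": np.tile(["Lunch", "Dinner"], n // 2),
})

sns.set()
# hue colors the points by one category; col draws one facet
# (subplot column) per level of the other category.
g = sns.relplot(data=df, x="total_bill", y="tip", hue="smoker", col="time")
```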

**Output**

#### Example 3

Here, facets are added on both the rows and the columns.
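Faceting on rows and columns together could look like the sketch below. The data and column names are made up for illustration; with two levels in each categorical column, the result is a 2 x 2 grid of scatter plots.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data: every smoker/time combination occurs in the rows.
rng = np.random.default_rng(0)
n = 40
df = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n),
    "tip": rng.uniform(1, 10, n),
    "smoker": np.repeat(["Yes", "No"], n // 2),
    "time": np.tile(["Lunch", "Dinner"], n // 2),
})

sns.set()
# row and col each take a column name; one subplot is drawn
# for every combination of their levels.
g = sns.relplot(data=df, x="total_bill", y="tip",
                row="smoker", col="time", height=3)
```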

**Output**

#### Example 4

This example uses multiple semantic variables (such as hue, size, and style) on each facet.
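The sketch below combines `hue`, `style`, and `size` within each facet. The data frame, the column name `party_size`, and the marker size range passed to `sizes` are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data with an extra numeric column to map to marker size.
rng = np.random.default_rng(0)
n = 40
df = pd.DataFrame({
    "total_bill": rng.uniform(5, 50, n),
    "tip": rng.uniform(1, 10, n),
    "party_size": rng.integers(1, 7, n),
    "smoker": np.repeat(["Yes", "No"], n // 2),
    "time": np.tile(["Lunch", "Dinner"], n // 2),
})

sns.set()
g = sns.relplot(
    data=df, x="total_bill", y="tip",
    hue="smoker",        # color by category
    style="smoker",      # marker shape by the same category
    size="party_size",   # marker size by a numeric column
    sizes=(20, 200),     # smallest and largest marker size
    col="time",          # one facet per level of time
)
```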

**Output**

#### Example 5

This time series plot is obtained by setting kind to line.

*Note: You can change the height and aspect ratio by passing numerical values to height and aspect.*

**Output**

**Plotting with categorical data**

The plots above showed relationships between numerical variables in a dataset. What about categorical variables? Seaborn offers several ways to visualize them. Let's jump in.

#### Introduction

If a variable has a categorical data type, its levels and their order are used to lay out the plot. Otherwise, you can sort the data frame or use the function parameters (orient, order, hue_order, etc.) to set up the plot correctly. **catplot()** is the function that draws these plots.
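Both ways of fixing the order of the levels are sketched below: passing `order` explicitly, and storing the order in the column as a pandas categorical. The data frame and the names `day`, `total_bill`, and `day_order` are made up for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data: the day column holds plain strings in random order.
rng = np.random.default_rng(0)
n = 40
day_order = ["Thur", "Fri", "Sat", "Sun"]
df = pd.DataFrame({
    "day": rng.choice(day_order, n),
    "total_bill": rng.uniform(5, 50, n),
})

sns.set()
# Option 1: pass the desired order of the levels explicitly.
g = sns.catplot(data=df, x="day", y="total_bill", order=day_order)

# Option 2: store the order in the column as a categorical dtype,
# so that no order argument is needed.
df["day"] = pd.Categorical(df["day"], categories=day_order, ordered=True)
g2 = sns.catplot(data=df, x="day", y="total_bill")
```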

*The different kinds of catplot are:*

**Categorical scatter plots:**

- stripplot() (with `kind="strip"`; the default)
- swarmplot() (with `kind="swarm"`)

**Categorical distribution plots:**

- boxplot() (with `kind="box"`)
- violinplot() (with `kind="violin"`)
- boxenplot() (with `kind="boxen"`)

**Categorical estimate plots:**

- pointplot() (with `kind="point"`)
- barplot() (with `kind="bar"`)
- countplot() (with `kind="count"`)
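To illustrate how `kind` selects the plot type, the sketch below draws the same made-up data with one kind from each family (swarm, box, and bar). The data frame, its column names, and the `grids` dictionary are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data: ten observations for each of four days.
rng = np.random.default_rng(0)
n = 40
df = pd.DataFrame({
    "day": np.tile(["Thur", "Fri", "Sat", "Sun"], n // 4),
    "total_bill": rng.uniform(5, 50, n),
})

sns.set()
grids = {}
# One kind from each family: scatter, distribution, and estimate.
for kind in ["swarm", "box", "bar"]:
    g = sns.catplot(data=df, x="day", y="total_bill", kind=kind, height=3)
    g.set(title=f'kind="{kind}"')  # label each figure with its kind
    grids[kind] = g
```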

The syntax of `catplot()` is given below. It returns a **FacetGrid** object with the plot drawn on it –

`seaborn.catplot(x=None, y=None, hue=None, data=None, row=None, col=None, col_wrap=None, estimator=np.mean, ci=95, n_boot=1000, units=None, seed=None, order=None, hue_order=None, row_order=None, col_order=None, kind='strip', height=5, aspect=1, orient=None, color=None, palette=None, legend=True, legend_out=True, sharex=True, sharey=True, margin_titles=False, facet_kws=None, **kwargs)`

The table below lists the parameters, their input types, and whether they are optional.

| Parameters | Input type |
| --- | --- |
| x, y, hue | Column names in data |
| data | Data frame |
| row, col | Names of variables in data, optional |
| col_wrap | int, optional |
| estimator | Callable that maps a vector to a scalar, optional |
| ci | float, "sd", or None, optional |
| n_boot | int, optional |
| units | Column of the data frame, optional |
| seed | int or numpy.random.Generator, optional |
| kind | String, optional |
| height | Scalar, optional |
| aspect | Scalar, optional |
| orient | "v" or "h", optional |
| color | matplotlib color, optional |
| palette | Palette name, list, or dict, optional |
| legend | bool, optional |
| margin_titles | bool, optional |
| sharex, sharey | bool, "col", or "row", optional |

Let's discuss a few examples.

#### Example 1

**Output**

#### Example 2

Below is a violin plot that visualizes the distribution of the data.
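A violin plot of this kind could be produced as sketched below. The data frame and the column names `day` and `total_bill` are made up for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data: ten observations for each of four days.
rng = np.random.default_rng(0)
n = 40
df = pd.DataFrame({
    "day": np.tile(["Thur", "Fri", "Sat", "Sun"], n // 4),
    "total_bill": rng.uniform(5, 50, n),
})

sns.set()
# kind="violin" draws one violin per category, showing the
# distribution of the numeric variable within that category.
g = sns.catplot(data=df, x="day", y="total_bill", kind="violin")
```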

**Output**

#### Example 3

Here the hue parameter is used to visualize a third variable.
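As a sketch, adding `hue` to a violin plot draws one violin per combination of category and hue level. The data frame and the column names `day`, `total_bill`, and `smoker` are made up for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Made-up data: each day has five "Yes" and five "No" rows.
rng = np.random.default_rng(0)
n = 40
df = pd.DataFrame({
    "day": np.tile(["Thur", "Fri", "Sat", "Sun"], n // 4),
    "total_bill": rng.uniform(5, 50, n),
    "smoker": np.repeat(["Yes", "No"], n // 2),
})

sns.set()
# hue adds a third variable: violins are drawn side by side
# and colored by the smoker column within each day.
g = sns.catplot(data=df, x="day", y="total_bill",
                hue="smoker", kind="violin")
```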

**Output**

**Conclusion**

Seaborn offers a number of interesting visualizations with attractive built-in styling, and the code is simple and handy. In this article, we looked at how to draw relational and categorical plots using the seaborn library.

This is Part 1 of a series of articles on seaborn. In the second article, we will play around with other types of graphs, including regression plots. Meanwhile, you can also check out this post: Getting started with numpy