A boxplot summarizes the distribution of a numeric variable for one or several groups. It lets you quickly read the median, quartiles and outliers, but it also hides the individual data points of the dataset. In Python, boxplots are most often built with the boxplot function of the seaborn library.
⏱ Quick start
Seaborn is probably the most convenient library to quickly build a boxplot. It offers a dedicated
boxplot() function that roughly works as follows:
# library & dataset
import seaborn as sns
df = sns.load_dataset('iris')

sns.boxplot( x=df["species"], y=df["sepal_length"] )
⚠️ Mind the boxplot
A boxplot is a great way to summarize the distribution of a variable. However, it hides the underlying distribution and the sample size. Check the 3 charts below, which are all based on the exact same dataset.
To read more about this, visit data-to-viz.com that has a dedicated article.
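One common way to work around this limitation is to draw the individual observations on top of the boxes. The sketch below is only an illustration: the data is synthetic (three made-up groups with similar medians but different shapes and sample sizes), and the styling values are arbitrary choices. It uses seaborn's stripplot() to add jittered points over a boxplot().

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (an assumption for illustration): three groups with
# similar centers but different shapes and sample sizes.
rng = np.random.default_rng(0)
bimodal = np.concatenate([rng.normal(7, 0.5, 50), rng.normal(13, 0.5, 50)])
df = pd.DataFrame({
    "group": ["A"] * 500 + ["B"] * 20 + ["C"] * 100,
    "value": np.concatenate([
        rng.normal(10, 2, 500),  # A: unimodal, large sample
        rng.normal(10, 2, 20),   # B: same shape, tiny sample
        bimodal,                 # C: two clusters, nothing near the median
    ]),
})

fig, ax = plt.subplots()

# Boxes in a neutral color; outlier markers are turned off because
# every point is drawn by the strip plot anyway.
sns.boxplot(data=df, x="group", y="value", color="lightgrey",
            showfliers=False, ax=ax)

# Jittered individual observations on top of the boxes.
sns.stripplot(data=df, x="group", y="value", color="black",
              size=2, alpha=0.5, jitter=0.25, ax=ax)
```

Call plt.show() or fig.savefig(...) afterwards to display or export the figure. With the points visible, the small sample of group B and the two clusters of group C become apparent, which the boxes alone would not reveal.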
Seaborn is a Python library that makes it easy to build good-looking charts. Its
boxplot function should get you started in minutes. The examples below showcase the various possibilities this function offers.
Everything you need to know about color customization on your boxplot: transparency, palette in use, manual control.
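As a rough sketch of these three options, the example below draws two panels on synthetic data: one colored with a named palette, the other with a manually chosen color per group plus transparency. The dataset, the "Set2" palette, the colors in manual_colors and the alpha value are all arbitrary assumptions made for illustration. Transparency is set here through boxprops, which seaborn forwards to matplotlib.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (an assumption for illustration): 3 groups of 50 values.
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], 50),
    "value": np.concatenate([rng.normal(m, 1.0, 50) for m in (5, 6, 8)]),
})

# Manual control: one explicit color per group (arbitrary choices).
manual_colors = {"A": "tab:blue", "B": "tab:orange", "C": "tab:green"}

fig, (ax_palette, ax_manual) = plt.subplots(1, 2, figsize=(8, 4), sharey=True)

# Left panel: a named palette. hue repeats the x variable so that each
# group gets its own color; dodge=False keeps the boxes centered.
sns.boxplot(data=df, x="group", y="value", hue="group",
            palette="Set2", dodge=False, ax=ax_palette)

# Right panel: manual colors plus transparency on the box faces.
sns.boxplot(data=df, x="group", y="value", hue="group",
            palette=manual_colors, dodge=False,
            boxprops={"alpha": 0.4}, ax=ax_manual)

# The legend would only repeat the x axis labels, so drop it if present.
for ax in (ax_palette, ax_manual):
    legend = ax.get_legend()
    if legend is not None:
        legend.remove()
```

Passing a dictionary as palette makes the mapping between groups and colors explicit, which can help keep colors consistent across several charts.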
Since individual data points are hidden, it is good practice to show the sample size under each box.
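A minimal sketch of one way to do this is to count the observations per group and append the count to each tick label, so that it sits right under the corresponding box. The data below is synthetic and the "(n=...)" label format is an arbitrary choice. Other approaches, such as writing the count inside the plot area with ax.text(), are equally valid.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (an assumption for illustration): unequal group sizes.
rng = np.random.default_rng(2)
sizes = {"A": 40, "B": 15, "C": 5}
df = pd.DataFrame({
    "group": np.concatenate([[g] * n for g, n in sizes.items()]),
    "value": rng.normal(10, 2, sum(sizes.values())),
})

# Fix the group order explicitly so box positions are known: 0, 1, 2.
order = ["A", "B", "C"]
ax = sns.boxplot(data=df, x="group", y="value", order=order)

# Number of observations per group.
counts = df["group"].value_counts()

# Replace each tick label with "<group>" on one line and "(n=<count>)" below it.
labels = [f"{g}\n(n={counts[g]})" for g in order]
ax.set_xticks(range(len(order)))
ax.set_xticklabels(labels)
```

Passing order to boxplot() and reusing the same list for the labels guarantees that each count ends up under the right box.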