Beeswarm definition

Imagine you want to know how your friends' heights are distributed.

To do this, you can use a swarm plot, a chart that shows every individual data point (in this case, the height of each friend) and how those points are distributed.

Circles are slightly shifted to avoid overlaps. The result is a neat, organic shape that is visually appealing and hides no information. It lets you quickly grasp the range and distribution of the data while still showing every observation.

You can read more about beeswarm in the dedicated section of the gallery.

Libraries

First, you need to install the following libraries:

  • seaborn is used for creating the chart with the swarmplot() function
  • matplotlib is used for plot customization purposes
  • numpy is used to generate some data

If seaborn is not installed yet, you can install it with the pip install seaborn command.

# Libraries
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np

Dataset

Since beeswarm plots are meant to represent continuous variables, let's generate a sample of 100 normally distributed observations using numpy and its random.normal() function. The sample is drawn with a mean of 10 and a standard deviation of 5.

my_variable = np.random.normal(loc=10, scale=5, size=100)
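The call above draws a different sample on every run, so the chart changes each time. If you want a repeatable figure, one option is a seeded generator. This is a minimal sketch: the seed value 42 is an arbitrary choice and is not part of the original example.

```python
import numpy as np

# Seeded generator so the same sample is drawn on every run
# (the seed value 42 is arbitrary, chosen only for illustration)
rng = np.random.default_rng(seed=42)

# Same distribution as above: mean 10, standard deviation 5, 100 observations
my_variable = rng.normal(loc=10, scale=5, size=100)

# Quick sanity check on the sample
print(my_variable.shape)
print(my_variable.mean(), my_variable.std())
```

The sample mean and standard deviation will be close to 10 and 5, but not exactly equal, since the sample is finite.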

Basic beeswarm plot

The following code displays a simple beeswarm plot, with a title and an axis label, thanks to the swarmplot() function.

Note that circles are displayed vertically since the numeric vector is passed to the y axis.

That's it! A first beeswarm plot made with the default parameters.

# Create the swarm plot
sns.swarmplot(y=my_variable)

# Customization
plt.title('Swarm Plot of My Variable (y-axis)')  # Set the title
plt.ylabel('My variable')  # Set the label for the y-axis

plt.show() # Display the chart
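With larger samples the default markers can run out of room. The sketch below assumes a hypothetical sample of 300 observations and uses the size argument of swarmplot() to draw smaller markers. The sample and the seed are illustrative, not part of the original example.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

rng = np.random.default_rng(seed=0)  # arbitrary seed, for a repeatable figure
large_sample = rng.normal(loc=10, scale=5, size=300)  # hypothetical larger sample

fig, ax = plt.subplots(figsize=(6, 5))

# size sets the marker diameter in points (seaborn's default is 5);
# smaller markers leave more room for the swarm to spread out
sns.swarmplot(y=large_sample, size=3, ax=ax)

ax.set_title('Swarm plot with smaller markers')
ax.set_ylabel('My variable')

# Call plt.show() here when running outside a notebook
```

Passing ax explicitly is optional. It simply makes it clear which axes the swarm is drawn on.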

Color and orientation

Modify the colors

The following code uses the color, edgecolor and linewidth arguments to modify the style of points.

  • color defines the point color
  • edgecolor defines the color of the point edges
  • linewidth defines the edge width. The edge color will not appear unless you set this argument explicitly, since its default value is 0.

Use another orientation

To show the distribution of your variable along a given axis, pass x=my_variable for the x-axis or y=my_variable for the y-axis. That is all it takes to switch between a horizontal and a vertical beeswarm chart.

# Create the swarm plot
sns.swarmplot(x=my_variable,
              color='red', # Point color
              edgecolor='black', # Edge color
              linewidth=0.9, # Edge size
             )
plt.title("Swarm Plot of My Variable (x-axis) with customized colors")  # Set the title
plt.xlabel("My variable")  # Set the label for the x-axis
plt.show() # Display the chart
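To compare both orientations directly, you can draw them side by side. This is a sketch under a few assumptions: the seed, the figure size, and the style dictionary are illustrative, and the same color arguments as above are reused for both panels.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

rng = np.random.default_rng(seed=1)  # arbitrary seed, for a repeatable figure
my_variable = rng.normal(loc=10, scale=5, size=100)

# Shared styling, using the arguments described above
style = dict(color='red', edgecolor='black', linewidth=0.9)

fig, (ax_h, ax_v) = plt.subplots(1, 2, figsize=(10, 4))

# Numeric vector on x: horizontal swarm
sns.swarmplot(x=my_variable, ax=ax_h, **style)
ax_h.set_title('x=my_variable')
ax_h.set_xlabel('My variable')

# Numeric vector on y: vertical swarm
sns.swarmplot(y=my_variable, ax=ax_v, **style)
ax_v.set_title('y=my_variable')
ax_v.set_ylabel('My variable')

fig.tight_layout()

# Call plt.show() here when running outside a notebook
```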

Beeswarm with multiple groups

Dataset

First, we need to create data with 2 groups. To do this, we take the following steps:

  • Define the sample size per group. With 100 observations in each of the two groups, the total is 200.
  • Create the data for each group (here, we give them different means, loc=2 vs. loc=5, so that the groups are clearly distinct)
  • Create the list containing the group name for each observation

sample_size = 100  # Define the size of the random data samples.

data_group1 = np.random.normal(loc=2, scale=2, size=sample_size) # Generate data points for 'Group 1'
data_group2 = np.random.normal(loc=5, scale=2, size=sample_size) # Generate data points for 'Group 2'
data_combined = np.concatenate([data_group1, data_group2]) # Concatenate the data to create a combined dataset

category_feature = ['Group 1'] * sample_size + ['Group 2'] * sample_size # List that indicates the category for each data point
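The same data can also be stored in a long-format table, with one row per observation. The sketch below assumes pandas is available (seaborn depends on it). The column names value and group and the seed are illustrative choices.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=2)  # arbitrary seed, for repeatable data
sample_size = 100

# One row per observation: the numeric value and the group it belongs to
df = pd.DataFrame({
    'value': np.concatenate([
        rng.normal(loc=2, scale=2, size=sample_size),  # 'Group 1'
        rng.normal(loc=5, scale=2, size=sample_size),  # 'Group 2'
    ]),
    'group': ['Group 1'] * sample_size + ['Group 2'] * sample_size,
})

# Check that each group has the expected size and a distinct mean
print(df.groupby('group')['value'].agg(['count', 'mean']))
```

A table like this can be passed to swarmplot() through its data argument, with x='group' and y='value' naming the columns.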

Plot

This time, both the x and y arguments must be provided. It is also common to color the groups with a categorical color scheme, using the palette argument together with hue.

# Create swarm plots
plt.figure(figsize=(8, 6))
sns.swarmplot(x=category_feature, # Group labels
              y=data_combined, # Numeric variable
              palette='Set2', # Color set used
              hue=category_feature, # Color points by group (used with palette)
             )
plt.title('Swarm Plot with Group1 and Group2')
plt.xlabel('Category')
plt.ylabel('Data')
plt.show()
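Two small variations are often useful with grouped swarms: choosing the left-to-right order of the groups, and dropping the legend when the axis already names them. The sketch below uses the order argument of swarmplot() and removes the legend only if one was drawn, since whether a legend appears by default depends on the seaborn version. The seed and the chosen order are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

rng = np.random.default_rng(seed=3)  # arbitrary seed, for a repeatable figure
sample_size = 100
data_combined = np.concatenate([
    rng.normal(loc=2, scale=2, size=sample_size),  # 'Group 1'
    rng.normal(loc=5, scale=2, size=sample_size),  # 'Group 2'
])
category_feature = ['Group 1'] * sample_size + ['Group 2'] * sample_size

fig, ax = plt.subplots(figsize=(8, 6))
sns.swarmplot(x=category_feature,            # Group labels
              y=data_combined,               # Numeric variable
              hue=category_feature,          # Color points by group
              palette='Set2',                # Color set used
              order=['Group 2', 'Group 1'],  # Left-to-right order on the x-axis
              ax=ax)

# The x-axis already names the groups, so remove the legend if seaborn drew one
legend = ax.get_legend()
if legend is not None:
    legend.remove()

ax.set_xlabel('Category')
ax.set_ylabel('Data')

# Call plt.show() here when running outside a notebook
```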

Going further

That's it for a quick introduction to beeswarm plots with seaborn.

Please check the beeswarm section of the gallery to see many more examples with higher levels of customization.

Contact & Edit


👋 This document is a work by Yan Holtz. You can contribute on GitHub, send feedback on Twitter, or subscribe to the newsletter to know when new examples are published! 🔥