Libraries

To create this chart, we need to load the following libraries:

import pandas as pd
from plotnine import ggplot, aes, geom_histogram, theme_minimal

Dataset

Since a histogram displays the distribution of a numerical variable, we need a dataset that contains at least one numerical column.

Here we use the iris dataset, a famous dataset in data science. It contains 150 observations of iris flowers. The first four columns hold measurements of the flowers in centimeters, and the fifth column is the species of the flower observed.

We can easily load it using the following code:

path = 'https://raw.githubusercontent.com/holtzy/The-Python-Graph-Gallery/master/static/data/iris.csv'
df = pd.read_csv(path)
df.head()

   sepal_length  sepal_width  petal_length  petal_width species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa
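
Before plotting, it can help to confirm which columns are numerical, since only those make sense in a histogram. The sketch below is one way to do it with pandas. To stay self-contained it rebuilds the five rows printed by df.head() by hand instead of downloading the file; the name sample is only used for this illustration.

```python
import pandas as pd

# The five rows shown by df.head(), typed in by hand so the
# snippet runs without downloading the CSV file.
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
    "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2],
    "species": ["setosa"] * 5,
})

# Keep only the columns with a numeric dtype: these are the
# candidates for the x-axis of a histogram.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
print(numeric_cols)
# ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
```

The species column is left out because it holds text, which matches the description of the dataset above.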

Simplest histogram

plotnine builds charts in layers: you start by initializing a plot with ggplot() and then you add layers to it using the + operator.

In this case, we use geom_histogram() to create a histogram. We map the sepal_length column to the x-axis and split its values into 8 bins.

(
ggplot(df, aes(x='sepal_length')) +
    geom_histogram(bins=8)
)

Control number of bins

You can control the number of bins in a histogram by setting the bins argument inside the geom_histogram() function.

(
ggplot(df, aes(x='sepal_length')) +
    geom_histogram(bins=15)
)
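
To get a feel for what the number of bins changes, the sketch below counts the values falling in each bin with numpy, once with 8 bins and once with 15. Two assumptions to keep in mind: the data is a random stand-in for the sepal lengths, and numpy's equal-width bins are not guaranteed to use exactly the same bin edges as plotnine, so the counts are only indicative.

```python
import numpy as np

# Stand-in data (an assumption): 150 random values roughly in the
# range of iris sepal lengths.
rng = np.random.default_rng(0)
values = rng.normal(5.8, 0.8, 150)

for n_bins in (8, 15):
    # np.histogram splits the range of the data into n_bins equal-width bins
    # and returns the count per bin plus the n_bins + 1 bin edges.
    counts, edges = np.histogram(values, bins=n_bins)
    width = edges[1] - edges[0]
    print(f"{n_bins} bins, each about {width:.2f} wide, counts: {counts.tolist()}")
```

More bins means narrower bars and fewer observations per bar: the chart shows more detail but also more noise.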

Change color and edge color

You can change the color of the bars by setting the fill argument inside the geom_histogram() function. You can also change the color of the edges by setting the color argument.

(
ggplot(df, aes(x='sepal_length')) +
    geom_histogram(bins=10, fill='lightblue', color='black')
)

Change overall appearance

To improve the style of the chart, we can combine the following arguments of geom_histogram():

  • fill: to change the color of the bars
  • color: to change the color of the borders of the bars
  • alpha: to change the transparency of the bars
  • bins: to change the number of bins

We also add theme_minimal() to the plot, which replaces the default theme with a more minimal one.

(
ggplot(df, aes(x='sepal_length')) +
    geom_histogram(bins=10, fill='lightblue', color='black', alpha=0.4) +
    theme_minimal()
)

Going further

This article explains how to create a histogram with plotnine.

If you want to go further, you can also learn how to plot multiple histograms with plotnine and have a look at the histogram section of the gallery.

Contact & Edit


This document is a work by Yan Holtz.