Libraries

To create this chart, we need to load the following libraries:

import pandas as pd
from plotnine import *

Dataset

Since a scatter plot displays the values of two numerical variables for a set of observations, we will load the iris dataset.

The iris dataset is a classic dataset that contains the sepal and petal length and width of 150 iris flowers from three different species: setosa, versicolor, and virginica. In our case, we will plot the sepal length on the x-axis and the sepal width on the y-axis. You can learn more about scatter plots in the scatter plot section of the Python Graph Gallery.

# Load the iris dataset from the Python Graph Gallery repository
url = 'https://raw.githubusercontent.com/holtzy/The-Python-Graph-Gallery/master/static/data/iris.csv'
df = pd.read_csv(url)
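
Before plotting, it can help to confirm that the columns used later are present and numeric. The sketch below is only an illustration: to stay runnable offline, it uses a small hand-typed stand-in (`sample`, a hypothetical name with illustrative rows) instead of the downloaded `df`, and it assumes the column names `sepal_length`, `sepal_width`, and `species` used in the rest of this page.

```python
import pandas as pd

# Hypothetical stand-in for the downloaded iris data (illustrative rows only)
sample = pd.DataFrame({
    'sepal_length': [5.1, 7.0, 6.3],
    'sepal_width': [3.5, 3.2, 3.3],
    'species': ['setosa', 'versicolor', 'virginica'],
})

# Columns the scatter plots on this page rely on
needed = ['sepal_length', 'sepal_width', 'species']
missing = [col for col in needed if col not in sample.columns]

# Numeric columns are the candidates for the x and y axes
numeric_cols = sample.select_dtypes(include='number').columns.tolist()

print(missing)
print(numeric_cols)
```

With the real data, the same checks can be run on `df` in place of `sample`.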

Simplest scatter plot

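The main syntax differences between plotnine and ggplot2 are that multi-line code must be wrapped in parentheses, so that Python lets the expression continue across lines, and that column names are passed to aes() as quoted strings. Here is the simplest scatter plot you can make with plotnine: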

(
ggplot(df, aes(x='sepal_length', y='sepal_width')) +
    geom_point()
)

Custom colors

Passing the color argument to geom_point(), outside aes(), applies a single fixed color to every point in place of the default:

(
ggplot(df, aes(x='sepal_length', y='sepal_width')) +
    geom_point(color='blue')
)

Color by group

To color the points by group, map a column to color inside aes(), here color='species'. plotnine then assigns one color per species and adds a legend:

(
ggplot(df, aes(x='sepal_length', y='sepal_width', color='species')) +
    geom_point()
)

Going further

This article explains how to create a scatter plot with plotnine.

If you want to go further, you can also learn how to customize the markers in a scatter plot or how to customize the theme.

Contact & Edit


This document is a work by Yan Holtz.