A scatter plot displays the relationship between 2 numeric variables. Each data point is represented as a circle. Several tools can build one in Python; this section provides code samples for Seaborn, Matplotlib and Plotly (for interactive versions).
⏱ Quick start
The regplot() function of the Seaborn library is one of the quickest ways to build a scatterplot in minutes. 🔥
Simply pass a numeric column of a data frame to both the x and y arguments and the function will handle the rest.
# library & dataset
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset('iris')

# use the function regplot to make a scatterplot
sns.regplot(x=df["sepal_length"], y=df["sepal_width"])
plt.show()
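By default, regplot() also fits a linear regression and draws the line plus a confidence band on top of the points. If you only want the points, the minimal sketch below shows two options: passing fit_reg=False to regplot(), or calling the dedicated sns.scatterplot() function. The random data frame (columns "x" and "y") is a hypothetical stand-in for the iris dataset so the snippet runs without a download.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-in data: 100 points with a linear trend plus noise
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=100)})
df["y"] = 2 * df["x"] + rng.normal(size=100)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# regplot without the regression fit: only the points are drawn
sns.regplot(x=df["x"], y=df["y"], fit_reg=False, ax=ax1)
ax1.set_title("regplot(fit_reg=False)")

# scatterplot: a dedicated function that never fits a model
sns.scatterplot(data=df, x="x", y="y", ax=ax2)
ax2.set_title("scatterplot()")

plt.show()
```

Both panels show the same 100 points; scatterplot() is usually the more natural choice when you also want to map extra columns to color or size.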
⚠️ Scatterplot and overplotting
The main danger with scatterplots is overplotting. When the sample size gets big, circles tend to overlap, making the figure unreadable.
Several workarounds exist to fix the issue, like lowering the marker opacity or switching to another chart type.
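As a sketch of these workarounds, the Matplotlib code below draws the same hypothetical sample of 50,000 random points three times: with default markers, with small semi-transparent markers (the s and alpha arguments of Axes.scatter), and as a hexbin plot, a different chart type that counts how many points fall into each hexagonal cell. The data, gridsize and colormap are illustrative choices, not recommended values.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical large sample: 50,000 correlated points, enough to overplot badly
rng = np.random.default_rng(1)
x = rng.normal(size=50_000)
y = x + rng.normal(size=50_000)

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(15, 4))

# Default markers: the dense centre turns into a solid blob
ax1.scatter(x, y)
ax1.set_title("Overplotted")

# Workaround 1: small, mostly transparent markers reveal where points pile up
ax2.scatter(x, y, s=2, alpha=0.05)
ax2.set_title("Small markers + alpha")

# Workaround 2: switch chart type and count points per hexagonal bin
hb = ax3.hexbin(x, y, gridsize=40, cmap="viridis")
fig.colorbar(hb, ax=ax3, label="points per bin")
ax3.set_title("Hexbin")

plt.show()
```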
Seaborn is a Python library that makes it easy to build good-looking charts. The
regplot function should get you started in minutes. The examples below showcase the various possibilities this function offers.
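A few of those possibilities can be sketched in one call. The snippet below uses regplot()'s marker and color arguments, forwards extra styling to the underlying Matplotlib calls through scatter_kws and line_kws, and sets ci=None to skip the confidence band. The data frame and its columns "x" and "y" are hypothetical, and the specific values are only illustrative.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical stand-in data with a linear trend
rng = np.random.default_rng(2)
df = pd.DataFrame({"x": rng.uniform(0, 10, size=80)})
df["y"] = 0.5 * df["x"] + rng.normal(scale=1.0, size=80)

ax = sns.regplot(
    data=df, x="x", y="y",
    marker="+",                    # marker shape for each point
    color="darkred",               # shared color for the points and the fit
    scatter_kws={"s": 60},         # forwarded to the underlying scatter call
    line_kws={"linewidth": 3},     # forwarded to the regression line
    ci=None,                       # do not draw the confidence band
)
plt.show()
```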
💡 Other charts involving scatterplots
If you are interested in scatterplots, some other charts could be useful to you.
A scatterplot with marginal distributions lets you check the distribution of both the x and y variables. A correlogram lets you check the relationship between each pair of numeric variables in a dataset.
Matplotlib is another great option for building scatterplots with Python. As often, it takes a few more lines of code to get a decent chart, but it allows more customization.
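Here is a short sketch of that trade-off using Matplotlib's Axes.scatter(): every visual property of the markers gets its own argument, and labels and the colorbar are added explicitly. The data, including the third variable z mapped to color, is hypothetical and only there to show the customization options.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: 60 points plus a third variable z mapped to color
rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=60)
y = x + rng.normal(size=60)
z = rng.uniform(size=60)

fig, ax = plt.subplots(figsize=(6, 4))

# Each keyword customizes one visual property of the markers
points = ax.scatter(
    x, y,
    c=z, cmap="plasma",     # color encodes the third variable
    s=80,                   # marker area in points^2
    marker="o",
    edgecolors="black",
    alpha=0.8,
)
fig.colorbar(points, ax=ax, label="z")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Scatterplot with matplotlib")

plt.show()
```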