2D histograms are useful when you need to analyse the relationship between two numerical variables with a large number of data points. They help you avoid over-plotted scatterplots. The following example illustrates the importance of the bins argument, which lets you set explicitly how many bins you want for the X and the Y axis. The parameters of the hist2d() function used in the example are:
x, y: input values
bins: the number of bins in each dimension
cmap: colormap
# libraries
import matplotlib.pyplot as plt
import numpy as np
# create data
x = np.random.normal(size=50000)
y = x * 3 + np.random.normal(size=50000)
# Big bins
plt.hist2d(x, y, bins=(50, 50), cmap=plt.cm.jet)
plt.show()
# Small bins
plt.hist2d(x, y, bins=(300, 300), cmap=plt.cm.jet)
plt.show()
# If you set different bin counts for X and Y, the bins are stretched along one axis instead of looking square
plt.hist2d(x, y, bins=(300, 30), cmap=plt.cm.jet)
plt.show()
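To see what the bins argument does beyond the picture, it can help to look at what hist2d() returns. The sketch below assumes a seeded random generator (an arbitrary choice, used only for reproducibility) and simply prints the shape of the count array and the number of bin edges.

```python
import matplotlib.pyplot as plt
import numpy as np

# Seeded generator so the sketch is reproducible (the seed is an arbitrary choice)
rng = np.random.default_rng(0)
x = rng.normal(size=50000)
y = x * 3 + rng.normal(size=50000)

# hist2d returns the bin counts, the bin edges on each axis, and the drawn image
counts, xedges, yedges, image = plt.hist2d(x, y, bins=(300, 30))

print(counts.shape)              # one row per X bin, one column per Y bin
print(len(xedges), len(yedges))  # each axis has one more edge than it has bins
print(counts.sum())              # every point falls into exactly one bin
# call plt.show() to display the figure
```

The first value of bins applies to the X axis and the second to the Y axis, which is why the count array has 300 rows and 30 columns here.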
Once you have chosen the bin size, you can change the colour palette. See the matplotlib colormap reference page for the available palettes.
# Reds
plt.hist2d(x, y, bins=(50, 50), cmap=plt.cm.Reds)
plt.show()
# BuPu
plt.hist2d(x, y, bins=(50, 50), cmap=plt.cm.BuPu)
plt.show()
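If you prefer to look up palette names from Python, the following sketch lists the colormaps registered in your matplotlib installation and passes one of them to hist2d() by name instead of through plt.cm. The data setup and the seed are assumptions made to keep the snippet self-contained.

```python
import matplotlib.pyplot as plt
import numpy as np

# Same kind of data as in the examples above (the seed is an arbitrary choice)
rng = np.random.default_rng(0)
x = rng.normal(size=50000)
y = x * 3 + rng.normal(size=50000)

# Names of all colormaps registered in this matplotlib installation
names = plt.colormaps()
print(len(names))
print("Reds" in names, "BuPu" in names)

# cmap also accepts the colormap name as a string
plt.hist2d(x, y, bins=(50, 50), cmap="BuPu")
# call plt.show() to display the figure
```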
Finally, it can be useful to add a color bar on the side as a legend. You can add one with the colorbar() function.
plt.hist2d(x, y, bins=(50, 50), cmap=plt.cm.Greys)
plt.colorbar()
plt.show()
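A color bar is easier to read when it says what the colours measure. As a minimal sketch, the version below passes the image returned by hist2d() to colorbar() and adds a label; the label text and the seed are illustrative choices, not part of the original example.

```python
import matplotlib.pyplot as plt
import numpy as np

# Same kind of data as in the examples above (the seed is an arbitrary choice)
rng = np.random.default_rng(0)
x = rng.normal(size=50000)
y = x * 3 + rng.normal(size=50000)

# Keep the image returned by hist2d so the color bar can be tied to it explicitly
counts, xedges, yedges, image = plt.hist2d(x, y, bins=(50, 50), cmap=plt.cm.Greys)

# colorbar() returns a Colorbar object, which can then be labelled
cbar = plt.colorbar(image)
cbar.set_label("points per bin")  # illustrative label text
# call plt.show() to display the figure
```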