How to plot the distribution of a third variable in a 2d histogram?

Question

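Suppose you have a dataset with three dimensions, x, y and z, and you want to show the relationship between them. For example, you could use a scatter plot in x and y and add information about z with the help of a colormap: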

However, such a plot can be hard to read or even misleading, so I would like to use a 2D histogram in x and y and weight each data point by its z value:

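However, as the plot above shows, the magnitude of the bin values can now be much larger than the largest value in z. That makes sense, because a bin value is usually the sum of several z values.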

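So weighting by the z value alone is not enough; I also need to "normalize" each bin value by the number of data points in that bin. But as the right-hand panel of the plot above shows, for some reason this does not seem to work: the range of color values does not change.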
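To make the intended normalization concrete, here is a minimal sketch on made-up toy data (the four points and the bin edges are assumptions chosen for illustration, not part of the dataset in this question). It computes the per-bin sum of z, the per-bin count, and their ratio, which is the mean z per bin:

```python
import numpy as np

# Hypothetical toy data: four points, two of which fall into the same bin.
x = np.array([0.1, 0.2, 0.8, 0.9])
y = np.array([0.1, 0.2, 0.9, 0.1])
z = np.array([-10.0, -30.0, -50.0, -70.0])

edges = np.array([0.0, 0.5, 1.0])  # two bins per axis

# Sum of z per bin (what weights=z produces) and number of points per bin.
sums, _, _ = np.histogram2d(x, y, bins=[edges, edges], weights=z)
counts, _, _ = np.histogram2d(x, y, bins=[edges, edges])

# Mean z per bin; bins without any points stay NaN instead of dividing by zero.
mean_z = np.full_like(sums, np.nan)
np.divide(sums, counts, out=mean_z, where=counts > 0)
```

In this toy case the bin holding the first two points has a sum of -40 but a mean of -20, so the mean stays inside the range of z while the plain weighted sum does not have to.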

What am I doing wrong, and is there a better way to do this?

Code to reproduce (loosely based on this example):

import matplotlib.pyplot as plt
import numpy as np


# make data: correlated + noise
np.random.seed(1)
x = np.random.randn(5000)
y = 1.2 * x + np.random.randn(5000) / 3
z = np.random.uniform(-100, 0, 5000)


fig, ax = plt.subplots(figsize=(4, 3), constrained_layout=True)
data = ax.scatter(x, y, c=z, s=10)
fig.colorbar(data, ax=ax, label='z')
ax.set(xlabel='x', ylabel='y', title='scatter')
fig.show()

bins = 100
fig, axs = plt.subplots(1, 3, figsize=(10, 3), constrained_layout=True)
_, _, _, img = axs[0].hist2d(x, y, bins=bins, cmin=0.1)
fig.colorbar(img, ax=axs[0])
axs[0].set(xlabel='x', ylabel='y', title='histogram')

_, _, _, img = axs[1].hist2d(x, y, bins=bins, cmax=-0.1, weights=z)
fig.colorbar(img, ax=axs[1])
axs[1].set(xlabel='x', ylabel='y', title='weighted')

_, _, _, img = axs[2].hist2d(x, y, bins=bins, cmax=-0.1, weights=z)
data = img.get_array().reshape((bins, bins))
hist, _, _ = np.histogram2d(x, y, bins=bins)
mask = hist > 0
data[mask] = data[mask]/hist[mask]
img.set_array(data)
img.update_scalarmappable()
fig.colorbar(img, ax=axs[2])
axs[2].set(xlabel='x', ylabel='y', title='"normalized"')
fig.show()

Answer 1

With the solution implemented below I managed to get it to work, but I am still not sure why my original approach did not work.

import matplotlib.pyplot as plt
import numpy as np


# make data: correlated + noise
np.random.seed(1)
x = np.random.randn(5000)
y = 1.2 * x + np.random.randn(5000) / 3
z = np.random.uniform(-100, 0, 5000)

bins = 100
fig, axs = plt.subplots(1, 3, figsize=(10, 3), constrained_layout=True)
_, _, _, img = axs[0].hist2d(x, y, bins=bins, cmin=0.1)
fig.colorbar(img, ax=axs[0])
axs[0].set(xlabel='x', ylabel='y', title='histogram')

_, _, _, img = axs[1].hist2d(x, y, bins=bins, cmax=-0.1, weights=z)
fig.colorbar(img, ax=axs[1])
axs[1].set(xlabel='x', ylabel='y', title='weighted')

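# per-bin sum of z and per-bin number of points
sums, xbins, ybins = np.histogram2d(x, y, bins=bins, weights=z)
counts, _, _ = np.histogram2d(x, y, bins=bins)
with np.errstate(divide='ignore', invalid='ignore'):  # suppress possible divide-by-zero warnings
    # np.histogram2d indexes its result as [x, y]; pcolormesh expects rows along y, hence .T
    img = axs[2].pcolormesh(xbins, ybins, (sums / counts).T)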
fig.colorbar(img, ax=axs[2])
axs[2].set(xlabel='x', ylabel='y', title='"normalized"')
fig.show()
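On the open point of why the original approach failed, two likely causes are worth checking; this is a plausible reading, not a confirmed diagnosis. First, `hist2d` draws the transposed histogram (rows run along y), whereas `np.histogram2d` returns an array indexed as [x, y], so dividing one by the other element-wise pairs up unrelated bins. Second, the color limits are set when the mesh is first drawn, so replacing the array afterwards probably does not rescale them unless something like `img.autoscale()` is called. The sketch below sidesteps both issues by computing the mean with NumPy only; `binned_mean` is a hypothetical helper name, and the unequal bin counts are chosen only to make the axis order visible:

```python
import numpy as np


def binned_mean(x, y, z, bins):
    """Mean of z per (x, y) bin; hypothetical helper, not a library function."""
    sums, xedges, yedges = np.histogram2d(x, y, bins=bins, weights=z)
    # Reuse the same edges so that sums and counts refer to identical bins.
    counts, _, _ = np.histogram2d(x, y, bins=[xedges, yedges])
    mean = np.full_like(sums, np.nan)  # empty bins stay NaN
    np.divide(sums, counts, out=mean, where=counts > 0)
    return mean, xedges, yedges


# Same kind of data as in the question, generated with a seeded Generator.
rng = np.random.default_rng(1)
x = rng.standard_normal(5000)
y = 1.2 * x + rng.standard_normal(5000) / 3
z = rng.uniform(-100, 0, 5000)

mean, xedges, yedges = binned_mean(x, y, z, bins=(40, 30))
print(mean.shape)  # (40, 30): first axis is x, second axis is y
```

To draw the result, pass the transposed array, for example `ax.pcolormesh(xedges, yedges, mean.T)`, since `pcolormesh` expects one row per y bin. Because every value is a mean of z values, the color range stays within the range of z.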

python

matplotlib

histogram