How to put labels from a dataframe on a density plot in matplotlib

Question

#dataframe
a=
timestamp      count
2021-08-16     20
2021-08-17     60
2021-08-18     35
2021-08-19      1
2021-08-20      0
2021-08-21      1
2021-08-22     50
2021-08-23     36
2021-08-24     68
2021-08-25    125
2021-08-26     54


I applied this code
a.plot(kind="density")

This is not what I want.

I want a density plot with count on the Y axis and timestamp on the X axis,

just as I can do with plt.bar(a['timestamp'],a['count']).

Or is this not possible with a density plot?

Answer 1

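The following code creates a density histogram. The total area sums to 1, assuming each timestamp counts as 1 unit of width. To get the timestamps on the x-axis, they are set as the index. To make the total area sum to 1, all count values are divided by their sum.

A kde is calculated from the same data, using the row positions 0, 1, 2, ... as sample points and the counts as weights.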

from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
from scipy.stats import gaussian_kde
from io import StringIO

a_str = '''timestamp      count
2021-08-16     20
2021-08-17     60
2021-08-18     35
2021-08-19      1
2021-08-20      0
2021-08-21      1
2021-08-22     50
2021-08-23     36
2021-08-24     68
2021-08-25    125
2021-08-26     54'''
# sep=r'\s+' splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
a = pd.read_csv(StringIO(a_str), sep=r'\s+')

ax = (a.set_index('timestamp') / a['count'].sum()).plot.bar(width=0.9, rot=0, figsize=(12, 5))

kde = gaussian_kde(np.arange(len(a)), bw_method=0.2, weights=a['count'])

xs = np.linspace(-1, len(a), 200)
ax.plot(xs, kde(xs), lw=2, color='crimson', label='kde')
ax.set_xlim(xs[0], xs[-1])
ax.legend(labels=['kde', 'density histogram'])
ax.set_xlabel('')
ax.set_ylabel('density')
plt.tight_layout()
plt.show()
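As a quick sanity check of the "area sums to 1" claim, here is a minimal sketch that uses only numpy and scipy. The variable names (`counts`, `positions`, `hist_area`, `kde_area`, `window_area`) are made up for this illustration; the counts are copied from the question.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Counts copied from the question, one value per day.
counts = np.array([20, 60, 35, 1, 0, 1, 50, 36, 68, 125, 54])
positions = np.arange(len(counts))

# Density histogram: each bar height is the count divided by the total.
bar_heights = counts / counts.sum()
# Each bar is treated as one unit wide, so the area equals the sum of the heights.
hist_area = bar_heights.sum() * 1.0

# Same weighted kde as in the answer.
kde = gaussian_kde(positions, bw_method=0.2, weights=counts)
# integrate_box_1d integrates the estimated density between two bounds.
kde_area = kde.integrate_box_1d(-np.inf, np.inf)
# Area that falls inside the plotted window from -1 to len(counts).
window_area = kde.integrate_box_1d(-1, len(counts))

print(f"histogram area: {hist_area:.6f}")
print(f"kde area over the whole axis: {kde_area:.6f}")
print(f"kde area inside the plotted window: {window_area:.6f}")
```

The window area can come out slightly below 1, because the Gaussian tails extend past the plotted x-limits.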

If you only want to plot the kde curve, you can leave out the histogram. Optionally, you can fill the area under the curve.

fig, ax = plt.subplots(figsize=(12, 5))

kde = gaussian_kde(np.arange(len(a)), bw_method=0.2, weights=a['count'])

xs = np.linspace(-1, len(a), 200)
# plot the kde curve
ax.plot(xs, kde(xs), lw=2, color='crimson', label='kernel density estimation')
# optionally fill the area below the curve
ax.fill_between(xs, kde(xs), color='crimson', alpha=0.2)
ax.set_xticks(np.arange(len(a)))
ax.set_xticklabels(a['timestamp'])
ax.set_xlim(xs[0], xs[-1])
ax.set_ylim(ymin=0)
ax.legend()
ax.set_xlabel('')
ax.set_ylabel('density')
plt.tight_layout()
plt.show()
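Since the question asks for count on the Y axis, one possible variation is to multiply the density by the total count. Under the same assumption that each timestamp is one unit wide, the curve can then be read as a smoothed count per day and drawn on top of the raw bars. This is only a sketch: `smoothed_counts` and the helper `plot_smoothed_counts` are names invented here, not part of any library.

```python
import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import gaussian_kde

# Data copied from the question.
timestamps = ['2021-08-16', '2021-08-17', '2021-08-18', '2021-08-19',
              '2021-08-20', '2021-08-21', '2021-08-22', '2021-08-23',
              '2021-08-24', '2021-08-25', '2021-08-26']
counts = np.array([20, 60, 35, 1, 0, 1, 50, 36, 68, 125, 54])
positions = np.arange(len(counts))

kde = gaussian_kde(positions, bw_method=0.2, weights=counts)
xs = np.linspace(-1, len(counts), 200)
# The density integrates to 1; scaling by the total makes it integrate
# to the total count, so the y values are in "count per day" units.
smoothed_counts = kde(xs) * counts.sum()


def plot_smoothed_counts():
    """Hypothetical helper: raw counts as bars plus the scaled kde curve."""
    fig, ax = plt.subplots(figsize=(12, 5))
    ax.bar(positions, counts, width=0.9, color='lightgrey', label='count')
    ax.plot(xs, smoothed_counts, lw=2, color='crimson', label='kde scaled to counts')
    ax.set_xticks(positions)
    ax.set_xticklabels(timestamps)
    ax.set_xlim(xs[0], xs[-1])
    ax.set_ylim(bottom=0)
    ax.set_ylabel('count per day')
    ax.legend()
    fig.tight_layout()
    return fig, ax
```

Calling plot_smoothed_counts() followed by plt.show() displays the figure. The curve stays below the tallest bars because the kde spreads each count over neighboring days.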

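To plot multiple similar curves, for example using more count columns, you can use a loop. A set of colors that go well together can be obtained from the Set2 colormap: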

from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
from scipy.stats import gaussian_kde

a = pd.DataFrame({'timestamp': ['2021-08-16', '2021-08-17', '2021-08-18', '2021-08-19', '2021-08-20', '2021-08-21',
                                '2021-08-22', '2021-08-23', '2021-08-24', '2021-08-25', '2021-08-26']})
for i in range(1, 5):
    a[f'count{i}'] = (np.random.uniform(0, 12, len(a)) ** 2).astype(int)

xs = np.linspace(-1, len(a), 200)
fig, ax = plt.subplots(figsize=(12, 4))
for column, color in zip(a.columns[1:], plt.cm.Set2.colors):
    kde = gaussian_kde(np.arange(len(a)), bw_method=0.2, weights=a[column])
    ax.plot(xs, kde(xs), lw=2, color=color, label=f"kde of '{column}'")
    ax.fill_between(xs, kde(xs), color=color, alpha=0.2)
ax.set_xticks(np.arange(len(a)))
ax.set_xticklabels(a['timestamp'])
ax.set_xlim(xs[0], xs[-1])
ax.set_ylim(ymin=0)
ax.legend()
ax.set_xlabel('Date')
ax.set_ylabel('Density of Counts')
plt.tight_layout()
plt.show()
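The examples above place the curves at integer positions and relabel the ticks. If a real date axis is preferred, one option (a sketch, assuming the timestamps are evenly spaced days and parse as datetimes) is to feed matplotlib's date numbers to the kde instead. `date_nums`, `density` and `plot_kde_on_date_axis` are names chosen for this illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.dates as mdates
from matplotlib import pyplot as plt
from scipy.stats import gaussian_kde

a = pd.DataFrame({
    'timestamp': pd.date_range('2021-08-16', periods=11, freq='D'),
    'count': [20, 60, 35, 1, 0, 1, 50, 36, 68, 125, 54],
})

# Matplotlib represents dates as floating point day numbers.
date_nums = mdates.date2num(a['timestamp'].to_numpy())

# Same weighted kde as before, but the sample points are now date numbers.
kde = gaussian_kde(date_nums, bw_method=0.2, weights=a['count'].to_numpy())
# Pad one day on each side, like np.linspace(-1, len(a), 200) does above.
xs = np.linspace(date_nums[0] - 1, date_nums[-1] + 1, 200)
density = kde(xs)


def plot_kde_on_date_axis():
    """Hypothetical helper: draw the kde against a real date axis."""
    fig, ax = plt.subplots(figsize=(12, 5))
    ax.plot(xs, density, lw=2, color='crimson', label='kde')
    ax.fill_between(xs, density, color='crimson', alpha=0.2)
    # Tell matplotlib to interpret the x values as dates.
    ax.xaxis_date()
    ax.xaxis.set_major_locator(mdates.DayLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
    ax.set_xlim(xs[0], xs[-1])
    ax.set_ylim(bottom=0)
    ax.set_ylabel('density')
    ax.legend()
    fig.tight_layout()
    return fig, ax
```

Because consecutive days differ by exactly 1 in date-number units, the curve has the same shape as the integer-position version. With irregularly spaced timestamps the two approaches would no longer agree, and the date-number version would reflect the true spacing.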
