Plotting two pandas time-series on the same axes with matplotlib - unexpected behavior

Question

I'm working with two time series (df1 and df2), and I get unexpected behavior when I try to plot them against the same x-axis with different y-axes.

Here are the code and data:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# df1: 31 business days (2021-09-06 is not included)
dates1 = ['2021-08-26', '2021-08-27', '2021-08-30', '2021-08-31',
          '2021-09-01', '2021-09-02', '2021-09-03', '2021-09-07',
          '2021-09-08', '2021-09-09', '2021-09-10', '2021-09-13',
          '2021-09-14', '2021-09-15', '2021-09-16', '2021-09-17',
          '2021-09-20', '2021-09-21', '2021-09-22', '2021-09-23',
          '2021-09-24', '2021-09-27', '2021-09-28', '2021-09-29',
          '2021-09-30', '2021-10-01', '2021-10-04', '2021-10-05',
          '2021-10-06', '2021-10-07', '2021-10-08']

# df2: 5 weekly dates, all Sundays
dates2 = ['2021-08-29', '2021-09-05', '2021-09-12', '2021-09-19',
          '2021-09-26']

# Random-walk values for each series
y1 = np.random.randn(len(dates1)).cumsum()
y2 = np.random.randn(len(dates2)).cumsum()

df1 = pd.DataFrame({'date': pd.to_datetime(dates1), 'y1': y1})
df1.set_index('date', inplace=True)

df2 = pd.DataFrame({'date': pd.to_datetime(dates2), 'y2': y2})
df2.set_index('date', inplace=True)

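When I plot the two datasets together, I either don't see the plot (first subplot) or the y data appears to be resampled in a way I don't understand (second subplot). If I plot each dataset on its own, there is no problem (third and fourth subplots).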

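fig, axs = plt.subplots(1, 4, figsize=[12, 4])

# Subplot 1: df1 first, then df2 on a secondary y-axis
df1.plot(ax=axs[0])
df2.plot(ax=axs[0], secondary_y=True)

# Subplot 2: df2 first, then df1 on a secondary y-axis
df2.plot(ax=axs[1])
df1.plot(ax=axs[1], secondary_y=True)

# Subplots 3 and 4: each series on its own
df1.y1.plot(ax=axs[2])
df2.y2.plot(ax=axs[3])

plt.tight_layout()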
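A quick check that may help here: the two indexes differ in regularity. The sketch below rebuilds only the indexes (the random values don't affect them) and asks pandas for an inferred frequency via `DatetimeIndex.inferred_freq`. Neither index has an explicit `freq`, but the weekly Sunday dates of df2 should infer as `'W-SUN'`, while the business days of df1, which skip 2021-09-06, should infer as no frequency at all. Whether this difference is what drives the plotting behavior is an assumption, not something stated in the question.

```python
import pandas as pd

# Same dates as in the question; only the index matters for this check
dates1 = ['2021-08-26', '2021-08-27', '2021-08-30', '2021-08-31',
          '2021-09-01', '2021-09-02', '2021-09-03', '2021-09-07',
          '2021-09-08', '2021-09-09', '2021-09-10', '2021-09-13',
          '2021-09-14', '2021-09-15', '2021-09-16', '2021-09-17',
          '2021-09-20', '2021-09-21', '2021-09-22', '2021-09-23',
          '2021-09-24', '2021-09-27', '2021-09-28', '2021-09-29',
          '2021-09-30', '2021-10-01', '2021-10-04', '2021-10-05',
          '2021-10-06', '2021-10-07', '2021-10-08']
dates2 = ['2021-08-29', '2021-09-05', '2021-09-12', '2021-09-19',
          '2021-09-26']

idx1 = pd.DatetimeIndex(dates1, name='date')
idx2 = pd.DatetimeIndex(dates2, name='date')

# Neither index carries an explicit frequency ...
print(idx1.freq, idx2.freq)   # None None
# ... but pandas can infer one for the evenly spaced weekly dates only
print(idx1.inferred_freq)     # None (the 4-day step over 2021-09-06 breaks regularity)
print(idx2.inferred_freq)     # W-SUN
```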

Answer 1

This is pandas bug #43972.
The problem is how pandas handles the xticks for different datetime spans.
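- dates2 currently spans less than a month. As the plots made with pandas.DataFrame.plot show, the tick formatting is different when the span is less than a month. If dates2 spans at least a month, the problem does not occur (e.g. dates2 = ['2021-08-29', '2021-09-05', '2021-09-12', '2021-09-19', '2021-09-26', '2021-09-29']).
- Using secondary_y=True affects how pandas manages the ticks: if secondary_y=True is removed, axs[0] plots correctly.
- I don't know why df1 works when df2 is plotted first, as in axs[1], but df2 does not work when df1 is plotted first.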
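One possible reading of the first bullet (an interpretation, not something the answer states) is that the relevant difference may be whether pandas can infer a regular frequency from the index, rather than the one-month span as such. The longer dates2 suggested above ends with a 3-day step (2021-09-26 to 2021-09-29), so it is no longer evenly weekly. The sketch compares both lists with `pd.infer_freq`; `dates2_long` is a name introduced here for illustration.

```python
import pandas as pd

dates2 = ['2021-08-29', '2021-09-05', '2021-09-12', '2021-09-19',
          '2021-09-26']
# Hypothetical name for the longer list suggested in the answer
dates2_long = dates2 + ['2021-09-29']

print(pd.infer_freq(pd.to_datetime(dates2)))       # W-SUN
print(pd.infer_freq(pd.to_datetime(dates2_long)))  # None
```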

fig, axs = plt.subplots(1, 4, figsize=[15, 6], sharey=False, sharex=False)
axs = axs.flatten()

df1.plot(ax=axs[0])
print(f'axs[0]: {axs[0].get_xticks()}')
ax4 = axs[0].twiny()  # twin Axes with its own x-axis (the y-axis is shared)
df2.plot(ax=ax4, color='tab:orange')
print(f'ax4: {ax4.get_xticks()}')

df2.plot(ax=axs[1], color='tab:orange')
print(f'axs[1]: {axs[1].get_xticks()}')
df1.plot(ax=axs[1], secondary_y=True)
print(f'axs[1]: {axs[1].get_xticks()}')

df1.y1.plot(ax=axs[2])
print(f'axs[2]: {axs[2].get_xticks()}')

df2.y2.plot(ax=axs[3])
print(f'axs[3]: {axs[3].get_xticks()}')

plt.tight_layout()

[output]:
axs[0]: [18871. 18878. 18885. 18892. 18901. 18908.]
ax4: [2696 2697 2700]
axs[1]: [2696 2697 2700]  # after plotting df2
axs[1]: [2696 2697 2701 2702]  # after plotting df1
axs[2]: [18871. 18878. 18885. 18892. 18901. 18908.]
axs[3]: [2696 2697 2700]

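Note the differences in the printed xticks, which are the positions of the ticks on the x-axis: axes on which only df1 has been plotted report values around 18871, while every axes on which df2 has been plotted (ax4, axs[1] and axs[3]) reports values around 2700.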
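The two ranges of numbers look like two different x-units. Assuming (this is not stated in the answer) that the values around 18871 are matplotlib date numbers, i.e. days since matplotlib's default 1970-01-01 epoch (matplotlib 3.3 and later), and that the values around 2700 are ordinals of weekly pandas `Period`s, the sketch below converts one date into each unit with `matplotlib.dates.date2num` and `pandas.Period.ordinal`.

```python
import datetime

import matplotlib.dates as mdates
import pandas as pd

day = datetime.datetime(2021, 9, 1)

# Unit 1: matplotlib date number (float days since matplotlib's epoch)
datenum = mdates.date2num(day)
print(datenum)  # 18871.0 with the default 1970-01-01 epoch (matplotlib >= 3.3)

# Unit 2: integer ordinal of the weekly (W-SUN) period containing that day
week = pd.Period(day, freq='W-SUN')
print(week, week.ordinal)

# The ordinal maps back to the same weekly period
print(pd.Period(ordinal=week.ordinal, freq='W-SUN') == week)  # True
```

If both assumptions hold, a line drawn in one unit and a line drawn in the other on a shared x-axis would sit thousands of units apart, which would be consistent with one series seeming to disappear.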

Plotting with matplotlib.pyplot.plot treats the datetime index of both dataframes the same way.

fig, axs = plt.subplots(2, 2, figsize=[20, 12], sharey=False, sharex=False)
axs = axs.flatten()

axs[0].plot(df1.index, df1.y1, marker='.', color='tab:blue')
print(f'axs[0]: {axs[0].get_xticks()}')
ax4 = axs[0].twinx()
ax4.plot(df2.index, df2.y2, marker='.', color='tab:orange')
print(f'ax4: {ax4.get_xticks()}')

axs[1].plot(df2.index, df2.y2, marker='.', color='tab:orange')
print(f'axs[1]: {axs[1].get_xticks()}')
ax5 = axs[1].twinx()
ax5.plot(df1.index, df1.y1, marker='.', color='tab:blue')
print(f'ax5: {ax5.get_xticks()}')

axs[2].plot(df1.index, df1.y1, marker='.', color='tab:blue')
print(f'axs[2]: {axs[2].get_xticks()}')
axs[3].plot(df2.index, df2.y2, marker='.', color='tab:orange')
print(f'axs[3]: {axs[3].get_xticks()}')

[output]:
axs[0]: [18871. 18878. 18885. 18892. 18901. 18908.]
ax4: [18871. 18878. 18885. 18892. 18901. 18908.]
axs[1]: [18868. 18871. 18875. 18879. 18883. 18887. 18891. 18895.]
ax5: [18871. 18878. 18885. 18892. 18901. 18908.]
axs[2]: [18871. 18878. 18885. 18892. 18901. 18908.]
axs[3]: [18868. 18871. 18875. 18879. 18883. 18887. 18891. 18895.]
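If the mismatch in the printed tick values is what breaks the combined plots (an assumption; the answer only reports the symptoms), one possible workaround that keeps DataFrame.plot and secondary_y is `x_compat=True`. The pandas visualization guide documents it as a way to suppress pandas' own time-series tick adjustment, so both frames should then be drawn in matplotlib date numbers, like the matplotlib-only plots above. The random data, seed and figure size below are placeholders.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch also runs headless
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

dates1 = ['2021-08-26', '2021-08-27', '2021-08-30', '2021-08-31',
          '2021-09-01', '2021-09-02', '2021-09-03', '2021-09-07',
          '2021-09-08', '2021-09-09', '2021-09-10', '2021-09-13',
          '2021-09-14', '2021-09-15', '2021-09-16', '2021-09-17',
          '2021-09-20', '2021-09-21', '2021-09-22', '2021-09-23',
          '2021-09-24', '2021-09-27', '2021-09-28', '2021-09-29',
          '2021-09-30', '2021-10-01', '2021-10-04', '2021-10-05',
          '2021-10-06', '2021-10-07', '2021-10-08']
dates2 = ['2021-08-29', '2021-09-05', '2021-09-12', '2021-09-19',
          '2021-09-26']

rng = np.random.default_rng(0)  # placeholder random-walk data, as in the question
df1 = pd.DataFrame({'y1': rng.standard_normal(len(dates1)).cumsum()},
                   index=pd.DatetimeIndex(dates1, name='date'))
df2 = pd.DataFrame({'y2': rng.standard_normal(len(dates2)).cumsum()},
                   index=pd.DatetimeIndex(dates2, name='date'))

fig, ax = plt.subplots(figsize=[6, 4])
# x_compat=True turns off pandas' time-series x-axis handling for each call,
# so both frames are plotted as plain dates on the shared x-axis
df1.plot(ax=ax, x_compat=True)
df2.plot(ax=ax, secondary_y=True, x_compat=True)

left, right = ax.get_xlim()
# The limits are matplotlib date numbers and can be converted back to dates
print(mdates.num2date(left), mdates.num2date(right))
```

The trade-off is that the tick labels then come from matplotlib's default date locator and formatter rather than from pandas' time-series formatting.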
