Python plotting on/off data using Matplotlib

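I'm trying to plot data about a bunch of devices, showing whether they are online or offline. A device sends the signal 1 when it comes online and the signal 0 when it goes offline. There is no data in between.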

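For a single device I used a step plot (step=post), which works well. Now I want to show, with one line per device, when one or more devices are online.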

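Does anyone have tips/tricks on how to visualize this dataset? I tried adding extra rows before each signal to get a more continuous dataset and then plotting the OnOff values, but then I lose the categories. Do I need to convert this into a broken_barh plot? Or are there other ideas?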

Data:

import pandas as pd
import matplotlib.pyplot as plt
from io import StringIO

TESTDATA = u"""\
Index;OnOff;Device
12-10-2021 10:04:04;1;device1
12-10-2021 10:04:12;0;device3
12-10-2021 10:05:05;1;device2
12-10-2021 19:05:11;0;device2
13-10-2021 05:25:17;1;device2
13-10-2021 19:26:22;0;device2
14-10-2021 15:44:44;1;device2
14-10-2021 20:54:12;0;device2
15-10-2021 04:21:42;1;device2
15-10-2021 09:15:11;0;device2
15-10-2021 17:05:05;0;device1
15-10-2021 17:05:25;1;device3
15-10-2021 17:56:45;1;device1
15-10-2021 17:57:09;1;device2
15-10-2021 21:10:20;0;device2
16-10-2021 01:51:50;1;device2
19-10-2021 10:00:13;0;device1
19-10-2021 10:04:19;0;device2
"""

df = pd.read_csv(StringIO(TESTDATA), index_col=0, sep=';', engine='python')
df.index = pd.to_datetime(df.index, format='%d-%m-%Y %H:%M:%S')
print(df)

# plot
fig, ax = plt.subplots(figsize=[16,9])

devices = list(set(df['Device']))
devices.sort(reverse=True)

for device in devices:
    ax.plot(df.index[df['Device'] == device], df['Device'][df['Device'] == device], label=device)
plt.show()

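The problem is in the ax.plot arguments. ax.plot expects x and y, e.g. ax.plot(x, y). Your x and y are:
x - df.index[df['Device'] == device] - this is correct
y - df['Device'][df['Device'] == device] - this is incorrect, because it plots the device names instead of their OnOff values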

Change df['Device'][df['Device'] == device] to df.loc[df['Device'] == device, 'OnOff']

df.loc works by filtering rows first and then columns:

df.loc[row_filter, column_filter]
row_filter = df['Device'] == device  # give me all rows where the 'Device' column's value == the value of the device variable
column_filter = 'OnOff'  # give me just the OnOff column
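A minimal, self-contained sketch of the same row/column filtering on a hypothetical three-row frame (the values are taken from the question's data; the frame itself is only for illustration). The boolean row_filter selects device1's rows, and the column label narrows the result to the OnOff values that should go on the y axis:

```python
import pandas as pd

# Hypothetical three-row frame with the same columns as the question's data
df = pd.DataFrame(
    {"OnOff": [1, 0, 1], "Device": ["device1", "device3", "device1"]},
    index=pd.to_datetime(
        ["2021-10-12 10:04:04", "2021-10-12 10:04:12", "2021-10-15 17:56:45"]
    ),
)

device = "device1"
row_filter = df["Device"] == device  # boolean Series, True on device1's rows
column_filter = "OnOff"              # a single column label

x = df.index[row_filter]               # timestamps of device1's events
y = df.loc[row_filter, column_filter]  # OnOff values of device1's events

print(y.tolist())  # [1, 1]
```

Passing x and y to ax.plot (or ax.step) then plots numeric 0/1 states rather than device names.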

The chart you will see is probably not what you want.

You might want to replace ax.plot with ax.step to get the plot below, but the lines overlap and are hard to tell apart:

The final solution might be to draw 3 axes, one per device, sharing the x axis:

# plot
fig, axs = plt.subplots(3, 1, figsize=[16, 9], sharex=True)

devices = list(set(df['Device']))
devices.sort(reverse=True)

for device_idx, device in enumerate(devices):
    # where='post' holds each state until the next signal, which matches on/off events
    axs[device_idx].step(df.index[df['Device'] == device], df.loc[df['Device'] == device, 'OnOff'], where='post', label=device)
    axs[device_idx].legend()
plt.show()
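If you would rather keep all devices on a single axes, one alternative (not part of the answer above) is to shift each device's 0/1 step line upward by a fixed offset so the lines no longer overlap. This is a sketch on a hypothetical subset of the question's events; SPACING is an arbitrary choice, and the Agg backend is selected only so the snippet runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical subset of the question's events, one row per on/off signal
df = pd.DataFrame(
    {
        "OnOff": [1, 0, 1, 0, 0, 1],
        "Device": ["device1", "device3", "device2", "device2", "device1", "device3"],
    },
    index=pd.to_datetime(
        [
            "2021-10-12 10:04:04",
            "2021-10-12 10:04:12",
            "2021-10-12 10:05:05",
            "2021-10-12 19:05:11",
            "2021-10-15 17:05:05",
            "2021-10-15 17:05:25",
        ]
    ),
)

SPACING = 1.5  # vertical distance between neighbouring devices' baselines (assumed)

fig, ax = plt.subplots(figsize=(10, 4))
devices = sorted(df["Device"].unique())
for pos, device in enumerate(devices):
    events = df[df["Device"] == device]
    # where="post" holds each state until the next event of the same device
    ax.step(events.index, events["OnOff"] + pos * SPACING, where="post", label=device)

# one tick in the middle of each device's 0..1 band, labelled with the device name
ax.set_yticks([pos * SPACING + 0.5 for pos in range(len(devices))])
ax.set_yticklabels(devices)
ax.legend()
# in an interactive session, call plt.show() here
```

Note that a step line ends at each device's last event, so the final state is not drawn past it; the broken_barh answer below handles this by padding every device to the full time range.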

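Datetime objects are genuinely tricky to work with, because not all pandas/numpy/matplotlib functions accept every datetime variant, and some interpret them differently. However, we can convert the datetimes into matplotlib dates, which are plain numeric values, and make our lives (somewhat) easier: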

import pandas as pd 
import matplotlib.pyplot as plt
from matplotlib.dates import date2num

#test data generation
from io import StringIO
TESTDATA = u"""\
Index;OnOff;Device
12-10-2021 10:04:04;1;device1
12-10-2021 10:04:12;0;device3
12-10-2021 10:05:05;1;device2
12-10-2021 19:05:11;0;device2
13-10-2021 05:25:17;1;device2
13-10-2021 19:26:22;0;device2
14-10-2021 15:44:44;1;device2
14-10-2021 20:54:12;0;device2
15-10-2021 04:21:42;1;device2
15-10-2021 09:15:11;0;device2
15-10-2021 17:05:05;0;device1
15-10-2021 17:05:25;1;device3
15-10-2021 17:56:45;1;device1
15-10-2021 17:57:09;1;device2
15-10-2021 21:10:20;0;device2
16-10-2021 01:51:50;1;device2
17-10-2021 10:00:13;0;device1
19-10-2021 10:04:19;0;device2
"""
df = pd.read_csv(StringIO(TESTDATA), index_col=0, sep=';', engine='python')
df.index = pd.to_datetime(df.index, format='%d-%m-%Y %H:%M:%S')

#not necessary if presorted but we don't want to push our luck
df = df.sort_index()

fig, ax = plt.subplots(figsize=(10, 4))

# convert dates into matplotlib format
df["Mpl_date"] = df.index.map(date2num)
# group by device (sorted by name, so the order matches gb.groups used for the tick labels)
gb = df.groupby("Device")

for pos, (_device_name, device_df) in enumerate(gb):
    # make sure the entire datetime range is covered for each device:
    # borrow the global first/last row if this device does not already own it
    prepend_df = df.iloc[[0]] if (device_df.iloc[0] != df.iloc[0]).any() else None
    append_df = df.iloc[[-1]] if (device_df.iloc[-1] != df.iloc[-1]).any() else None
    device_df = pd.concat([prepend_df, device_df, append_df])  # None entries are dropped
    # boundary rows get the opposite of their neighbour's state (a no-op for the device's own rows, since states alternate)
    onoff_col = device_df.columns.get_loc("OnOff")
    device_df.iloc[[0, -1], onoff_col] = 1 - device_df["OnOff"].iloc[[1, -2]].to_numpy()
    # calculate time differences, as broken_barh expects a list of tuples (x_start, x_width)
    device_df["Mpl_diff"] = -device_df["Mpl_date"].diff(-1)

    # and plot every second interval, starting with the first status 1;
    # the last row has no successor, so its NaN width is dropped
    segments = list(zip(device_df["Mpl_date"], device_df["Mpl_diff"]))[1 - device_df["OnOff"].iloc[0]::2]
    ax.broken_barh([s for s in segments if pd.notna(s[1])], (pos - 0.1, 0.2))

ax.set_yticks(range(gb.ngroups))
ax.set_yticklabels(list(gb.groups.keys()))
ax.xaxis_date()
ax.set_xlabel("Date")
plt.tight_layout()
plt.show()

Example output:

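Most of the code is only needed to handle cases where the original dataframe does not explicitly state a device's start or end state. Having the states as 1 and 0 makes the coding easier, because they can be used directly as indices.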
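The "states as indices" trick can be isolated in plain Python. This sketch uses hypothetical event times (in days) and assumes, as the answer does, that a device's states strictly alternate; the on-intervals are then every second (start, width) pair, starting at position 1 - first_state:

```python
# Hypothetical event times (in days) and strictly alternating states for one device
times = [0.0, 1.0, 2.5, 4.0, 6.0]
states = [1, 0, 1, 0, 1]

# (start, width) for every interval between consecutive events,
# the tuple format that broken_barh expects
intervals = [(t0, t1 - t0) for t0, t1 in zip(times, times[1:])]

# first state 1 -> start at position 0; first state 0 -> start at position 1
on_intervals = intervals[1 - states[0]::2]
print(on_intervals)  # [(0.0, 1.0), (2.5, 1.5)]
```

The final "on" event at 6.0 has no closing event and therefore no interval, which is why the answer pads each device with the global last row.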

P.S. The first bar of device3 is visible in the original plot, but not in the downsampled image stored here.
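To see why the broken_barh answer converts timestamps with date2num first, here is a tiny sketch with two hypothetical datetimes. date2num returns a float number of days since Matplotlib's epoch (configurable via matplotlib.dates.set_epoch), so subtracting two converted dates yields a width in days that broken_barh can use directly:

```python
from datetime import datetime

from matplotlib.dates import date2num

start = datetime(2021, 10, 12, 0, 0, 0)
noon = datetime(2021, 10, 12, 12, 0, 0)

# date2num turns a datetime into a float: days since Matplotlib's epoch
start_num = date2num(start)
noon_num = date2num(noon)

# differences of converted dates are plain floats measured in days,
# suitable as the x_width part of broken_barh's (x_start, x_width) tuples
width_days = noon_num - start_num
print(width_days)  # 0.5
```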

# First, I would assume that all devices start in an unknown state, instead of assuming they are off,
# so let's add one row at the beginning
new_index_first_element = df.index[0] - pd.Timedelta(seconds=1)
new_index = [new_index_first_element] + df.index.to_list()

devices = sorted(df.Device.unique())

# let's create a new dataframe where each device has its own column and
# each row tracks the state of every device at that moment
df2 = pd.DataFrame(index=new_index, columns=devices)

for i_iloc in range(1, len(df2)):  # I need to refer to the previous row, so I use iloc instead of loc
    # first copy the previous status of all devices to the current row
    df2.iloc[i_iloc] = df2.iloc[i_iloc - 1]

    # now update the status of the device that changed at this timestamp
    # (.at needs scalar labels, so look up the single timestamp directly)
    current_ts = df2.index[i_iloc]
    device_to_update = df.at[current_ts, 'Device']
    status_to_update = df.at[current_ts, 'OnOff']
    df2.at[current_ts, device_to_update] = status_to_update

print(df2)

This is what the DataFrame looks like. It has an extra first row of NaN, because we don't know the state of the devices at that point.

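# and plot (cast to float so pandas treats the object-dtype columns as numeric)
df2 = df2.astype(float)
fig, ax = plt.subplots(figsize=[16, 9])
df2.plot(kind='bar', stacked=True, color=['red', 'skyblue', 'green'], ax=ax)
plt.show()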

I don't think a broken_barh plot would do a good job here; this stacked bar chart works better.
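The row-by-row loop that builds the per-device state table can also be expressed with pandas' pivot followed by a forward fill. This is a swapped-in technique, not the answer's own code, and it is sketched here on a hypothetical five-event subset of the question's data. It assumes each (timestamp, device) pair occurs at most once (pivot raises otherwise), and it leaves NaN before a device's first event instead of adding a separate leading row:

```python
from io import StringIO

import pandas as pd

TESTDATA = """\
Index;OnOff;Device
12-10-2021 10:04:04;1;device1
12-10-2021 10:04:12;0;device3
12-10-2021 10:05:05;1;device2
12-10-2021 19:05:11;0;device2
15-10-2021 17:05:25;1;device3
"""
df = pd.read_csv(StringIO(TESTDATA), index_col=0, sep=";")
df.index = pd.to_datetime(df.index, format="%d-%m-%Y %H:%M:%S")

# one column per device; each cell holds that device's signal at that time, NaN otherwise
wide = df.pivot(columns="Device", values="OnOff")
# carry each device's last known state forward; NaN remains until its first signal
state = wide.ffill()
print(state)
```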