How to create a matplotlib surface plot from frequencies of categorical data
I have a (sample) DataFrame that looks like this:
time event type
0 2022-01-22 10:35:00 a
1 2022-01-22 11:37:00 a
2 2022-01-22 22:22:00 b
3 2022-01-22 12:05:00 b
4 2022-01-22 10:09:00 c
5 2022-01-22 10:57:00 a
6 2022-01-22 11:36:00 c
7 2022-01-22 09:45:00 a
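I would like to create a 3D surface plot showing how many events of each type occur per hour. The axes of the plot should be:
X: hour
Y: event type
Z: number of events
On the X axis I expect to see 9, 10, 11, 12, 22, and on the Y axis a, b, c. The Z value should reflect the number of events of each type in each hour, e.g. X=10, Y=a, Z=2.
I have looked through the documentation and various examples but could not find an answer.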
This takes two steps. First, aggregate the data to count each unique hour/event type pair; then create the 3D plot from the aggregated hour/event type/event count data:
from matplotlib import pyplot as plt
from matplotlib.ticker import MaxNLocator
import pandas as pd
import numpy as np
#test data
np.random.seed(123)
n = 10
start = pd.to_datetime("2021-04-21")
end = pd.to_datetime("2021-04-23")
n_minut = ((end - start).days + 1) * 24 * 60
date_range = pd.to_timedelta(np.random.randint(0, n_minut, n), unit="minute") + start
df = pd.DataFrame({"time": date_range, "event type": np.random.choice(list("abcde"), n)})
#count event types per hour
plot_df = df.groupby([df["time"].dt.hour, df["event type"]]).size().reset_index(name="event count")
#transcribe categorical data in column "event type" into integer values
#idx contains the list of event types according to their integer numbers
val, idx = plot_df["event type"].factorize()
plot_df["event_num"] = val
#generate evenly spaced x- and y-values
x_range = np.arange(24)
y_range = np.arange(idx.size)
#and create x-y arrays for the 3D plot
X, Y = np.meshgrid(x_range, y_range)
#and fill z-values with zeros
Z = np.zeros(X.shape)
#or the event count, if exists
Z[plot_df["event_num"], plot_df["time"]] = plot_df["event count"]
#create figure with a 3D projection axis
fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.plot_surface(X, Y, Z)
ax.zaxis.set_major_locator(MaxNLocator(integer=True))
ax.set_yticks(y_range, idx)
ax.set_ylabel("event type")
ax.set_xlabel("time (in h)")
ax.set_zlabel("count")
plt.show()
Sample output: [figure: 3D surface plot of event count by hour and event type]
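As a quick check of the aggregation step, the counts can also be laid out as a type-by-hour table. The following is only a sketch on the eight sample rows from the question; it uses `pd.crosstab` instead of the `groupby`/`factorize` route above, and the variable name `counts` is an assumption made for illustration.

```python
import pandas as pd

# The eight sample rows from the question
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2022-01-22 10:35:00", "2022-01-22 11:37:00",
        "2022-01-22 22:22:00", "2022-01-22 12:05:00",
        "2022-01-22 10:09:00", "2022-01-22 10:57:00",
        "2022-01-22 11:36:00", "2022-01-22 09:45:00",
    ]),
    "event type": list("aabbcaca"),
})

# Rows: event type, columns: hour of day, values: number of events
counts = pd.crosstab(df["event type"], df["time"].dt.hour)

# Add the hours without any events so the table covers 0..23
counts = counts.reindex(columns=range(24), fill_value=0)

print(counts.loc["a", 10])  # events of type "a" between 10:00 and 10:59
```

`counts.to_numpy()` has one row per event type and one column per hour, which is the same layout as `Z` above, so it could be passed to `plot_surface` in its place (with the row order given by `counts.index`).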
However, matplotlib is well known for sometimes failing to draw surfaces in the correct visibility order. Depending on your data, a scatter plot may work better:
...
X, Y = np.meshgrid(x_range, y_range)
Z = np.full(X.shape, np.nan)
Z[plot_df["event_num"], plot_df["time"]] = plot_df["event count"]
fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.scatter(X, Y, Z)
ax.zaxis.set_major_locator(MaxNLocator(integer=True))
...
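Since the snippet above elides the surrounding code, here is a self-contained sketch of the scatter variant applied to the sample rows from the question. The helper names `build_count_grid` and `scatter_counts` are hypothetical and only serve to structure the example; the logic follows the steps described above.

```python
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
from matplotlib.ticker import MaxNLocator

# The eight sample rows from the question
df = pd.DataFrame({
    "time": pd.to_datetime([
        "2022-01-22 10:35:00", "2022-01-22 11:37:00",
        "2022-01-22 22:22:00", "2022-01-22 12:05:00",
        "2022-01-22 10:09:00", "2022-01-22 10:57:00",
        "2022-01-22 11:36:00", "2022-01-22 09:45:00",
    ]),
    "event type": list("aabbcaca"),
})

def build_count_grid(df):
    """Return X, Y, Z and the event type labels; Z is NaN where nothing was counted."""
    plot_df = (
        df.groupby([df["time"].dt.hour, df["event type"]])
        .size()
        .reset_index(name="event count")
    )
    # Map each event type to an integer row index
    codes, labels = plot_df["event type"].factorize()
    X, Y = np.meshgrid(np.arange(24), np.arange(labels.size))
    Z = np.full(X.shape, np.nan)
    Z[codes, plot_df["time"].to_numpy()] = plot_df["event count"].to_numpy()
    return X, Y, Z, labels

def scatter_counts(X, Y, Z, labels):
    """Draw one marker per hour/event type pair that has a count."""
    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    ax.scatter(X, Y, Z)  # NaN entries are not drawn
    ax.zaxis.set_major_locator(MaxNLocator(integer=True))
    ax.set_yticks(np.arange(labels.size))
    ax.set_yticklabels(labels)
    ax.set_xlabel("time (in h)")
    ax.set_ylabel("event type")
    ax.set_zlabel("count")
    return fig, ax

X, Y, Z, labels = build_count_grid(df)
fig, ax = scatter_counts(X, Y, Z, labels)
```

Call `plt.show()` (or `fig.savefig(...)`) afterwards to view the result. Note that `factorize` numbers the event types in order of first appearance in the aggregated data, so the row order of `Z` follows `labels`, not necessarily alphabetical order.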