Converting time series into a heatmap

Question

I'm looking for a good transformation in pandas that takes me from a time series of measurements to a list of counts per value bin per time period.

Suppose I have the following data:

import random
count = 100  # arbitrary sample size, chosen only for illustration
x = list(range(count))
y = [random.gauss(1, 0.1) for _ in range(count)]

I can convert both columns into binned intervals:

import pandas
df = pandas.DataFrame.from_dict({'x': x, 'y': y})
df['x'] = pandas.cut(df['x'], 20)  # assign directly; update() on df['x'] may only modify a copy
df['y'] = pandas.cut(df['y'], 20)

I know I can get the value counts for y using:

df['y'].value_counts()

But I can't work out how to express "run value_counts on y grouped by unique x values, then unroll, and return that" as a working operation.

Example:

y = [1, 1, 2, 3, 4, 4]
x = [0, 1, 2, 3, 4, 5]
bin_count = 2

Expected:

df: x    y  count
    0-2  1  2
    0-2  2  1
    3-5  3  1
    3-5  4  2

Answer 1

I believe you need SeriesGroupBy.value_counts with reset_index:

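import pandas as pd

y = [1, 1, 2, 3, 4, 4]
x = [0, 1, 2, 3, 4, 5]
bin_count = 2
df = pd.DataFrame.from_dict({'x': x, 'y': y})
# Replace each x with the interval it falls into
df['x'] = pd.cut(df['x'], bin_count)

# Count each y value within every x bin, then flatten the result into columns
df1 = df.groupby('x')['y'].value_counts().reset_index(name='count')
print(df1)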
               x  y  count
0  (-0.005, 2.5]  1      2
1  (-0.005, 2.5]  2      1
2     (2.5, 5.0]  4      2
3     (2.5, 5.0]  3      1
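The toy example bins only x, while the question bins y as well. A minimal sketch of that case follows, using random Gaussian data like the question's. It swaps in a different technique, grouping on both binned columns and calling size(), rather than value_counts. The seed, the sample size, and the names binned and counts are my own assumptions for illustration.

```python
import random

import pandas as pd

random.seed(0)  # assumed seed, only so repeated runs give the same data
count = 200     # assumed sample size; the question does not define it

df = pd.DataFrame({
    'x': list(range(count)),
    'y': [random.gauss(1, 0.1) for _ in range(count)],
})

# Bin both axes into 20 equal-width intervals, as in the question.
binned = pd.DataFrame({
    'x': pd.cut(df['x'], 20),
    'y': pd.cut(df['y'], 20),
})

# observed=True keeps only (x bin, y bin) pairs that actually occur,
# so empty cells do not appear as zero-count rows.
counts = (
    binned.groupby(['x', 'y'], observed=True)
    .size()
    .reset_index(name='count')
)
print(counts.head())
```

Every sample lands in exactly one cell, so the count column sums to the number of samples.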

To turn the y values into columns, use unstack:

df1 = df.groupby('x')['y'].value_counts().unstack(fill_value=0)
print(df1)
y              1  2  3  4
x                        
(-0.005, 2.5]  2  1  0  0
(2.5, 5.0]     0  0  1  2
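As a cross-check, the same grid of counts can probably be built in one step with pd.crosstab, which tabulates how often each pair of keys occurs together. This is an alternative route, not the method used in this answer. The names x_bins and grid are assumptions for illustration.

```python
import pandas as pd

y = [1, 1, 2, 3, 4, 4]
x = [0, 1, 2, 3, 4, 5]
bin_count = 2

df = pd.DataFrame({'x': x, 'y': y})

# Bin x without touching the original column.
x_bins = pd.cut(df['x'], bin_count)

# Rows are x bins, columns are y values, cells are occurrence counts.
grid = pd.crosstab(x_bins, df['y'])
print(grid)
```

The result is already a dense 2-D table, so grid.to_numpy() can be passed to an image-style plotting call such as matplotlib's imshow to draw the heatmap.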

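Edit:

If you need plain integer identifiers for the bins instead of interval labels, add the parameter labels=False to cut:

# Apply this to the original integer x column, in place of the earlier cut
df['x'] = pd.cut(df['x'], bin_count, labels=False)

df1 = df.groupby('x')['y'].value_counts().unstack(fill_value=0)
print(df1)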
y  1  2  3  4
x            
0  2  1  0  0
1  0  0  1  2
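With labels=False the interval boundaries no longer show up in the output. If they are still needed, for example to label heatmap axes, one option is the retbins=True parameter of cut, which also returns the array of bin edges. A small sketch, where the names codes and edges are assumptions for illustration:

```python
import pandas as pd

x = [0, 1, 2, 3, 4, 5]
bin_count = 2

# labels=False yields integer bin codes; retbins=True additionally
# returns the bin edges that cut computed.
codes, edges = pd.cut(pd.Series(x), bin_count, labels=False, retbins=True)

print(codes.tolist())  # integer bin code for each x
print(edges)           # bin_count + 1 boundaries
```

Bin code i then corresponds to the interval between edges[i] and edges[i + 1].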
