How to count occurrences of each value in a DataFrame column within a rolling window
I am trying to count how many times each ID occurs in every sliding window of this data:
                      ID
DATE
2017-05-17 15:49:51  s_2
2017-05-17 15:49:52  s_5
2017-05-17 15:49:55  s_2
2017-05-17 15:49:56  s_3
2017-05-17 15:49:58  s_5
2017-05-17 15:49:59  s_5
The windows overlap and have size 3: each one covers the current row and the two rows after it. For every window I want the count of each ID, so the answer should look like this:
DATE                 ID   s_2_count  s_3_count  s_5_count
2017-05-17 15:49:51  s_2  2          0          1
2017-05-17 15:49:52  s_5  1          1          1
2017-05-17 15:49:55  s_2  1          1          1
2017-05-17 15:49:56  s_3  0          1          2
2017-05-17 15:49:58  s_5  NaN        NaN        NaN
2017-05-17 15:49:59  s_5  NaN        NaN        NaN
Use str.get_dummies, rolling, sum, shift, and add_suffix:
df.ID.str.get_dummies().rolling(3).sum().shift(-2).add_suffix('_count')
Output:
                     s_2_count  s_3_count  s_5_count
DATE
2017-05-17 15:49:51        2.0        0.0        1.0
2017-05-17 15:49:52        1.0        1.0        1.0
2017-05-17 15:49:55        1.0        1.0        1.0
2017-05-17 15:49:56        0.0        1.0        2.0
2017-05-17 15:49:58        NaN        NaN        NaN
2017-05-17 15:49:59        NaN        NaN        NaN
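To reproduce the output above from scratch, here is a minimal self-contained sketch. The construction of `df` and the intermediate names `dummies`, `counts`, and `result` are assumptions added for illustration. The chain itself is the one from the answer, split into steps so each stage can be inspected.

```python
import pandas as pd

# Sample data from the question; building it this way is an assumption.
df = pd.DataFrame(
    {"ID": ["s_2", "s_5", "s_2", "s_3", "s_5", "s_5"]},
    index=pd.DatetimeIndex(
        [
            "2017-05-17 15:49:51",
            "2017-05-17 15:49:52",
            "2017-05-17 15:49:55",
            "2017-05-17 15:49:56",
            "2017-05-17 15:49:58",
            "2017-05-17 15:49:59",
        ],
        name="DATE",
    ),
)

# Step 1: one indicator column per distinct ID (1 where the row has that ID).
dummies = df["ID"].str.get_dummies()

# Step 2: rolling(3).sum() counts each ID over the current row and the two
# rows before it; the first two rows are NaN because the window is incomplete.
trailing = dummies.rolling(3).sum()

# Step 3: shift(-2) moves every result up two rows, so each row now holds the
# counts for itself and the two rows after it; the last two rows become NaN.
counts = trailing.shift(-2).add_suffix("_count")

# Step 4: attach the counts to the original frame (aligned on the index).
result = df.join(counts)
print(result)
```

Note that `rolling(3)` here counts rows, not seconds, so the gaps between timestamps do not affect the result.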
Let's assign it back to the DataFrame:
df.assign(**df.ID.str.get_dummies().rolling(3).sum().shift(-2).add_suffix('_count'))
Or use join:
df.join(df.ID.str.get_dummies().rolling(3).sum().shift(-2).add_suffix('_count'))
Output:
                      ID  s_2_count  s_3_count  s_5_count
DATE
2017-05-17 15:49:51  s_2        2.0        0.0        1.0
2017-05-17 15:49:52  s_5        1.0        1.0        1.0
2017-05-17 15:49:55  s_2        1.0        1.0        1.0
2017-05-17 15:49:56  s_3        0.0        1.0        2.0
2017-05-17 15:49:58  s_5        NaN        NaN        NaN
2017-05-17 15:49:59  s_5        NaN        NaN        NaN
Option 2: use pd.crosstab
df.assign(**pd.crosstab(df.index,df.ID).rolling(3).sum().shift(-2))
Or use join:
df.join(pd.crosstab(df.index,df.ID).rolling(3).sum().shift(-2))
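The crosstab variant can be sketched the same way. In this sketch the construction of `df`, the `rownames=["DATE"]` argument, the `add_suffix("_count")` call, and the names `counts_ct` and `joined` are assumptions added for illustration; as written in the answer, the new columns would simply be named `s_2`, `s_3`, and `s_5`.

```python
import pandas as pd

# Sample data from the question; building it this way is an assumption.
df = pd.DataFrame(
    {"ID": ["s_2", "s_5", "s_2", "s_3", "s_5", "s_5"]},
    index=pd.DatetimeIndex(
        [
            "2017-05-17 15:49:51",
            "2017-05-17 15:49:52",
            "2017-05-17 15:49:55",
            "2017-05-17 15:49:56",
            "2017-05-17 15:49:58",
            "2017-05-17 15:49:59",
        ],
        name="DATE",
    ),
)

# Cross-tabulate timestamps against IDs: one row per timestamp, one column
# per ID, and each cell holds how often that ID occurs at that timestamp.
table = pd.crosstab(df.index, df["ID"], rownames=["DATE"])

# Same rolling/shift logic as option 1: count over the current row and the
# two rows after it. The suffix is added here only to match option 1.
counts_ct = table.rolling(3).sum().shift(-2).add_suffix("_count")

joined = df.join(counts_ct)
print(joined)
```

One caveat: crosstab groups by timestamp and sorts the result, so it only matches the get_dummies approach row for row when the timestamps are unique and already in order, as they are in this sample.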