Enumerate a value until reaching a specific value in pandas
I have a DataFrame like the following:
import numpy as np
import pandas as pd
df = pd.DataFrame({'id':[1,2,3,4,5,6,7,8,9,10],'value':[1,2.5,1.1,1.4,1.5,1,1.5,3,1,1.6]})
df['value_at_1'] = np.where(df['value'] == 1, 1, 0)  # flag rows where value == 1
df
>>>
   id  value  value_at_1
0   1    1.0           1
1   2    2.5           0
2   3    1.1           0
3   4    1.4           0
4   5    1.5           0
5   6    1.0           1
6   7    1.5           0
7   8    3.0           0
8   9    1.0           1
9  10    1.6           0
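I want to create a variable that enumerates the `value_at_1` variable: it increments on every row until `value_at_1` reaches 1 again, and then restarts from 0. The result should look like this: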
df = pd.DataFrame({'id':[1,2,3,4,5,6,7,8,9,10],
                   'value':[1,2.5,1.1,1.4,1.5,1,1.5,3,1,1.6],
                   'value_at_1':[1,0,0,0,0,1,0,0,1,0],
                   'count_since_1':[0,1,2,3,4,0,1,2,0,1]})
>>>
   id  value  value_at_1  count_since_1
0   1    1.0           1              0
1   2    2.5           0              1
2   3    1.1           0              2
3   4    1.4           0              3
4   5    1.5           0              4
5   6    1.0           1              0
6   7    1.5           0              1
7   8    3.0           0              2
8   9    1.0           1              0
9  10    1.6           0              1
Can anyone help me manipulate the data this way? Thanks!
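For reference, the rule being asked for can be spelled out as a plain Python loop. This is a minimal sketch, not part of the original post: the counter resets to 0 on every row where `value_at_1 == 1` and otherwise increases by one. Starting the counter at -1 is an assumption chosen so that any rows before the first 1 are numbered from 0.

```python
import numpy as np
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'value': [1, 2.5, 1.1, 1.4, 1.5, 1, 1.5, 3, 1, 1.6]})
df['value_at_1'] = np.where(df['value'] == 1, 1, 0)

counts = []
counter = -1  # assumption: rows before the first 1 are numbered from 0
for flag in df['value_at_1']:
    # Reset on a 1, otherwise count one more row since the last 1
    counter = 0 if flag == 1 else counter + 1
    counts.append(counter)

df['count_since_1'] = counts
print(counts)  # [0, 1, 2, 3, 4, 0, 1, 2, 0, 1]
```

A Python loop like this is slow on large frames, which is why the vectorized answers are preferable in practice.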
Use the `cumsum` of the flag as the `groupby` key, then take `cumcount` within each group:
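# cumsum gives every run that starts at a 1 its own group label;
# cumcount then numbers the rows within each group starting from 0
df.assign(
    count_since_1=df.value_at_1.groupby(df.value_at_1.cumsum()).cumcount())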
   id  value  value_at_1  count_since_1
0   1    1.0           1              0
1   2    2.5           0              1
2   3    1.1           0              2
3   4    1.4           0              3
4   5    1.5           0              4
5   6    1.0           1              0
6   7    1.5           0              1
7   8    3.0           0              2
8   9    1.0           1              0
9  10    1.6           0              1
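To see why the one-liner works, here is a sketch that prints the two intermediate steps separately on the question's data. The variable names `groups` and `counts` are illustrative only. Note that `df.assign` returns a new DataFrame, so the result has to be assigned back if you want to keep the column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'value': [1, 2.5, 1.1, 1.4, 1.5, 1, 1.5, 3, 1, 1.6]})
df['value_at_1'] = np.where(df['value'] == 1, 1, 0)

# Step 1: the running sum goes up by 1 at every row with value_at_1 == 1,
# so each run that starts at a 1 shares one group label
groups = df['value_at_1'].cumsum()
print(groups.tolist())  # [1, 1, 1, 1, 1, 2, 2, 2, 3, 3]

# Step 2: cumcount numbers the rows inside each group, starting at 0
counts = df['value_at_1'].groupby(groups).cumcount()
print(counts.tolist())  # [0, 1, 2, 3, 4, 0, 1, 2, 0, 1]

# assign returns a copy, so store it back to keep the new column
df = df.assign(count_since_1=counts)
```

If the data does not start with a 1, the leading rows simply fall into group 0 and are numbered from 0 as well.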
Just wanted to offer another approach:
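import pandas as pd
import numpy as np
import functools

# Row positions where value_at_1 == 1, plus the end of the frame
# (assumes a default RangeIndex and that the first row has value_at_1 == 1)
idx = df.index[df['value_at_1'].eq(1)].values.tolist() + [len(df)]
# Length of each run, from one 1 up to (not including) the next
idx = list(np.diff(idx))
# Concatenate 0..n-1 for every run
df['count_since_1'] = functools.reduce(lambda x, y: x + y, [list(range(y)) for y in idx])
df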
   id  value  value_at_1  count_since_1
0   1    1.0           1              0
1   2    2.5           0              1
2   3    1.1           0              2
3   4    1.4           0              3
4   5    1.5           0              4
5   6    1.0           1              0
6   7    1.5           0              1
7   8    3.0           0              2
8   9    1.0           1              0
9  10    1.6           0              1
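The run-length idea above only lines up with the frame when the first row is a 1 and the index is a default `RangeIndex`. Here is a sketch of the same technique in NumPy that drops both assumptions; it is a variant, not the answerer's code, and `counts_since_one` is a hypothetical helper name. It treats row 0 as the start of a run when the data does not begin with a 1 and works on positions, so the frame's index does not matter.

```python
import numpy as np
import pandas as pd


def counts_since_one(flags):
    """Hypothetical helper: 0-based position of each row within its run,
    where a new run starts at every 1 (and at row 0)."""
    flags = np.asarray(flags)
    starts = np.flatnonzero(flags == 1)  # positions of the 1s
    if starts.size == 0 or starts[0] != 0:
        starts = np.r_[0, starts]  # rows before the first 1 form their own run
    # Run lengths between consecutive starts; the last run ends at len(flags)
    lengths = np.diff(np.r_[starts, len(flags)])
    # Concatenate 0..n-1 for every run (linear time, unlike reduce on lists)
    return np.concatenate([np.arange(n) for n in lengths])


df = pd.DataFrame({'id': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'value': [1, 2.5, 1.1, 1.4, 1.5, 1, 1.5, 3, 1, 1.6]})
df['value_at_1'] = np.where(df['value'] == 1, 1, 0)
# Assigning a NumPy array is positional, so any index works
df['count_since_1'] = counts_since_one(df['value_at_1'])
print(df['count_since_1'].tolist())  # [0, 1, 2, 3, 4, 0, 1, 2, 0, 1]
```

For input that does not start with a 1, such as `[0, 0, 1, 0]`, this gives `[0, 1, 0, 1]`, the same as the `groupby`/`cumcount` answer.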