Iterative counter based on several variables
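I'm trying to build a counter that tracks the number of failures and successes for several different users. I have a DataFrame in which user codes repeat (when there is more than one event for the same user), along with a timestamp that records when each event occurred. I want to add two columns (number of successes, number of failures) that accumulate the results of the preceding events.
Sample data:
import pandas as pd

data = pd.DataFrame(
    {
        'user_id': [2, 2, 3, 2, 4, 5, 3, 3, 6, 6, 6, 7],
        'timestamp': [1567641600, 1567691600, 1567741600, 1567941600, 1567981600, 1567991600, 1568391600, 1568541600, 1568741600, 1568941600, 1568981600, 1568988600],
        'status': ['yes', 'no', 'yes', 'no', 'yes', 'yes', 'yes', 'no', 'no', 'yes', 'no', 'yes']
    }
)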
I tried some loops in R, but I'm afraid I'm missing something. Maybe there is a better way to do this in Python?
The desired result should look like this:
data = pd.DataFrame(
    {
        'user_id': [2, 2, 3, 2, 4, 5, 3, 3, 6, 6, 6, 7],
        'timestamp': [1567641600, 1567691600, 1567741600, 1567941600, 1567981600, 1567991600, 1568391600, 1568541600, 1568741600, 1568941600, 1568981600, 1568988600],
        'status': ['yes', 'no', 'yes', 'no', 'yes', 'yes', 'yes', 'no', 'no', 'yes', 'no', 'yes'],
        'number_yes': [1, 1, 1, 1, 1, 1, 2, 2, 0, 1, 1, 1],
        'number_no': [0, 1, 0, 2, 0, 0, 0, 1, 1, 1, 2, 0]
    }
)
Use Series.eq to create a boolean mask, then use Series.groupby on this mask and transform the grouped series using .cumsum:
# True where the event was a success
m = data['status'].eq('yes')
data = data.assign(
    number_yes=m.groupby(data['user_id']).cumsum(),
    number_no=(~m).groupby(data['user_id']).cumsum()
)
# print(data)
    user_id   timestamp  status  number_yes  number_no
0         2  1567641600     yes         1.0        0.0
1         2  1567691600      no         1.0        1.0
2         3  1567741600     yes         1.0        0.0
3         2  1567941600      no         1.0        2.0
4         4  1567981600     yes         1.0        0.0
5         5  1567991600     yes         1.0        0.0
6         3  1568391600     yes         2.0        0.0
7         3  1568541600      no         2.0        1.0
8         6  1568741600      no         0.0        1.0
9         6  1568941600     yes         1.0        1.0
10        6  1568981600      no         1.0        2.0
11        7  1568988600     yes         1.0        0.0
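One thing worth noting: the grouped cumulative sum follows row order, and the sample data happens to be sorted by timestamp already. Here is a minimal sketch, assuming real data might arrive out of order, that sorts chronologically before counting. The sort step and the astype(int) cast (to get integer columns instead of the floats shown above) are my additions, not part of the original answer.

```python
import pandas as pd

data = pd.DataFrame({
    'user_id': [2, 2, 3, 2, 4, 5, 3, 3, 6, 6, 6, 7],
    'timestamp': [1567641600, 1567691600, 1567741600, 1567941600,
                  1567981600, 1567991600, 1568391600, 1568541600,
                  1568741600, 1568941600, 1568981600, 1568988600],
    'status': ['yes', 'no', 'yes', 'no', 'yes', 'yes',
               'yes', 'no', 'no', 'yes', 'no', 'yes'],
})

# Assumption: events may not be in chronological order, so sort first.
# A stable sort keeps the original order of rows with equal timestamps.
data = data.sort_values('timestamp', kind='stable').reset_index(drop=True)

# Boolean mask of successes, then a running count per user.
m = data['status'].eq('yes')
data = data.assign(
    number_yes=m.groupby(data['user_id']).cumsum().astype(int),
    number_no=(~m).groupby(data['user_id']).cumsum().astype(int),
)
```

Note that (~m) treats every status other than 'yes' as a failure, which is fine only while the column holds exactly two values.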
data['number_yes'] = data.groupby('user_id').status.transform(lambda x: (x == 'yes').cumsum())
data['number_no'] = data.groupby('user_id').status.transform(lambda x: (x == 'no').cumsum())
Result:
    user_id   timestamp  status  number_yes  number_no
0         2  1567641600     yes           1          0
1         2  1567691600      no           1          1
2         3  1567741600     yes           1          0
3         2  1567941600      no           1          2
4         4  1567981600     yes           1          0
5         5  1567991600     yes           1          0
6         3  1568391600     yes           2          0
7         3  1568541600      no           2          1
8         6  1568741600      no           0          1
9         6  1568941600     yes           1          1
10        6  1568981600      no           1          2
11        7  1568988600     yes           1          0
Let's use get_dummies:
data.join(data['status'].str.get_dummies()
          .groupby(data['user_id']).cumsum()
          .add_prefix('Number_'))
Output:
    user_id   timestamp  status  Number_no  Number_yes
0         2  1567641600     yes          0           1
1         2  1567691600      no          1           1
2         3  1567741600     yes          0           1
3         2  1567941600      no          2           1
4         4  1567981600     yes          0           1
5         5  1567991600     yes          0           1
6         3  1568391600     yes          0           2
7         3  1568541600      no          1           2
8         6  1568741600      no          1           0
9         6  1568941600     yes          1           1
10        6  1568981600      no          2           1
11        7  1568988600     yes          0           1
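The output above uses capitalized names and alphabetical column order, while the question asked for number_yes followed by number_no. A small sketch of how the same approach could produce exactly those columns, assuming a lowercase prefix and an explicit column selection are acceptable (the names counts and result are just illustrative):

```python
import pandas as pd

data = pd.DataFrame({
    'user_id': [2, 2, 3, 2, 4, 5, 3, 3, 6, 6, 6, 7],
    'timestamp': [1567641600, 1567691600, 1567741600, 1567941600,
                  1567981600, 1567991600, 1568391600, 1568541600,
                  1568741600, 1568941600, 1568981600, 1568988600],
    'status': ['yes', 'no', 'yes', 'no', 'yes', 'yes',
               'yes', 'no', 'no', 'yes', 'no', 'yes'],
})

# One indicator column per status, accumulated per user.
counts = (data['status'].str.get_dummies()
          .groupby(data['user_id']).cumsum()
          .add_prefix('number_'))

# Select the columns in the order the question asked for.
result = data.join(counts[['number_yes', 'number_no']])
```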
What I like about str.get_dummies is that it handles more than just 'yes' and 'no'. Let's add a new status, 'maybe':
data = pd.DataFrame(
    {
        'user_id': [2, 2, 3, 2, 4, 5, 3, 3, 6, 6, 6, 7],
        'timestamp': [1567641600, 1567691600, 1567741600, 1567941600, 1567981600, 1567991600, 1568391600, 1568541600, 1568741600, 1568941600, 1568981600, 1568988600],
        'status': ['yes', 'no', 'yes', 'no', 'maybe', 'yes', 'yes', 'no', 'maybe', 'yes', 'no', 'yes']
    })
data.join(data['status'].str.get_dummies()
          .groupby(data['user_id']).cumsum()
          .add_prefix('Number_'))
Output:
    user_id   timestamp  status  Number_maybe  Number_no  Number_yes
0         2  1567641600     yes             0          0           1
1         2  1567691600      no             0          1           1
2         3  1567741600     yes             0          0           1
3         2  1567941600      no             0          2           1
4         4  1567981600   maybe             1          0           0
5         5  1567991600     yes             0          0           1
6         3  1568391600     yes             0          0           2
7         3  1568541600      no             0          1           2
8         6  1568741600   maybe             1          0           0
9         6  1568941600     yes             1          0           1
10        6  1568981600      no             1          1           1
11        7  1568988600     yes             0          0           1
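All of the counts in this thread include the current row's event, which matches the desired output in the question. If the counter should instead cover only the events strictly before each row (another possible reading of "preceding events"), one rough sketch is to subtract each row's own indicator from the running total. The Previous_ prefix and the variable names are hypothetical, and the timestamp column is left out for brevity.

```python
import pandas as pd

data = pd.DataFrame({
    'user_id': [2, 2, 3, 2, 4, 5, 3, 3, 6, 6, 6, 7],
    'status': ['yes', 'no', 'yes', 'no', 'maybe', 'yes',
               'yes', 'no', 'maybe', 'yes', 'no', 'yes'],
})

# One 0/1 indicator column per distinct status.
dummies = data['status'].str.get_dummies()

# Running total per user, including the current row.
inclusive = dummies.groupby(data['user_id']).cumsum()

# Remove the current row's own contribution to count earlier events only.
previous = (inclusive - dummies).add_prefix('Previous_')

result = data.join(previous)
```

With this variant the first event of every user gets 0 in all three columns, since nothing precedes it.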