How to create a column in a pandas dataframe that verifies if a state transition has occurred?
Here is the original dataframe:
uid timestamp state
1 2015-01-01 fail
2 2015-01-07 fail
2 2015-03-02 fail
1 2015-01-03 pass
1 2015-01-02 warn
2 2015-03-01 pass
1 2015-01-04 pass
1 2015-01-07 pass
2 2015-01-01 warn
This is the resulting dataframe I want to generate:
uid timestamp state fail->pass?
1 2015-01-01 fail True
2 2015-01-07 fail False
2 2015-03-02 fail False
1 2015-01-03 pass True
1 2015-01-02 warn True
2 2015-03-01 pass False
1 2015-01-04 pass True
1 2015-01-07 pass True
2 2015-01-01 warn False
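The "fail->pass?" column is a boolean column that tells you whether a UID went from a fail state to a pass state. The pass state must be the UID's final state. The fail state can occur at any time before the final state. The final state is the one at the UID's latest timestamp.
What is the most efficient way to create this column? Each UID may have hundreds of state transitions.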
import pandas as pd

# Input data from the question; the 'fail->pass?' column is computed below,
# so it is not part of the constructor (it would collide with the merge).
df = pd.DataFrame({'uid': [1, 2, 2, 1, 1, 2, 1, 1, 2],
                   'timestamp': ['2015-01-01',
                                 '2015-01-07',
                                 '2015-03-02',
                                 '2015-01-03',
                                 '2015-01-02',
                                 '2015-03-01',
                                 '2015-01-04',
                                 '2015-01-07',
                                 '2015-01-01'],
                   'state': ['fail',
                             'fail',
                             'fail',
                             'pass',
                             'warn',
                             'pass',
                             'pass',
                             'pass',
                             'warn']})
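# Parse the timestamps and sort so the last row per uid is its final state
df['timestamp'] = pd.to_datetime(df['timestamp'])
df = df.sort_values(by='timestamp')
# One row per uid: True if that uid's final state is 'pass'
fp = (df[['uid','state']].groupby('uid').last()=='pass').reset_index()
fp.columns = ['uid','fail->pass?']
# Broadcast the per-uid flag back onto every row
df.merge(fp, on='uid').sort_values(by='timestamp')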
Output:
uid timestamp state fail->pass?
0 1 2015-01-01 fail True
5 2 2015-01-01 warn False
1 1 2015-01-02 warn True
2 1 2015-01-03 pass True
3 1 2015-01-04 pass True
4 1 2015-01-07 pass True
6 2 2015-01-07 fail False
7 2 2015-03-01 pass False
8 2 2015-03-02 fail False
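Note that the code above only tests whether each UID's final state is 'pass'; it does not check that a 'fail' actually occurred earlier. That gives the right answer for this sample data, but a UID whose states were all 'pass' would also be marked True. The following is a minimal sketch of a stricter variant, not a definitive implementation. It uses groupby with transform instead of a merge, and the names ordered, ends_in_pass and has_fail are purely illustrative. It assumes that timestamps are unique within each UID.

```python
import pandas as pd

df = pd.DataFrame({
    'uid': [1, 2, 2, 1, 1, 2, 1, 1, 2],
    'timestamp': pd.to_datetime([
        '2015-01-01', '2015-01-07', '2015-03-02', '2015-01-03', '2015-01-02',
        '2015-03-01', '2015-01-04', '2015-01-07', '2015-01-01']),
    'state': ['fail', 'fail', 'fail', 'pass', 'warn',
              'pass', 'pass', 'pass', 'warn'],
})

# Sort so that "last" within each uid means "latest timestamp".
ordered = df.sort_values(['uid', 'timestamp'])

# Final state per uid, broadcast back to every row of that uid.
ends_in_pass = ordered.groupby('uid')['state'].transform('last') == 'pass'

# True on every row of a uid that has at least one 'fail' row.
# If the final state is 'pass', any 'fail' must have come before it.
has_fail = (ordered['state'] == 'fail').groupby(ordered['uid']).transform('any')

# Assignment aligns on the index, so df keeps its original row order.
df['fail->pass?'] = ends_in_pass & has_fail
print(df)
```

Because transform returns a result with the same index as its input, no merge or reset_index is needed, and the original row order of df is preserved.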