How to create a column in a pandas dataframe that verifies if a state transition has occurred?

Here is the original dataframe:

 uid    timestamp      state
   1     2015-01-01      fail  
   2     2015-01-07      fail  
   2     2015-03-02      fail  
   1     2015-01-03      pass  
   1     2015-01-02      warn  
   2     2015-03-01      pass  
   1     2015-01-04      pass  
   1     2015-01-07      pass  
   2     2015-01-01      warn  

This is the resulting dataframe I would like to produce:

 uid     timestamp      state  fail->pass?
   1     2015-01-01      fail  True
   2     2015-01-07      fail  False
   2     2015-03-02      fail  False
   1     2015-01-03      pass  True
   1     2015-01-02      warn  True
   2     2015-03-01      pass  False
   1     2015-01-04      pass  True
   1     2015-01-07      pass  True
   2     2015-01-01      warn  False

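The "fail->pass?" column is a boolean that tells you whether a UID went from a fail state to a pass state. The pass state must be the UID's final state. The fail state can occur at any time before the final state. The final state is the one at the UID's most recent timestamp.

What is the most efficient way to create this column? Each UID may have hundreds of state transitions.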

import pandas as pd

df = pd.DataFrame({'uid': [1, 2, 2, 1, 1, 2, 1, 1, 2],
 'timestamp': ['2015-01-01',
  '2015-01-07',
  '2015-03-02',
  '2015-01-03',
  '2015-01-02',
  '2015-03-01',
  '2015-01-04',
  '2015-01-07',
  '2015-01-01'],
 'state': ['fail',
  'fail',
  'fail',
  'pass',
  'warn',
  'pass',
  'pass',
  'pass',
  'warn'],
 'fail->pass?': [True, False, False, True, True, False, True, True, False]})


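# Parse the timestamps and sort, so the last row per uid is its final state
df['timestamp'] = pd.to_datetime(df['timestamp'])
df = df.sort_values(by='timestamp')

# One row per uid: True if the final state is 'pass'.
# Note: this checks only the final state, not that a 'fail' came before it.
fp = (df[['uid','state']].groupby('uid').last()=='pass').reset_index()
fp.columns = ['uid','fail->pass?']

# Drop the hand-written expected column (if present) so the merge
# does not produce suffixed duplicates, then attach the per-uid flag
df.drop(columns='fail->pass?', errors='ignore').merge(fp, on='uid').sort_values(by='timestamp')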

Output:

   uid  timestamp   state   fail->pass?
0   1   2015-01-01  fail    True
5   2   2015-01-01  warn    False
1   1   2015-01-02  warn    True
2   1   2015-01-03  pass    True
3   1   2015-01-04  pass    True
4   1   2015-01-07  pass    True
6   2   2015-01-07  fail    False
7   2   2015-03-01  pass    False
8   2   2015-03-02  fail    False
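
The rule in the question has two parts: the UID's latest state is pass, and a fail occurs somewhere before it. The merge above tests only the first part, which happens to be enough for the sample data. Below is a minimal sketch that tests both parts without a merge. It assumes the column names from the question and a unique index, and `add_fail_pass_column` is a hypothetical helper name, not a pandas function:

```python
import pandas as pd

def add_fail_pass_column(frame):
    """Return a copy of frame with a boolean 'fail->pass?' column.

    Assumes columns 'uid', 'timestamp', 'state' and a unique index.
    """
    # Sort so that "last" within each uid means "latest timestamp".
    ordered = frame.sort_values(['uid', 'timestamp'])

    # Part 1: the uid's final state is 'pass' (broadcast to every row of the uid).
    ends_in_pass = ordered.groupby('uid')['state'].transform('last').eq('pass')

    # Part 2: the uid has at least one 'fail' row. If the final state is
    # 'pass', any 'fail' row is necessarily an earlier one.
    has_fail = ordered['state'].eq('fail').groupby(ordered['uid']).transform('any')

    # Assignment aligns on the index, so the original row order is kept.
    result = frame.copy()
    result['fail->pass?'] = ends_in_pass & has_fail
    return result

# Sample data from the original dataframe in the question.
df = pd.DataFrame({
    'uid': [1, 2, 2, 1, 1, 2, 1, 1, 2],
    'timestamp': pd.to_datetime([
        '2015-01-01', '2015-01-07', '2015-03-02',
        '2015-01-03', '2015-01-02', '2015-03-01',
        '2015-01-04', '2015-01-07', '2015-01-01']),
    'state': ['fail', 'fail', 'fail', 'pass', 'warn',
              'pass', 'pass', 'pass', 'warn'],
})

print(add_fail_pass_column(df))
```

Because `transform` returns one value per row, no intermediate per-uid table or merge is needed. On this sample both approaches agree; they would differ for a UID that ends in pass without ever failing.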