Python pandas: replacing a cell value based on a condition from another column

Question

I have a DataFrame:

import pandas as pd

df = pd.DataFrame({'cust': {0: 'A',
  1: 'A',
  2: 'A',
  3: 'A',
  4: 'B',
  5: 'B',
  6: 'B',
  7: 'B',
  8: 'B'},
 'value': {0: 6, 1: 10, 2: 11, 3: 15, 4: 6, 5: 12, 6: 21, 7: 29, 8: 33},
 'signal': {0: 0, 1: 1, 2: 1, 3: 0, 4: 1, 5: 0, 6: 0, 7: 0, 8: 0}})


  cust  value  signal
0    A      6       0
1    A     10       1
2    A     11       1
3    A     15       0
4    B      6       1
5    B     12       0
6    B     21       0
7    B     29       0
8    B     33       0

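When signal != 0, I need to replace each customer's 'value' with the previous value for that customer. For example, in the row at index 1, value = 10 should be replaced with its previous value, 6. In the row at index 4, I cannot replace the value 6 with a previous value, because customer 'B' has no previous value. In that case the value should be replaced with 0.

My DataFrame has 50 million rows. What is the most efficient way to do this?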

Answer 1

IIUC, you can use groupby (on the cust column) with transform and shift to get each row's previous value within its group, fill the missing values with fillna, and then select the appropriate value with np.where:

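import numpy as np

# previous value within each customer group; 0 where a group has no earlier row
res = df.groupby('cust')['value'].transform('shift').fillna(0)

# where signal != 0 take the previous value, otherwise keep the original
df['value'] = np.where(df['signal'].ne(0), res, df['value'])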

print(df)

Output

  cust  value  signal
0    A    6.0       0
1    A    6.0       1
2    A   10.0       1
3    A   15.0       0
4    B    0.0       1
5    B   12.0       0
6    B   21.0       0
7    B   29.0       0
8    B   33.0       0
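
Note that the output above shows 'value' as floats (6.0, 10.0, ...), because shift introduces NaN before fillna(0) is applied. As a minimal sketch of the same idea, assuming a pandas version where groupby shift accepts fill_value, the fill can be passed to shift directly and the selection done with Series.mask. The names prev and out are illustrative only, and this is a variation on the answer, not a benchmarked recommendation:

```python
import pandas as pd

df = pd.DataFrame({
    'cust': list('AAAABBBBB'),
    'value': [6, 10, 11, 15, 6, 12, 21, 29, 33],
    'signal': [0, 1, 1, 0, 1, 0, 0, 0, 0],
})

# previous value within each customer group; the first row of each group gets 0
prev = df.groupby('cust')['value'].shift(fill_value=0)

# where signal != 0 use the previous value, elsewhere keep the original value
out = df.assign(value=df['value'].mask(df['signal'].ne(0), prev))

print(out)
```

Because no NaN is created along the way, the column should not need a round trip through float. On a 50 million row frame it is worth timing both variants on your own data before choosing one.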

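Update

If you need to propagate the last valid value instead (the most recent value in the group whose signal is 0), do the following: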

# mark rows with signal != 0 as NaN so that ffill can overwrite them
df['value'] = np.where(df['signal'].ne(0), np.nan, df['value'])

# forward-fill within each customer; rows at the start of a group have nothing to fill from, so they get 0
df['value'] = df.groupby('cust')['value'].transform('ffill').fillna(0)

print(df)

Output

  cust  value  signal
0    A    6.0       0
1    A    6.0       1
2    A    6.0       1
3    A   15.0       0
4    B    0.0       1
5    B   12.0       0
6    B   21.0       0
7    B   29.0       0
8    B   33.0       0
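
The two steps of the update can also be wrapped in a small function that leaves the input frame untouched. This is only a sketch: the function name propagate_last_valid is hypothetical, and it assumes the column names cust, value and signal from the question:

```python
import pandas as pd

def propagate_last_valid(df):
    """Return a copy of df where rows with signal != 0 take the last
    value in the same cust group whose signal was 0 (or 0 if none)."""
    # mask() with no replacement turns the flagged rows into NaN
    masked = df['value'].mask(df['signal'].ne(0))
    # forward-fill per customer, then fill group-leading NaN with 0
    filled = masked.groupby(df['cust']).ffill().fillna(0)
    return df.assign(value=filled)

df = pd.DataFrame({
    'cust': list('AAAABBBBB'),
    'value': [6, 10, 11, 15, 6, 12, 21, 29, 33],
    'signal': [0, 1, 1, 0, 1, 0, 0, 0, 0],
})

result = propagate_last_valid(df)
print(result)
```

Unlike the first approach, index 2 becomes 6 here rather than 10, because the row at index 1 is itself flagged and is therefore skipped as a source value.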
