Replace row values with other row values from same df based on conditions
I have the following dataset:
import pandas as pd

df = pd.DataFrame({
    'user': {0: 1, 1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 2},
    'date': {0: '1995-09-01', 1: '1995-09-02', 2: '1995-10-03', 3: '1995-10-04', 4: '1995-10-05', 5: '1995-11-07', 6: '1995-11-08'},
    'x': {0: '1995-09-02', 1: '1995-09-02', 2: '1995-09-02', 3: '1995-10-05', 4: '1995-10-05', 5: '1995-10-05', 6: '1995-10-05'},
    'y': {0: '1995-10-03', 1: '1995-10-03', 2: '1995-10-03', 3: '1995-11-08', 4: '1995-11-08', 5: '1995-11-08', 6: '1995-11-08'},
    'c1': {0: '1', 1: '0', 2: '0', 3: '2', 4: '0', 5: '9', 6: '0'},
    'c2': {0: '1', 1: '0', 2: '0', 3: '2', 4: '0', 5: '9', 6: '0'},
    'c3': {0: '1', 1: '0', 2: '0', 3: '2', 4: '0', 5: '9', 6: '0'},
    'VTX1': {0: 1, 1: 0, 2: 0, 3: 1, 4: 0, 5: 0, 6: 0},
    'VTY1': {0: 0, 1: 1, 2: 0, 3: 0, 4: 0, 5: 1, 6: 0},
})
This gives me:
   user        date           x           y  c1  c2  c3  VTX1  VTY1
0     1  1995-09-01  1995-09-02  1995-10-03   1   1   1     1     0
1     1  1995-09-02  1995-09-02  1995-10-03   0   0   0     0     1
2     1  1995-10-03  1995-09-02  1995-10-03   0   0   0     0     0
3     2  1995-10-04  1995-10-05  1995-11-08   2   2   2     1     0
4     2  1995-10-05  1995-10-05  1995-11-08   0   0   0     0     0
5     2  1995-11-07  1995-10-05  1995-11-08   9   9   9     0     1
6     2  1995-11-08  1995-10-05  1995-11-08   0   0   0     0     0
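I want to replace df['c1'] as follows.
- When df['date'] == df['x'], change df['c1'] to the df['c1'] value from the row of the same user where df['VTX1'] == 1.
In this example, for user 1, df['date'] == df['x'] happens to hold at index 1. Here we want df['c1'] to be 1, because 1 is user 1's df['c1'] value in the row where df['VTX1'] == 1.
So the final result would be: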
   user        date           x           y  c1  c2  c3  VTX1  VTY1
0     1  1995-09-01  1995-09-02  1995-10-03   1   1   1     1     0
1     1  1995-09-02  1995-09-02  1995-10-03   1   0   0     0     1
2     1  1995-10-03  1995-09-02  1995-10-03   0   0   0     0     0
3     2  1995-10-04  1995-10-05  1995-11-08   2   2   2     1     0
4     2  1995-10-05  1995-10-05  1995-11-08   2   0   0     0     0
5     2  1995-11-07  1995-10-05  1995-11-08   9   9   9     0     1
6     2  1995-11-08  1995-10-05  1995-11-08   0   0   0     0     0
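For each unique user, select the row where column VTX1 has the value 1. This can be done by setting the index to user and using query to select the required rows. Then mask the values in c1 where date equals x, and replace the masked values using the mapping series d: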
d = df.set_index('user').query('VTX1 == 1')['c1']
df['c1'] = df['c1'].mask(df['date'].eq(df['x']), df['user'].map(d))
   user        date           x           y  c1  c2  c3  VTX1  VTY1
0     1  1995-09-01  1995-09-02  1995-10-03   1   1   1     1     0
1     1  1995-09-02  1995-09-02  1995-10-03   1   0   0     0     1
2     1  1995-10-03  1995-09-02  1995-10-03   0   0   0     0     0
3     2  1995-10-04  1995-10-05  1995-11-08   2   2   2     1     0
4     2  1995-10-05  1995-10-05  1995-11-08   2   0   0     0     0
5     2  1995-11-07  1995-10-05  1995-11-08   9   9   9     0     1
6     2  1995-11-08  1995-10-05  1995-11-08   0   0   0     0     0
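One caveat: the mapping step relies on d having a unique index, that is, on each user having exactly one row with VTX1 == 1. That is true for the sample data, but it is an assumption in general. Below is a minimal sketch of a more defensive variant. It keeps the first flagged row per user (picking "first" is my own assumption, the question does not say what should happen with several flagged rows) and leaves c1 untouched for users that have no flagged row. Only the columns needed for the replacement are rebuilt here, and the names replacement, hit and result are just illustrative.

```python
import pandas as pd

# Reduced copy of the sample data: only the columns the rule needs.
df = pd.DataFrame({
    'user': [1, 1, 1, 2, 2, 2, 2],
    'date': ['1995-09-01', '1995-09-02', '1995-10-03', '1995-10-04',
             '1995-10-05', '1995-11-07', '1995-11-08'],
    'x':    ['1995-09-02', '1995-09-02', '1995-09-02', '1995-10-05',
             '1995-10-05', '1995-10-05', '1995-10-05'],
    'c1':   ['1', '0', '0', '2', '0', '9', '0'],
    'VTX1': [1, 0, 0, 1, 0, 0, 0],
})

# c1 value of the first VTX1 == 1 row per user.
# drop_duplicates keeps the index of d unique, which map() needs.
d = (df.loc[df['VTX1'].eq(1)]
       .drop_duplicates('user')
       .set_index('user')['c1'])

# Per-row replacement value; NaN for users without any flagged row.
replacement = df['user'].map(d)

# Replace only where date equals x AND a replacement value exists.
hit = df['date'].eq(df['x']) & replacement.notna()
result = df['c1'].mask(hit, replacement)

print(result.tolist())  # ['1', '1', '0', '2', '2', '9', '0']
```

On the sample data this gives the same c1 column as the two-line version above. Assigning it back with df['c1'] = result completes the replacement.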