Conditionally fill column based off values in other columns in a pandas df
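This question is similar to a few others on conditionally filling columns, but my df is a bit more complicated.
I have a df with columns that contain both floats and strings. I'm trying to conditionally fill the float columns based on the strings.
Based on the df below:
If the value in Code starts with A, I want to keep the values as they are.
If the value in Code starts with B, I want to keep the initial value and return NaNs for the following rows, until the next value in Code.
If the value in Code starts with C, I want to keep the same first value until the next float in ['Numx','Numy'].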
import pandas as pd
import numpy as np
d = ({
'Code' :['A1','A1','','B1','B1','A2','A2','','B2','B2','','A3','A3','A3','','B1','','B4','B4','A2','A2','A1','A1','','B4','B4','C1','C1','','','D1','','B2'],
'Numx' : [30.2,30.5,30.6,35.6,40.2,45.5,46.1,48.1,48.5,42.2,'',30.5,30.6,35.6,40.2,45.5,'',48.1,48.5,42.2, 40.1,48.5,42.2,'',48.5,42.2,43.1,44.1,'','','','',45.1],
'Numy' : [1.9,2.3,2.5,2.2,2.5,3.1,3.4,3.6,3.7,5.4,'',2.3,2.5,2.2,2.5,3.1,'',3.6,3.7,5.4,6.5,8.5,2.2,'',8.5,2.2,2.3,2.5,'','','','',3.2]
})
df = pd.DataFrame(data=d)
Output:
   Code  Numx  Numy
0    A1  30.2   1.9
1    A1  30.5   2.3
2        30.6   2.5
3    B1  35.6   2.2
4    B1  40.2   2.5
5    A2  45.5   3.1
6    A2  46.1   3.4
7        48.1   3.6
8    B2  48.5   3.7
9    B2  42.2   5.4
10        nan   nan
11   A3  30.5   2.3
12   A3  30.6   2.5
13   A3  35.6   2.2
14       40.2   2.5
15   B1  45.5   3.1
16        nan   nan
17   B4  48.1   3.6
18   B4  48.5   3.7
19   A2  42.2   5.4
20   A2  40.1   6.5
21   A1  48.5   8.5
22   A1  42.2   2.2
23        nan   nan
24   B4  48.5   8.5
25   B4  42.2   2.2
26   C1  43.1   2.3
27   C1  44.1   2.5
28        nan   nan
29        nan   nan
30   D1   nan   nan
31        nan   nan
32   B2  45.1   3.2
When the value in Code starts with B, I was thinking of something like this:
df['Numx'] = np.where(df['Code'] == 'B-'.ffill())
df['Numy'] = np.where(df['Code'] == 'B-'.ffill())
So my intended output is:
   Code  Numx  Numy
0    A1  30.2   1.9
1    A1  30.5   2.3
2        30.6   2.5
3    B1  35.6   2.2
4    B1   nan   nan
5    A2  45.5   3.1
6    A2  46.1   3.4
7        48.1   3.6
8    B2  48.5   3.7
9    B2   nan   nan
10        nan   nan
11   A3  30.5   2.3
12   A3  30.6   2.5
13   A3  35.6   2.2
14       40.2   2.5
15   B1  45.5   3.1
16        nan   nan
17   B4  48.1   3.6
18   B4   nan   nan
19   A2  42.2   5.4
20   A2  40.1   6.5
21   A1  48.5   8.5
22   A1  42.2   2.2
23        nan   nan
24   B4  48.5   8.5
25   B4   nan   nan
26   C1  43.1   2.3
27   C1  43.1   2.3
28       43.1   2.3
29       43.1   2.3
30   D1  43.1   2.3
31       43.1   2.3
32   B2  45.1   3.2
I think you need:
df['Code_new'] = df['Code'].where(df['Code'].isin(['AA','BB'])).ffill()
df[['Numx','Numy']] = df[['Numx','Numy']].mask(df['Code_new'].duplicated())
mask = df['Code_new'] == 'BB'
df.loc[mask, ['Numx','Numy']] = df.loc[mask, ['Numx','Numy']].ffill()
print (df)
  Code  Numx  Numy Code_new
0   AA  30.2   1.9       AA
1        NaN   NaN       AA
2        NaN   NaN       AA
3   BB  35.6   2.2       BB
4       35.6   2.2       BB
5       35.6   2.2       BB
6       35.6   2.2       BB
7   CC  35.6   2.2       BB
8       35.6   2.2       BB
9   DD  35.6   2.2       BB
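For readers unsure what the Code_new line does, here is a small sketch of that step on its own. The Code values are copied from the printed output above; the names key and dup are only illustrative. It shows how where + isin + ffill turns the sparse Code column into a block label, and what duplicated() then flags.

```python
import pandas as pd

# Code column as it appears in the printed output above
code = pd.Series(['AA', '', '', 'BB', '', '', '', 'CC', '', 'DD'])

# Keep only the codes of interest, blank out everything else (NaN),
# then carry the last kept code forward over the blanked rows.
key = code.where(code.isin(['AA', 'BB'])).ffill()
print(key.tolist())

# duplicated() marks every repeat of a label after its first occurrence.
dup = key.duplicated()
print(dup.tolist())
```

One caveat: duplicated() looks at the whole column, not at consecutive runs, so a code that shows up again further down would be flagged on its first row there as well.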
Or:
df = df.replace('nan', np.nan)
df['Code_new'] = df['Code'].where(df['Code'].isin(['AA','BB'])).ffill()
m1 = df['Code_new'].duplicated() & (df['Code_new'] == 'AA')
df[['Numx','Numy']] = df[['Numx','Numy']].mask(m1)
m2 = df['Code_new'] == 'BB'
df.loc[m2, ['Numx','Numy']] = df.loc[m2, ['Numx','Numy']].ffill()
print (df)
  Code  Numx  Numy Code_new
0   AA  30.2   1.9       AA
1        NaN   NaN       AA
2        NaN   NaN       AA
3   BB  35.6   2.2       BB
4       40.2   2.5       BB
5       45.5   3.1       BB
6       45.5   3.1       BB
7   CC  45.5   3.1       BB
8       45.5   3.1       BB
9   DD  42.2   5.4       BB
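The snippets above use the codes AA and BB, while the data in the question uses prefixed codes such as A1, B1 and C1. Below is a rough sketch of how the same idea might be adapted to that data. It rests on assumptions read off the intended output rather than stated in the question: blank Code rows belong to the previous code, only codes starting with A, B or C open a new block (so D1 continues the C1 block), and a C block repeats its first value on every row. The names block_code, block_id, first_row and first_vals are illustrative.

```python
import numpy as np
import pandas as pd

d = {
    'Code': ['A1','A1','','B1','B1','A2','A2','','B2','B2','','A3','A3','A3','','B1','','B4','B4','A2','A2','A1','A1','','B4','B4','C1','C1','','','D1','','B2'],
    'Numx': [30.2,30.5,30.6,35.6,40.2,45.5,46.1,48.1,48.5,42.2,'',30.5,30.6,35.6,40.2,45.5,'',48.1,48.5,42.2,40.1,48.5,42.2,'',48.5,42.2,43.1,44.1,'','','','',45.1],
    'Numy': [1.9,2.3,2.5,2.2,2.5,3.1,3.4,3.6,3.7,5.4,'',2.3,2.5,2.2,2.5,3.1,'',3.6,3.7,5.4,6.5,8.5,2.2,'',8.5,2.2,2.3,2.5,'','','','',3.2],
}
df = pd.DataFrame(data=d)
cols = ['Numx', 'Numy']

# Empty strings become NaN so both columns hold plain floats.
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')

# Keep only codes starting with A, B or C (assumption: anything else,
# such as D1 or a blank, continues the previous block), then forward fill.
code = df['Code']
block_code = code.where(code.str[0].isin(['A', 'B', 'C'])).ffill()
kind = block_code.str[0]

# A new block starts whenever the carried-forward code changes.
block_id = block_code.ne(block_code.shift()).cumsum()
first_row = ~block_id.duplicated()

# B blocks: keep the first row, set the remaining rows to NaN.
df.loc[kind.eq('B') & ~first_row, cols] = np.nan

# C blocks: repeat the block's first non-missing value on every row.
is_c = kind.eq('C')
first_vals = df[cols].groupby(block_id).transform('first')
df.loc[is_c, cols] = first_vals.loc[is_c]

print(df)
```

Under those assumptions this should reproduce the intended output from the question on the sample data. Numbering consecutive runs with block_id, instead of calling duplicated() on the code itself, is what lets codes that occur more than once (B1 and B4 here) keep the first row of every occurrence.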