Elegant way to replace multiple values in pandas dataframe without map?
I have a dataframe like the one shown below:
import pandas as pd
df1 = pd.DataFrame({'ethnicity': ['AMERICAN INDIAN/ALASKA NATIVE', 'WHITE - BRAZILIAN', 'WHITE-RUSSIAN','HISPANIC/LATINO - COLOMBIAN',
'HISPANIC/LATINO - MEXICAN','ASIAN','ASIAN - INDIAN','ASIAN - KOREAN','PORTUGUESE','MIDDLE-EASTERN','UNKNOWN',
'USER DECLINED','OTHERS']})
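I would like to replace the values in the ethnicity column. For example, if the value is ASIAN - INDIAN, I would like to replace it with ASIAN.
Similarly, I would like to do the same for strings containing AMERICAN, WHITE, and HISPANIC, and replace everything else with Others. This is what I am trying: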
df1.loc[df1.ethnicity.str.contains('WHITE'),'ethnicity'] = "WHITE"
df1.loc[df1.ethnicity.str.contains('ASIAN'),'ethnicity'] = "ASIAN"
df1.loc[df1.ethnicity.str.contains('HISPANIC'),'ethnicity'] = "HISPANIC"
df1.loc[df1.ethnicity.str.contains('AMERICAN'),'ethnicity'] = "AMERICAN"
# please note: I don't know how to replace all other ethnicities at once with "Others"
I expect my output to look like the one shown below.
Use Series.str.extract with a pattern built from the values of the list. Rows with no match return NaN, so add Series.fillna:
L = ['WHITE','ASIAN','HISPANIC','AMERICAN']
print (f'({"|".join(L)})')
(WHITE|ASIAN|HISPANIC|AMERICAN)
df1.ethnicity = df1.ethnicity.str.extract(f'({"|".join(L)})', expand=False).fillna('Others')
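To see why fillna is needed, it can help to look at the intermediate result. The following is only a minimal sketch on a three-row sample. The names sample, pattern, extracted and result are illustrative and do not come from the question.

```python
import pandas as pd

# Small illustrative sample; the values are taken from the question's data.
sample = pd.Series(['ASIAN - INDIAN', 'WHITE-RUSSIAN', 'PORTUGUESE'])

L = ['WHITE', 'ASIAN', 'HISPANIC', 'AMERICAN']
pattern = f'({"|".join(L)})'

# expand=False returns a Series (single capture group) instead of a DataFrame.
extracted = sample.str.extract(pattern, expand=False)

# Rows that contain none of the keywords are missing (NaN) at this point.
print(extracted.isna().tolist())   # [False, False, True]

# fillna turns those missing values into the catch-all label.
result = extracted.fillna('Others')
print(result.tolist())             # ['ASIAN', 'WHITE', 'Others']
```

Only PORTUGUESE has no keyword, so it is the only row that ends up as Others.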
Or you can write the joined values directly in the pattern string:
df1.ethnicity = (df1.ethnicity.str.extract('(WHITE|ASIAN|AMERICAN|HISPANIC)', expand=False)
.fillna('Others'))
print (df1)
    ethnicity
0    AMERICAN
1       WHITE
2       WHITE
3    HISPANIC
4    HISPANIC
5       ASIAN
6       ASIAN
7       ASIAN
8      Others
9      Others
10     Others
11     Others
12     Others
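If you would rather stay close to the .loc attempt from the question, one possible variant is to compute the "no keyword" mask once, before overwriting anything. This is a hedged sketch on a shortened copy of the data, not a replacement for the str.extract solution. The names any_keyword and is_other are made up for illustration, and re.escape is only a precaution in case a keyword contains regex metacharacters.

```python
import re
import pandas as pd

# Shortened copy of the question's data, for illustration only.
df1 = pd.DataFrame({'ethnicity': ['AMERICAN INDIAN/ALASKA NATIVE', 'WHITE - BRAZILIAN',
                                  'ASIAN - KOREAN', 'PORTUGUESE', 'UNKNOWN']})

L = ['WHITE', 'ASIAN', 'HISPANIC', 'AMERICAN']

# One alternation pattern covering every keyword, escaped defensively.
any_keyword = '|'.join(re.escape(k) for k in L)

# Rows matching none of the keywords; computed before any value is changed.
is_other = ~df1['ethnicity'].str.contains(any_keyword)

for k in L:
    # regex=False treats k as a literal substring.
    df1.loc[df1['ethnicity'].str.contains(k, regex=False), 'ethnicity'] = k

df1.loc[is_other, 'ethnicity'] = 'Others'
print(df1['ethnicity'].tolist())
# ['AMERICAN', 'WHITE', 'ASIAN', 'Others', 'Others']
```

Note that for a value containing two keywords the loop keeps whichever keyword is processed last, whereas str.extract keeps the leftmost match in the string, so the two approaches can differ on such rows.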