Pandas DataFrame: replace NaN in a row when a column value matches
I have a DataFrame:
Input DataFrame
   class section  sub  marks  school   city
0      I       A  Eng     80   jghss  salem
1      I       A  Mat     90   jghss  salem
2      I       A  Eng     50     NaN  salem
3    III       A  Eng     80   gphss    NaN
4    III       A  Mat     45     NaN  salem
5    III       A  Eng     40   gphss    NaN
6    III       A  Eng     20   gphss  salem
7    III       A  Mat     55   gphss    NaN
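When the values in the "class" and "section" columns match, I need to replace the NaN values in "school" and "city" with the values from the matching rows. The result should be:
Output DataFrame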
   class section  sub  marks  school   city
0      I       A  Eng     80   jghss  salem
1      I       A  Mat     90   jghss  salem
2      I       A  Eng     50   jghss  salem
3    III       A  Eng     80   gphss  salem
4    III       A  Mat     45   gphss  salem
5    III       A  Eng     40   gphss  salem
6    III       A  Eng     20   gphss  salem
7    III       A  Mat     55   gphss  salem
Can anyone help me solve this?
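For anyone who wants to reproduce this, the input DataFrame from the question can be rebuilt roughly as follows. This is only a sketch: it assumes the missing entries are real missing values (np.nan) and not the literal string "Nan", which the question does not state explicitly.

```python
import numpy as np
import pandas as pd

# Reconstruction of the question's input; np.nan for the gaps is an assumption.
df = pd.DataFrame({
    "class":   ["I", "I", "I", "III", "III", "III", "III", "III"],
    "section": ["A"] * 8,
    "sub":     ["Eng", "Mat", "Eng", "Eng", "Mat", "Eng", "Eng", "Mat"],
    "marks":   [80, 90, 50, 80, 45, 40, 20, 55],
    "school":  ["jghss", "jghss", np.nan, "gphss", np.nan, "gphss", "gphss", "gphss"],
    "city":    ["salem", "salem", "salem", np.nan, "salem", np.nan, "salem", np.nan],
})

# Count the missing values per column that need to be filled.
missing = df[["school", "city"]].isna().sum()
print(missing)
```

If the gaps were stored as strings instead, they would have to be converted to real missing values first (for example with df.replace("Nan", np.nan)) before any of the fill methods could see them.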
Assuming each (class, section) pair corresponds to a single (school, city) pair, we can use groupby:
# Build a lookup from (class, section) to school and city.
# This assumes that each (class, section) pair has at least one row where both
# school and city are present; if not, build the two lookups separately.
school_city_dict = (
    df[['class', 'section', 'school', 'city']]
    .dropna()
    .groupby(['class', 'section'])[['school', 'city']]
    .max()
    .to_dict()
)
# school_city_dict = {'school': {('I', 'A'): 'jghss', ('III', 'A'): 'gphss'},
#                     'city': {('I', 'A'): 'salem', ('III', 'A'): 'salem'}}

# Index by (class, section) so the index can be mapped through the lookup
df.set_index(['class', 'section'], inplace=True)
df.loc[:, 'school'] = df.index.map(school_city_dict['school'])
df.loc[:, 'city'] = df.index.map(school_city_dict['city'])

# Restore the original index (reset_index returns a new DataFrame, so assign it back)
df = df.reset_index()
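The comment above mentions the case where no single row of a group has both school and city. One way to handle it, as a rough sketch, is to take the first non-null value of each column per group and fill only the gaps. The small DataFrame here is made-up data for illustration, chosen so that dropna() would leave nothing behind.

```python
import numpy as np
import pandas as pd

# Hypothetical data: no row has both school and city filled in.
df = pd.DataFrame({
    "class":   ["I", "I", "III", "III"],
    "section": ["A", "A", "A", "A"],
    "school":  ["jghss", np.nan, np.nan, "gphss"],
    "city":    [np.nan, "salem", "salem", np.nan],
})

keys = ["class", "section"]

# GroupBy.first() takes the first non-null value of each column per group,
# so school and city are looked up independently of each other.
lookup = df.groupby(keys)[["school", "city"]].first()

# Bring the per-group values onto each row via the key columns,
# then fill only the missing entries and leave existing values alone.
joined = df.join(lookup, on=keys, rsuffix="_known")
for col in ["school", "city"]:
    df[col] = df[col].fillna(joined[col + "_known"])

print(df)
```

Unlike mapping over the index, this leaves values that are already present untouched and does not require changing the index of df.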
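Use DataFrame.groupby with a lambda function to forward-fill and then back-fill the missing values within each group, for the columns listed in cols. This requires that all rows of a group share the same school and city values: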
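cols = ['school', 'city']
# transform returns a result aligned with df's original index, so it can be assigned back directly
# (on recent pandas versions, groupby(...).apply may add the group keys to the index)
df[cols] = df.groupby(['class', 'section'])[cols].transform(lambda x: x.ffill().bfill())
print(df)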
   class section  sub  marks  school   city
0      I       A  Eng     80   jghss  salem
1      I       A  Mat     90   jghss  salem
2      I       A  Eng     50   jghss  salem
3    III       A  Eng     80   gphss  salem
4    III       A  Mat     45   gphss  salem
5    III       A  Eng     40   gphss  salem
6    III       A  Eng     20   gphss  salem
7    III       A  Mat     55   gphss  salem
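Since the per-group fill only gives a sensible result when each group really has one school and one city, it may be worth checking that before filling. A minimal end-to-end sketch, again assuming the gaps in the question's data are np.nan:

```python
import numpy as np
import pandas as pd

# Reconstruction of the question's input; np.nan for the gaps is an assumption.
df = pd.DataFrame({
    "class":   ["I", "I", "I", "III", "III", "III", "III", "III"],
    "section": ["A"] * 8,
    "sub":     ["Eng", "Mat", "Eng", "Eng", "Mat", "Eng", "Eng", "Mat"],
    "marks":   [80, 90, 50, 80, 45, 40, 20, 55],
    "school":  ["jghss", "jghss", np.nan, "gphss", np.nan, "gphss", "gphss", "gphss"],
    "city":    ["salem", "salem", "salem", np.nan, "salem", np.nan, "salem", np.nan],
})

keys = ["class", "section"]
cols = ["school", "city"]

# Check the assumption: at most one distinct non-null value per group and column.
distinct = df.groupby(keys)[cols].nunique()
assert (distinct <= 1).all().all(), "some group has conflicting school/city values"

# Forward-fill then back-fill inside each group; transform keeps df's row index.
df[cols] = df.groupby(keys)[cols].transform(lambda s: s.ffill().bfill())
print(df)
```

If the check fails, ffill/bfill would still run, but each gap would simply take whichever neighbouring value happens to come first in row order, which is probably not what is wanted.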