Fill np.nan condition within for/if-else loop
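I've been working on this for a while but can't seem to find the answer I need. Suppose I have the data frame below.
What I want to do is fill the last three rows of df['gender'] based on the values in the df['home_work'] column: if home_work > 9, then m; otherwise, f. Please keep in mind this is just a made-up dataset, and I promise no offense is intended!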
import numpy as np
import pandas as pd

enr = pd.DataFrame({'name_id':[1254, 1359, 1254, 1296, 1353, 2656],
                    'enrollment_term':['spring 2018', 'spring 2018', 'fall 2018', 'spring 2018', 'spring 2018', 'fall 2020'],
                    'gpa_term': [2.93, np.nan, 1.65, 4.00, 3.95, 2.92],
                    'dog_owner':[0,1,1,1, 1, 0],
                    'salary':[50657, 90658, np.nan, 104352, np.nan, 102043],
                    'home_work':[34, np.nan, 12, 9, 8, 27],
                    'gender':['m','f','f',np.nan, np.nan, np.nan]})
enr
Below is the code I tried, but it raises the error shown underneath:
for i in df['gender'].isna():
    if df['home_work'][i] > 9:
        df['gender'][i].fillna('m')
    else:
        df['gender'][i].fillna('f')
KeyError: False
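Any help is greatly appreciated, as I've been at this for a while. I have a dataset of 90K+ rows that I want to apply this to, and I'd like to write a function to streamline the process, but I've hit a speed bump!
The problem I keep running into is that np.nan falls through to the default branch, so gender gets filled with a value even when the requirement is not met. Thoughts?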
# Edited
Suppose I have the following df:
enr = pd.DataFrame({'name_id':[1254, 1359, 1254, 1296, 1353, 2656],
'enrollment_term':['spring 2018', 'spring 2018', 'fall 2018', 'spring 2018', 'spring 2018', 'fall 2020'],
'gpa_term': [2.93, np.nan, 1.65, 4.00, 3.95, 2.92],
'dog_owner':[0,1,1,1, 1, 0],
'salary':[50657, 90658, np.nan, 104352, np.nan, 102043],
'home_work':[np.nan, np.nan, 0.7, 0.3, 0.64, 0.49],
'gender':[0, 1, 1,np.nan, np.nan, np.nan]})
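I want to impute enr['gender'] based on home_work. If enr['home_work'] >= 0.5, then enr['gender'] should be 0; otherwise (as long as enr['home_work'] is not np.nan), enr['gender'] should be 1.
What I don't want is a value imputed into enr['gender'] where enr['home_work'] is np.nan.
I've tried many different techniques, but they all seem to impute 1. Any ideas?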
Use numpy.where with Series.fillna:
enr['gender'] = np.where(enr['home_work'] > 9,
enr['gender'].fillna('m'),
enr['gender'].fillna('f'))
Or build the mask separately and fill only the missing rows:
m = enr['gender'].isna()
enr.loc[m, 'gender'] = np.where(enr['home_work'] > 9, 'm', 'f')[m]
print (enr)
name_id enrollment_term gpa_term dog_owner salary home_work gender
0 1254 spring 2018 2.93 0 50657.0 34 m
1 1359 spring 2018 NaN 1 90658.0 42 f
2 1254 fall 2018 1.65 1 NaN 12 f
3 1296 spring 2018 4.00 1 104352.0 9 f
4 1353 spring 2018 3.95 1 NaN 8 f
5 2656 fall 2020 2.92 0 102043.0 27 m
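As a side note on the loop in the question, here is a minimal sketch of what likely goes wrong, assuming a cut-down two-column copy of the sample frame (the names `values`, `missing_labels`, and `filled` are only illustrative). Iterating over `isna()` yields the boolean values themselves rather than row labels, so `i` is `True` or `False`, which is then used as a key. The masked assignment at the end is the same idea as the answer above, restricted to the rows being filled.

```python
import numpy as np
import pandas as pd

# Reduced copy of the question's sample data (only the two relevant columns)
enr = pd.DataFrame({
    'home_work': [34, np.nan, 12, 9, 8, 27],
    'gender': ['m', 'f', 'f', np.nan, np.nan, np.nan],
})

# Iterating over a boolean Series yields True/False values, not row labels,
# so the loop variable in the question is a boolean used as a lookup key.
values = list(enr['gender'].isna())

# The row labels that actually need filling
missing_labels = enr.index[enr['gender'].isna()].tolist()

# Vectorised fill: evaluate the condition only on rows where gender is missing
filled = enr.copy()
m = filled['gender'].isna()
filled.loc[m, 'gender'] = np.where(filled.loc[m, 'home_work'] > 9, 'm', 'f')
print(filled)
```

Because the condition is evaluated for whole columns at once, this should also scale much better than a Python-level loop on a 90K-row frame.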
Edit (for the updated question, the mask also requires home_work to be non-missing):
m = enr['gender'].isna() & enr['home_work'].notna()
enr.loc[m, 'gender'] = np.where(enr['home_work'] >= 0.5, 0, 1)[m]
print (enr)
name_id enrollment_term gpa_term dog_owner salary home_work gender
0 1254 spring 2018 2.93 0 50657.0 NaN 0.0
1 1359 spring 2018 NaN 1 90658.0 NaN 1.0
2 1254 fall 2018 1.65 1 NaN 0.70 1.0
3 1296 spring 2018 4.00 1 104352.0 0.30 1.0
4 1353 spring 2018 3.95 1 NaN 0.64 0.0
5 2656 fall 2020 2.92 0 102043.0 0.49 1.0
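Since the question mentions wanting a function, the edit above could be wrapped roughly as follows. This is only a sketch: the function name `fill_from_threshold` and its parameters are made up for illustration, and the seventh row (missing in both columns) is an extra hypothetical row added to show that such rows stay NaN.

```python
import numpy as np
import pandas as pd

def fill_from_threshold(df, target, source, threshold, if_ge, otherwise):
    """Fill missing `target` values from `source`, leaving rows with a
    missing `source` untouched. Hypothetical helper, not a pandas API."""
    out = df.copy()
    # Only rows where target is missing AND source is present are filled
    m = out[target].isna() & out[source].notna()
    out.loc[m, target] = np.where(out.loc[m, source] >= threshold, if_ge, otherwise)
    return out

# First six rows follow the edited question; the last row is hypothetical
enr = pd.DataFrame({
    'home_work': [np.nan, np.nan, 0.7, 0.3, 0.64, 0.49, np.nan],
    'gender': [0, 1, 1, np.nan, np.nan, np.nan, np.nan],
})

result = fill_from_threshold(enr, 'gender', 'home_work', 0.5, 0, 1)
print(result)
```

Returning a copy instead of modifying the frame in place is a design choice made here so the original data stays available for comparison.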
Let's try map together with where:
df.gender = df.gender.where(df.gender.notna(), df.home_work.gt(9).map({True: 'm', False: 'f'}))
df
name_id enrollment_term gpa_term dog_owner salary home_work gender
0 1254 spring 2018 2.93 0 50657.0 34.0 m
1 1359 spring 2018 NaN 1 90658.0 NaN f
2 1254 fall 2018 1.65 1 NaN 12.0 f
3 1296 spring 2018 4.00 1 104352.0 9.0 f
4 1353 spring 2018 3.95 1 NaN 8.0 f
5 2656 fall 2020 2.92 0 102043.0 27.0 m
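One thing to be aware of with this approach: `NaN > 9` evaluates to False, so a row missing both gender and home_work would be filled with 'f'. If that is not wanted, a possible variation (a sketch only; the variable name `fill` and the seventh row are hypothetical additions) is to blank out the mapped values wherever home_work is missing before using them:

```python
import numpy as np
import pandas as pd

# First six rows follow the question; the last row is a hypothetical case
# where both gender and home_work are missing.
enr = pd.DataFrame({
    'home_work': [34, np.nan, 12, 9, 8, 27, np.nan],
    'gender': ['m', 'f', 'f', np.nan, np.nan, np.nan, np.nan],
})

# Map the comparison to labels, then set the label back to NaN
# wherever home_work itself is missing.
fill = (enr['home_work'].gt(9)
        .map({True: 'm', False: 'f'})
        .where(enr['home_work'].notna()))

# Keep existing gender values; take the label from `fill` only where missing
enr['gender'] = enr['gender'].where(enr['gender'].notna(), fill)
print(enr)
```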