Calculations involving mixed-type elements in columns: np.nan, floats, and strings
I have a DataFrame whose columns contain elements of mixed types, and I need to do some calculations on them. Consider this DataFrame:
import numpy as np
import pandas as pd

A = [20, np.nan, 10, 'give', np.nan, np.nan]
B = [10, np.nan, np.nan, np.nan, 10, 'given']
frame = pd.DataFrame(zip(A, B))
frame.columns = ['A', 'B']
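I want to compute the difference A - B. If I do frame['diff']=frame['A']-frame['B']
it does not give me the result I need. What I want instead is shown in the 'desired diff' column.
Basically, if only one of A or B holds a number, the other one should be treated as 0. If A holds a string and B is NaN, the result should be "positive"; in the opposite case it should be "negative". See below: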
frame
      A      B diff desired diff
0    20     10   10           10
1   NaN    NaN  NaN          NaN
2    10    NaN  NaN           10
3  give    NaN  NaN     positive
4   NaN     10  NaN          -10
5   NaN  given  NaN     negative
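For the record, I have tried np.where and np.select together with conditions such as np.logical_and(frame['A'].apply(lambda x: isinstance(x, float)), frame['B'].isna()) to get the desired output, but without success.
Thanks in advance for any suggestions!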
Use to_numeric with errors='coerce' to detect values that are non-numeric but not missing, set the new values for those rows with numpy.select, and subtract the numeric values with Series.sub and the fill_value=0 parameter:
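# Coerce column A to numbers; strings become NaN.
a = pd.to_numeric(frame['A'], errors='coerce')
# m1 & m2: A is present but not numeric, i.e. A holds a string.
m1 = frame['A'].notna()
m2 = a.isna()
# Same for column B.
b = pd.to_numeric(frame['B'], errors='coerce')
# m3 & m4: B is present but not numeric, i.e. B holds a string.
m3 = frame['B'].notna()
m4 = b.isna()
# Strings map to labels; all other rows get A - B with a single missing side treated as 0.
frame['new'] = np.select([m1 & m2, m3 & m4],
                         ['positive', 'negative'],
                         default=a.sub(b, fill_value=0))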
print(frame)
      A      B       new
0    20     10      10.0
1   NaN    NaN       nan
2    10    NaN      10.0
3  give    NaN  positive
4   NaN     10     -10.0
5   NaN  given  negative
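A variant of the approach above, sketched under the assumption that the numeric results should stay numeric rather than become text. np.select has to find one common dtype for the string choices and the float default; depending on the NumPy version this either turns the numbers into strings (the lowercase nan in the output suggests that happened here) or fails with a dtype-promotion error. The sketch swaps np.select for boolean-mask assignment on an object-dtype Series. The names text_in_a, text_in_b and result are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

A = [20, np.nan, 10, 'give', np.nan, np.nan]
B = [10, np.nan, np.nan, np.nan, 10, 'given']
frame = pd.DataFrame({'A': A, 'B': B})

# Coerce both columns to numbers; strings become NaN.
a = pd.to_numeric(frame['A'], errors='coerce')
b = pd.to_numeric(frame['B'], errors='coerce')

# A value is text when it is present in the original column
# but missing after coercion.
text_in_a = frame['A'].notna() & a.isna()
text_in_b = frame['B'].notna() & b.isna()

# Numeric difference with a single missing side treated as 0.
# When both sides are missing the result stays NaN.
# Casting to object lets the same Series hold floats and labels.
result = a.sub(b, fill_value=0).astype(object)
result[text_in_a] = 'positive'
result[text_in_b] = 'negative'

frame['new'] = result
print(frame)
```

Because the column is object dtype, the numeric rows remain Python floats and the all-missing row remains a real NaN, so frame['new'].isna() still finds it.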
If you want to use one long apply, which I do not recommend:
# Caveat: the final replace also turns a genuine difference of 0 into NaN.
frame['diff'] = (frame.fillna(0)
                      .apply(lambda x: x.A - x.B
                             if isinstance(x.A, (int, float)) and isinstance(x.B, (int, float))
                             else ('positive' if isinstance(x.A, str) and x.B == 0 else 'negative'),
                             axis=1)
                      .replace(0, np.nan))
      A      B      diff
0    20     10        10
1   NaN    NaN       NaN
2    10    NaN        10
3  give    NaN  positive
4   NaN     10       -10
5   NaN  given  negative
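If a row-wise version is still preferred, the rules from the question can be spelled out in a small named function instead of a nested lambda. This is only a sketch: row_diff is a hypothetical helper, and returning NaN when a string sits next to a non-missing value is an assumption, since the question does not cover that case. Unlike the fillna(0) and replace(0, np.nan) round trip, it keeps a genuine difference of 0.

```python
import numpy as np
import pandas as pd


def row_diff(a, b):
    """Hypothetical helper: apply the question's rules to one pair of values."""
    a_text = isinstance(a, str)
    b_text = isinstance(b, str)
    if a_text and pd.isna(b):
        return 'positive'          # string in A, nothing in B
    if b_text and pd.isna(a):
        return 'negative'          # string in B, nothing in A
    if a_text or b_text:
        return np.nan              # assumption: case not specified in the question
    if pd.isna(a) and pd.isna(b):
        return np.nan              # both missing stays missing
    # Treat a single missing side as 0.
    return (0 if pd.isna(a) else a) - (0 if pd.isna(b) else b)


A = [20, np.nan, 10, 'give', np.nan, np.nan]
B = [10, np.nan, np.nan, np.nan, 10, 'given']
frame = pd.DataFrame({'A': A, 'B': B})

# A list comprehension over the two columns avoids building a Series per row.
frame['diff'] = [row_diff(a, b) for a, b in zip(frame['A'], frame['B'])]
print(frame)
```

This is still a Python-level loop, so for large frames the vectorized to_numeric approach is likely to be faster.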