Do not apply a lambda function to missing values
I have a data frame with a column containing patient diagnoses, and using pandas I want to dichotomize the diagnosis ==> ISM, non ISM. I tried this:
df["initial_diagnosis"] = df["initial_diagnosis"].apply(lambda x: x if x=="ISM" else "non ISM")
But it also assigns "non ISM" to the missing values. Is there a way to do the same thing while leaving the missing values unchanged?
The column I am trying to encode looks like this:
initial_diagnosis
ISM
ISM
WDSM
NaN
ISM
SSM
CM
ASM
ISM
I think the following should work. The missing values might be empty strings or just None; I can only guess.
missing_values = {...} # Set of values you want to keep
df["initial_diagnosis"] = df["initial_diagnosis"].apply(lambda x: x if x=="ISM" or x in missing_values else "non ISM")
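One caveat with the set-membership check: NaN does not compare equal to itself, so `x in missing_values` only matches a NaN when it happens to be the very same object as the one stored in the set. A minimal sketch that avoids guessing how the missing values are represented is to test with `pd.isna` instead. The sample frame and the helper name `dichotomize` below are made up for illustration, and an empty string would still need its own check because `pd.isna` does not treat it as missing.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: both NaN and None stand in for missing diagnoses.
df = pd.DataFrame({"initial_diagnosis": ["ISM", "WDSM", np.nan, None, "SSM"]})


def dichotomize(value):
    """Keep missing values and "ISM" as they are; map everything else to "non ISM"."""
    if pd.isna(value) or value == "ISM":
        return value
    return "non ISM"


df["initial_diagnosis"] = df["initial_diagnosis"].apply(dichotomize)
print(df["initial_diagnosis"])
```

The same condition can be written inline as `lambda x: x if pd.isna(x) or x == "ISM" else "non ISM"` if a one-liner is preferred.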
Edit:
import pandas as pd
from numpy import nan
data = pd.read_csv("test.csv")
print(data['initial_diagnosis'])
#0 ISM
#1 ISM
#2 WDSM
#3 NaN
#4 ISM
#5 SSM
#6 CM
#7 ASM
#8 ISM
#Name: initial_diagnosis, dtype: object
missing_values = {nan}
data["initial_diagnosis"] = data["initial_diagnosis"].apply(lambda x: x if x =="ISM" or x in missing_values else "non ISM")
print(data['initial_diagnosis'])
#0 non ISM
#1 ISM
#2 non ISM
#3 NaN
#4 ISM
#5 non ISM
#6 non ISM
#7 non ISM
#8 ISM
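As an alternative to `apply`, the same dichotomization can be sketched without a lambda by building a boolean mask and using `Series.where`, which keeps the entries where the mask is True and replaces the rest. The series below is typed in by hand to mirror the column shown in the question (it is not read from `test.csv`), and the names `keep` and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

# Hand-typed copy of the example column from the question.
s = pd.Series(
    ["ISM", "ISM", "WDSM", np.nan, "ISM", "SSM", "CM", "ASM", "ISM"],
    name="initial_diagnosis",
)

# Keep a value if it is "ISM" or if it is missing; replace everything else.
keep = s.eq("ISM") | s.isna()
result = s.where(keep, "non ISM")
print(result)
```

This leaves the missing entry as NaN regardless of which NaN object pandas stored, because the mask is computed with `isna()` rather than by membership in a set.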