Elegant and efficient way to replace multiple terms in a pandas column
I want to replace multiple values in a DataFrame column. The column contains the following labels:
import pandas as pd

df = pd.DataFrame({'label': ['Sodium', 'Bicarbonate', 'White Blood Cells', 'Hemoglobin',
    'Glucose', 'Lactate', 'pH', 'Potassium, Whole Blood',
    'Sodium, Whole Blood', 'Lactate Dehydrogenase (LD)',
    'Bilirubin, Direct', 'Alkaline Phosphatase',
    'Alanine Aminotransferase (ALT)',
    'Asparate Aminotransferase (AST)', 'Potassium', 'Phosphate',
    'Creatinine', 'C-Reactive Protein', 'pCO2',
    'Calculated Bicarbonate, Whole Blood', 'Bilirubin, Total',
    'Albumin', 'Bilirubin, Indirect', 'Urine Volume', 'WBC Count',
    'Urine Volume, Total', 'Phosphate, Body Fluid']})
In the code below, I replace 'Sodium' and 'Sodium, Whole Blood' with 'Sodium'. I do the same for each of the other measurements:
df['label'] = df['label'].replace(dict.fromkeys(['Sodium','Sodium, Whole Blood'], 'Sodium'))
df['label'] = df['label'].replace(dict.fromkeys(['Bicarbonate','Calculated Bicarbonate, Whole Blood'], 'Bicarbonate'))
df['label'] = df['label'].replace(dict.fromkeys(['Bilirubin, Direct','Bilirubin, Total','Bilirubin, Indirect'], 'Bilirubin'))
df['label'] = df['label'].replace(dict.fromkeys(['Urine Volume, Total','Urine Volume'], 'Urine Volume'))
df['label'] = df['label'].replace(dict.fromkeys(['White Blood Cells','WBC Count'], 'WBC'))
df['label'] = df['label'].replace(dict.fromkeys(['Potassium, Whole Blood','Potassium'], 'Potassium'))
df['label'] = df['label'].replace(dict.fromkeys(['Phosphate','Phosphate, Body Fluid'], 'Phosphate'))
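The code above works, but is there a more efficient way to do these replacements than repeating the same line for every group?
One way is to build one large dictionary and call replace only once: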
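# (raw labels, canonical label) pairs; add the remaining groups here
lst = [(['Sodium', 'Sodium, Whole Blood'], 'Sodium'),
       (['Bicarbonate', 'Calculated Bicarbonate, Whole Blood'], 'Bicarbonate')
      ]
# merge every group into a single raw label -> canonical label dict
repl_dict = {}
for x, y in lst:
    repl_dict.update(dict.fromkeys(x, y))
# one replace call instead of one call per group
df['label'] = df['label'].replace(repl_dict)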
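A variant of the same idea, sketched under the assumption that you would rather keep the groups as "canonical label -> list of raw labels" (the `groups` dict below is a hypothetical layout, not from the original answer). A dict comprehension inverts it into the raw -> canonical mapping, and the column is still processed in one call. It also shows `Series.map` with `fillna` as a second option: `map` returns NaN for labels that are not in the dict, so `fillna` puts the original label back. Whether `map` or `replace` is faster depends on your data and pandas version, so time both if it matters.

```python
import pandas as pd

# Small sample of the labels from the question
df = pd.DataFrame({'label': ['Sodium', 'Sodium, Whole Blood', 'White Blood Cells',
                             'WBC Count', 'Bilirubin, Total', 'Bilirubin, Indirect',
                             'Urine Volume, Total', 'Glucose', 'pH']})

# Hypothetical layout: canonical label -> raw labels that should become it
groups = {
    'Sodium': ['Sodium', 'Sodium, Whole Blood'],
    'Bicarbonate': ['Bicarbonate', 'Calculated Bicarbonate, Whole Blood'],
    'Bilirubin': ['Bilirubin, Direct', 'Bilirubin, Total', 'Bilirubin, Indirect'],
    'Urine Volume': ['Urine Volume', 'Urine Volume, Total'],
    'WBC': ['White Blood Cells', 'WBC Count'],
    'Potassium': ['Potassium', 'Potassium, Whole Blood'],
    'Phosphate': ['Phosphate', 'Phosphate, Body Fluid'],
}

# Invert to raw label -> canonical label.
# If a raw label appears in two groups, the group listed later wins.
repl_dict = {raw: canon for canon, raws in groups.items() for raw in raws}

# Option 1: a single replace() call; labels not in repl_dict stay unchanged
via_replace = df['label'].replace(repl_dict)

# Option 2: map() looks every value up in repl_dict and yields NaN for misses,
# so fillna() restores the original label for those rows
via_map = df['label'].map(repl_dict).fillna(df['label'])

print(via_replace.tolist())
# ['Sodium', 'Sodium', 'WBC', 'WBC', 'Bilirubin', 'Bilirubin', 'Urine Volume', 'Glucose', 'pH']
assert via_replace.equals(via_map)
df['label'] = via_replace
```

When merging groups into one dict (with either the comprehension or `update`), check that no raw label is listed under two canonical names: with chained replace calls the first call wins, but in a merged dict the last group wins.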