Pandas to risk adjust surgery data - how to pass a list for 20,000 patients?
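I am a surgeon trying to risk adjust surgical procedure data using hcuppy. I have a pandas DataFrame with 500,000 records and 56 columns. The risk adjustment module (elixhauser) takes a list of diagnosis codes and returns the weighted risk adjustment scores associated with those codes.
For example: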
In[] : from hcuppy.elixhauser import ElixhauserEngine
In[] : ee = ElixhauserEngine()
In[] : out = ee.get_elixhauser(["C711"])
In [] : out
Out[]: {'cmrbdt_lst': ['TUMOR'], 'rdmsn_scr': 15, 'mrtlt_scr': 7}
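Great, that gives me the risk adjustment for a brain tumour. However, I have up to 20 diagnoses per patient, and I want to run the conversion for every diagnosis across a cohort of up to 20,000 patients. For now I have limited it to the 4 diagnosis columns shown below.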
df.iloc[0:10, 7:11]
DIAG_02 DIAG_03 DIAG_04 DIAG_05
0 NaN NaN NaN NaN
1 M7962 NaN NaN NaN
2 G800 Q798 NaN NaN
3 G992-A M4720D G551-A F101-
4 I10X NaN NaN NaN
5 G971 G35X Z881 N390
6 Z864- NaN NaN NaN
7 F329 NaN NaN NaN
8 Z992- E669- K219- I10X-
9 M510 G992 M4806 I10X
If I pass
out = ee.get_elixhauser([df.iloc[0:, 7:11]])
I get
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-78-83647b38b5dc> in <module>
----> 1 out = ee.get_elixhauser([df.iloc[0:, 6:11]])
/opt/anaconda3/lib/python3.7/site-packages/hcuppy/elixhauser.py in get_elixhauser(self, dx_lst)
93 dx_lst = [dx_lst]
94
---> 95 dx_set = {dx.strip().upper().replace(".","") for dx in dx_lst}
96 rawgrp_lst = [grp for grp in {search(dx) for dx in dx_set}
97 if grp not in {"", "NONE"}]
/opt/anaconda3/lib/python3.7/site-packages/hcuppy/elixhauser.py in <setcomp>(.0)
93 dx_lst = [dx_lst]
94
---> 95 dx_set = {dx.strip().upper().replace(".","") for dx in dx_lst}
96 rawgrp_lst = [grp for grp in {search(dx) for dx in dx_set}
97 if grp not in {"", "NONE"}]
/opt/anaconda3/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
5128 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5129 return self[name]
-> 5130 return object.__getattribute__(self, name)
5131
5132 def __setattr__(self, name: str, value) -> None:
AttributeError: 'DataFrame' object has no attribute 'strip'
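I would like the risk adjustment output added as separate columns. As you can see, I am a novice, so I need a simple solution.
Update based on Parfait's comment below.
Running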
def clean_get_elixhauser(val):
    try:
        result = ee.get_elixhauser(val)
    except Exception as e:
        result = e  # RETURN EXCEPTION MESSAGE - ADJUST AS NEEDED
    return result if pd.isnull(val) else float('nan')

new_df = df.iloc[0:10, 7:11].copy().applymap(clean_get_elixhauser)
on the same DataFrame returns:
DIAG_02 DIAG_03 DIAG_04 DIAG_05
0 'float' object has no attribute 'strip' 'float' object has no attribute 'strip' 'float' object has no attribute 'strip' 'float' object has no attribute 'strip'
1 NaN 'float' object has no attribute 'strip' 'float' object has no attribute 'strip' 'float' object has no attribute 'strip'
2 NaN NaN 'float' object has no attribute 'strip' 'float' object has no attribute 'strip'
3 NaN NaN NaN NaN
4 NaN 'float' object has no attribute 'strip' 'float' object has no attribute 'strip' 'float' object has no attribute 'strip'
5 NaN NaN NaN NaN
6 NaN 'float' object has no attribute 'strip' 'float' object has no attribute 'strip' 'float' object has no attribute 'strip'
7 NaN 'float' object has no attribute 'strip' 'float' object has no attribute 'strip' 'float' object has no attribute 'strip'
8 NaN NaN NaN NaN
9 NaN NaN NaN NaN
So wherever the original DataFrame has NaN, "'float' object has no attribute 'strip'" is returned, and wherever I have a code I get NaN. I am not sure what is happening here.
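Because ee.get_elixhauser is not a method that accepts a whole DataFrame or array as its input argument, but one that takes scalar values, you need to run the method on each element of the DataFrame. So, assuming all DataFrame elements are of the same type, consider DataFrame.applymap. Also, to avoid slice assignment issues caused by iloc, call copy() before applying the method.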
new_df = df.iloc[0:10, 7:11].copy().applymap(ee.get_elixhauser)
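As a small illustration of element-wise application, here is a sketch that does not need hcuppy. The function code_length and the two-row frame are made up for the demo. It also assumes you may be on a newer pandas: DataFrame.applymap has been deprecated since pandas 2.1 in favor of DataFrame.map, so the sketch picks whichever one is available.

```python
import pandas as pd

# Tiny made-up frame shaped like the diagnosis columns in the question.
df = pd.DataFrame({"DIAG_02": ["M7962", None], "DIAG_03": [None, "Q798"]})

def code_length(val):
    """Hypothetical per-cell function: length of the code, NaN for missing cells."""
    return len(val) if pd.notnull(val) else float("nan")

# Use DataFrame.map where it exists (pandas >= 2.1), else fall back to applymap.
elementwise = getattr(df, "map", None) or df.applymap

# The function is called once per cell, never with the whole DataFrame.
lengths = elementwise(code_length)
print(lengths)
```

Swapping code_length for a wrapper around ee.get_elixhauser gives the same per-cell behaviour as the line above.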
In addition, you may need to handle NaN values and the case where ee.get_elixhauser raises an exception:
def clean_get_elixhauser(val):
    try:
        result = ee.get_elixhauser([val])
    except Exception as e:
        result = e  # RETURN EXCEPTION MESSAGE - ADJUST AS NEEDED
    return result if pd.notnull(val) else float('nan')

new_df = df.iloc[0:10, 7:11].copy().applymap(clean_get_elixhauser)
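This also seems to explain the output in the question's update: that version returns the result when pd.isnull(val) is true, whereas the function above uses pd.notnull(val), so the two outcomes are swapped. The sketch below reproduces the error message with a stand-in. fake_get_elixhauser is hypothetical and its scores are made up. It copies the normalization line shown in the traceback, and the isinstance check is only a guess at what precedes `dx_lst = [dx_lst]` on line 93.

```python
import pandas as pd

def fake_get_elixhauser(dx_lst):
    """Hypothetical stand-in for ee.get_elixhauser; the scores are made up."""
    # Assumption: non-list input is wrapped in a list, as line 93 suggests.
    if not isinstance(dx_lst, list):
        dx_lst = [dx_lst]
    # Same normalization as line 95 of the traceback in the question.
    dx_set = {dx.strip().upper().replace(".", "") for dx in dx_lst}
    return {"cmrbdt_lst": sorted(dx_set), "rdmsn_scr": 15, "mrtlt_scr": 7}

def safe_get_elixhauser(val):
    """Skip missing cells before the engine ever sees them."""
    if pd.isnull(val):
        return float("nan")
    return fake_get_elixhauser([val])

# A missing cell is a float NaN, and floats have no .strip() method.
try:
    fake_get_elixhauser(float("nan"))
    nan_error = None
except AttributeError as exc:
    nan_error = str(exc)

print(nan_error)
print(safe_get_elixhauser("c71.1"))
```

Checking for missing values first, as safe_get_elixhauser does, means the try/except is only needed for errors raised on real codes.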
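Another challenge is that the method returns a dictionary, which will be mapped as the new element in the returned DataFrame. (Note: the code below does not call the actual hcuppy.elixhauser module but uses the demo result the OP posted, to avoid installing the module on my end.)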
def demo_get_elixhauser(val):
    result = {'cmrbdt_lst': ['TUMOR'], 'rdmsn_scr': 15, 'mrtlt_scr': 7}
    return result if pd.notnull(val) else float('nan')

new_df = df.applymap(demo_get_elixhauser)
new_df
# DIAG_02 ... DIAG_05
# 0 NaN ... NaN
# 1 {'cmrbdt_lst': ['TUMOR'], 'rdmsn_scr': 15, 'mr... ... NaN
# 2 {'cmrbdt_lst': ['TUMOR'], 'rdmsn_scr': 15, 'mr... ... NaN
# 3 {'cmrbdt_lst': ['TUMOR'], 'rdmsn_scr': 15, 'mr... ... {'cmrbdt_lst': ['TUMOR'], 'rdmsn_scr': 15, 'mr...
# 4 {'cmrbdt_lst': ['TUMOR'], 'rdmsn_scr': 15, 'mr... ... NaN
# 5 {'cmrbdt_lst': ['TUMOR'], 'rdmsn_scr': 15, 'mr... ... {'cmrbdt_lst': ['TUMOR'], 'rdmsn_scr': 15, 'mr...
# 6 {'cmrbdt_lst': ['TUMOR'], 'rdmsn_scr': 15, 'mr... ... NaN
# 7 {'cmrbdt_lst': ['TUMOR'], 'rdmsn_scr': 15, 'mr... ... NaN
# 8 {'cmrbdt_lst': ['TUMOR'], 'rdmsn_scr': 15, 'mr... ... {'cmrbdt_lst': ['TUMOR'], 'rdmsn_scr': 15, 'mr...
# 9 {'cmrbdt_lst': ['TUMOR'], 'rdmsn_scr': 15, 'mr... ... {'cmrbdt_lst': ['TUMOR'], 'rdmsn_scr': 15, 'mr...
To fix this, you can retrieve a single element of the returned dictionary by adjusting the user-defined method:
def clean_get_elixhauser(val):
    result = {'cmrbdt_lst': ['TUMOR'], 'rdmsn_scr': 15, 'mrtlt_scr': 7}
    return result['mrtlt_scr'] if pd.notnull(val) else float('nan')

mrtlt_df = df.applymap(clean_get_elixhauser)
mrtlt_df
# DIAG_02 DIAG_03 DIAG_04 DIAG_05
# 0 NaN NaN NaN NaN
# 1 7.0 NaN NaN NaN
# 2 7.0 7.0 NaN NaN
# 3 7.0 7.0 7.0 7.0
# 4 7.0 NaN NaN NaN
# 5 7.0 7.0 7.0 7.0
# 6 7.0 NaN NaN NaN
# 7 7.0 NaN NaN NaN
# 8 7.0 7.0 7.0 7.0
# 9 7.0 7.0 7.0 7.0
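Since the question asks for the scores as separate columns, and get_elixhauser takes a list of codes, another option is to score each patient row once instead of each cell. The following is only a sketch under assumptions. fake_get_elixhauser is a hypothetical stand-in with made-up scores, and score_row and the three-row frame are illustrative names and data. Whether the real engine recognizes suffixed codes such as G992-A is not something this checks.

```python
import pandas as pd

def fake_get_elixhauser(dx_lst):
    """Hypothetical stand-in for ee.get_elixhauser; the scores are made up."""
    codes = sorted({dx.strip().upper().replace(".", "") for dx in dx_lst})
    return {"cmrbdt_lst": codes,
            "rdmsn_scr": 15 * len(codes),
            "mrtlt_scr": 7 * len(codes)}

# Made-up sample shaped like the diagnosis columns in the question.
df = pd.DataFrame({
    "DIAG_02": [None, "M7962", "G800"],
    "DIAG_03": [None, None, "Q798"],
})
diag_cols = [col for col in df.columns if col.startswith("DIAG_")]

def score_row(row_values):
    """Score one patient from the non-missing codes in that row."""
    codes = [code for code in row_values if pd.notnull(code)]
    if not codes:
        # No diagnoses recorded: leave the scores missing.
        return {"cmrbdt_lst": [], "rdmsn_scr": float("nan"), "mrtlt_scr": float("nan")}
    return fake_get_elixhauser(codes)

# One dictionary per patient, expanded into one column per dictionary key.
scores = pd.DataFrame(
    [score_row(values) for values in df[diag_cols].values.tolist()],
    index=df.index,
)
result = df.join(scores)
print(result)
```

With the real engine, replacing fake_get_elixhauser with ee.get_elixhauser inside score_row should be the only change needed, and a plain Python loop over 20,000 rows is usually fast enough.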