Trouble iterating through pandas column containing irregular nested lists

Question

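I have annotated some data and stored each annotation as a list in a pandas DataFrame column, df['Annotations']. However, a document can have multiple annotations, which results in nested lists.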

For example:

[[[past, self], alcohol], [[present, self], tobacco]]

would be two separate annotations: one for (past, self, alcohol) and another for (present, self, tobacco).
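Each annotation is a two-element list: a list of descriptor tags followed by the substance. A minimal sketch, assuming every annotation has exactly that [tags, substance] shape (the helper name split_annotations is hypothetical), flattens one cell into one tuple per annotation:

```python
def split_annotations(cell):
    """Flatten one cell like [[[past, self], alcohol], ...] into tuples.

    Hypothetical helper; assumes every annotation is [list_of_tags, substance].
    """
    flat = []
    for tags, substance in cell:
        # tags is a list such as ['past', 'self'] or ['general']
        flat.append((*tags, substance))
    return flat


cell = [[['past', 'self'], 'alcohol'], [['present', 'self'], 'tobacco']]
print(split_annotations(cell))
# [('past', 'self', 'alcohol'), ('present', 'self', 'tobacco')]
```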

I'm having trouble iterating through this column and updating the other columns based on the values of each annotation.

My DataFrame looks like this:


       User    Annotations                           Temp   Experiencer   Tobacco   MJ    Alc.

       'xyz'    [[[past, self], alcohol],           
                 [[present, self],tobacco]]            0         0           0       0      0

       'aaa'    [[[general], marijuana]]               0         0           0       0      0 

       'bbb'    [[[past, other], alcohol], 
                 [[future, other], marijuana]]         0         0           0       0      0

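I would like the resulting DataFrame to contain one row per sublist (annotation). Ideally it would look like the one below, with the Temp column encoded as 0 = none, 1 = past, 2 = present, 3 = future; the Experiencer column encoded as 0 = general, 1 = self, 2 = other; and the remaining columns as booleans (1 = present, 0 = absent):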

       User      Temp   Experiencer   Tobacco   MJ   Alc.

       'xyz'      1          1           0      0     1

       'xyz'      2          1           1      0     0

       'aaa'      0          0           0      1     0 

       'bbb'      1          2           0      0     1 
         
       'bbb'      3          2           0      1     0
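The encoding described above can be written as a per-annotation function. This is a sketch, assuming the tag and substance strings match the data exactly; the names TEMP, EXPERIENCER, SUBSTANCE_COLS and encode_annotation are illustrative, not from the question:

```python
# Codes taken from the description above; a missing temporal tag stays 0 (none).
TEMP = {'past': 1, 'present': 2, 'future': 3}
EXPERIENCER = {'general': 0, 'self': 1, 'other': 2}
# Assumed mapping from substance strings to the boolean output columns
SUBSTANCE_COLS = {'tobacco': 'Tobacco', 'marijuana': 'MJ', 'alcohol': 'Alc.'}


def encode_annotation(annotation):
    """Turn one [tags, substance] annotation into a dict of column values."""
    tags, substance = annotation
    row = {'Temp': 0, 'Experiencer': 0, 'Tobacco': 0, 'MJ': 0, 'Alc.': 0}
    for tag in tags:
        if tag in TEMP:
            row['Temp'] = TEMP[tag]
        elif tag in EXPERIENCER:
            row['Experiencer'] = EXPERIENCER[tag]
    # Boolean flag for whichever substance this annotation mentions
    row[SUBSTANCE_COLS[substance]] = 1
    return row


print(encode_annotation([['future', 'other'], 'marijuana']))
# {'Temp': 3, 'Experiencer': 2, 'Tobacco': 0, 'MJ': 1, 'Alc.': 0}
```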

Does anyone have suggestions for how to apply this to the column across the whole DataFrame?

Thanks!
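One way to apply this to the whole column, sketched here as an alternative to a row-by-row loop, is DataFrame.explode, which gives each annotation its own row while repeating the User value. It assumes every row has at least one annotation (an empty list would explode to NaN and break the encoder). The zero-filled placeholder columns from the question are dropped and rebuilt from the encoded values; all helper names are illustrative:

```python
import pandas as pd

TEMP = {'past': 1, 'present': 2, 'future': 3}
EXPERIENCER = {'general': 0, 'self': 1, 'other': 2}
SUBSTANCE_COLS = {'tobacco': 'Tobacco', 'marijuana': 'MJ', 'alcohol': 'Alc.'}
OUT_COLS = ['Temp', 'Experiencer', 'Tobacco', 'MJ', 'Alc.']


def encode_annotation(annotation):
    """Encode one [tags, substance] annotation as a dict of column values."""
    tags, substance = annotation
    row = dict.fromkeys(OUT_COLS, 0)
    for tag in tags:
        if tag in TEMP:
            row['Temp'] = TEMP[tag]
        elif tag in EXPERIENCER:
            row['Experiencer'] = EXPERIENCER[tag]
    row[SUBSTANCE_COLS[substance]] = 1
    return row


df = pd.DataFrame({
    'User': ['xyz', 'aaa', 'bbb'],
    'Annotations': [
        [[['past', 'self'], 'alcohol'], [['present', 'self'], 'tobacco']],
        [[['general'], 'marijuana']],
        [[['past', 'other'], 'alcohol'], [['future', 'other'], 'marijuana']],
    ],
})

# explode() unpacks only the outer list: one [tags, substance] per row
exploded = df[['User', 'Annotations']].explode('Annotations', ignore_index=True)

# Encode every annotation, then put the encoded columns next to User
encoded = pd.DataFrame([encode_annotation(a) for a in exploded['Annotations']])
result = pd.concat([exploded[['User']], encoded], axis=1)
print(result)
```

Building all rows first and creating the DataFrame once is generally cheaper than enlarging a DataFrame one cell at a time with .loc.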

Answer 1

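The code is a bit verbose and some column names differ from yours (user, anno, Exp and Alc instead of User, Annotations, Experiencer and Alc.), but it works.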

import pandas as pd

# Substance -> output column name, and temporal tag -> Temp code
col_map = {
    'alcohol': 'Alc',
    'tobacco': 'Tobacco',
    'marijuana': 'MJ',
    'past': 1,
    'present': 2,
    'future': 3
}
# Experiencer tag -> Exp code
col_map2 = {'general': 0,
            'self': 1,
            'other': 2}
dfdata = {
    'user': ['xyz', 'aaa', 'bbb'],
    'anno': [
        [[['past', 'self'], 'alcohol'], [['present', 'self'], 'tobacco']],
        [[['general'], 'marijuana']],
        [[['past', 'other'], 'alcohol'],
         [['future', 'other'], 'marijuana']]
    ],
    'Temp': [0] * 3,
    'Exp': [0] * 3,
    'Tobacco': [0] * 3,
    'MJ': [0] * 3,
    'Alc': [0] * 3
}
df = pd.DataFrame(data=dfdata, index=range(3))
col = 'anno'
# Output frame: same columns as the input, minus the annotation column
newdf = pd.DataFrame(columns=[x for x in df.columns if x != col])
df.set_index('user', inplace=True)
i = 0  # row counter for newdf
for us in df.index:
    annos = df.loc[us, 'anno']
    # One output row per annotation [tags, substance]
    for rec in annos:
        newdf.loc[i, 'user'] = us
        # The last element is the substance: set its flag column to 1
        newdf.loc[i, col_map[rec[-1]]] = 1
        # The first element holds the temporal and experiencer tags
        for elem in rec[0]:
            if elem in col_map:
                newdf.loc[i, 'Temp'] = col_map[elem]
            if elem in col_map2:
                newdf.loc[i, 'Exp'] = col_map2[elem]
        i += 1

# Cells never set above (e.g. Temp for a 'general' annotation) become 0
newdf.fillna(0, inplace=True)
print(newdf)


Tags: nested-lists, python-3.x, pandas