Python - How can I extract the label and value from a list column and transpose them with respect to a unique ID?

Here is the data frame I am working with:

Row  |ID   | List
----------------------------------------------------------------------------------------------------------------------------------------------------------------
1    |45   | [{u'value': u'0', u'label': u'Forum Thread Size'}, {u'value': u'0', u'label': u'Unique Commenters'}, {u'value': u'0', u'label': u'Likes and Votes'}]
2    |76   | [{u'value': u'1', u'label': u'Forum Thread Size'}, {u'value': u'1', u'label': u'Unique Commenters'}, {u'value': u'1', u'label': u'Engagement'}, {u'value': u'0', u'label': u'Likes and Votes'}]
3    |99   | []
4    |83   | [{u'value': u'0', u'label': u'Forum Thread Size'}, {u'value': u'0', u'label': u'Unique Commenters'}, {u'value': u'0', u'label': u'Likes and Votes'}]
5    |80   | []

After the transformation, I would like the data to look like this, as a pandas data frame:

Row |ID |Forum Thread Size |Unique Commenters |Engagement |Likes and Votes
-----------------------------------------------------------------------------
1   |45 |0                 |0                 |           |0
2   |76 |1                 |1                 |1          |0
3   |99 |                  |                  |           |
4   |83 |0                 |0                 |           |0
5   |80 |                  |                  |           |

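You can use apply to loop over the List column and convert each list into a pandas.Series object indexed by label. This produces a data frame with the labels as column headers, which you can then concat with the remaining columns of the original data frame to get what you need: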

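import pandas as pd

df1 = pd.concat([
    # Keep every column except List
    df.drop(columns='List'),
    # Turn each list of dicts into a Series indexed by label
    df.List.apply(lambda lst: pd.Series({
        d['label']: d['value'] for d in lst
    }))
], axis=1)

df1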
# Row   ID  Engagement   Forum Thread Size   Likes and Votes    Unique Commenters
#0  1   45        NaN                    0                 0                    0
#1  2   76          1                    1                 0                    1
#2  3   99        NaN                  NaN               NaN                  NaN
#3  4   83        NaN                    0                 0                    0
#4  5   80        NaN                  NaN               NaN                  NaN
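
For reference, here is a minimal self-contained sketch of the same idea that can be run as-is. It is a variant, not the exact code above: instead of calling pd.Series once per row through apply, it builds one plain dict per row and lets the DataFrame constructor align them by key. The sample frame, the names records, wide and result, and the final column reordering are assumptions made for illustration.

```python
import pandas as pd

# Sample data shaped like the question's frame (first three rows only).
df = pd.DataFrame({
    "Row": [1, 2, 3],
    "ID": [45, 76, 99],
    "List": [
        [{"value": "0", "label": "Forum Thread Size"},
         {"value": "0", "label": "Unique Commenters"},
         {"value": "0", "label": "Likes and Votes"}],
        [{"value": "1", "label": "Forum Thread Size"},
         {"value": "1", "label": "Unique Commenters"},
         {"value": "1", "label": "Engagement"},
         {"value": "0", "label": "Likes and Votes"}],
        [],
    ],
})

# One plain dict per row: {label: value, ...}; an empty list gives an empty dict.
records = [{d["label"]: d["value"] for d in items} for items in df["List"]]

# pandas aligns the dicts by key, so labels missing from a row become NaN.
wide = pd.DataFrame(records, index=df.index)

# Attach the label columns to the remaining columns of the original frame.
result = pd.concat([df.drop(columns="List"), wide], axis=1)

# Optional: put the columns in the order requested in the question.
result = result[["Row", "ID", "Forum Thread Size", "Unique Commenters",
                 "Engagement", "Likes and Votes"]]
print(result)
```

Note that the values stay strings ('0', '1') because that is how they appear in the source dicts; converting them to numbers would be a separate step.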

IIUC (if I understand correctly), you can stack the lists and then pivot on label:

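# Expand each list into columns, stack to one dict per row, then expand each dict into label/value columns
df1 = df.set_index(['Row', 'ID']).List.apply(pd.Series).stack().apply(pd.Series).reset_index()
# Pivot the labels into columns, then right-merge with Row/ID to restore the rows whose List was empty
df1.pivot_table(index=['Row', 'ID'], columns='label', values='value', aggfunc=np.sum).merge(df[['Row', 'ID']], left_index=True, right_on=['Row', 'ID'], how='right')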

Out[334]: 
  Engagement Forum Thread Size Likes and Votes Unique Commenters  Row  ID
0       None                 0               0                 0    1   1
1          1                 1               0                 1    2   2
2        NaN               NaN             NaN               NaN    3   3

Input data:

import numpy as np
import pandas as pd

df = pd.DataFrame({'Row':[1,2,3],'ID':[1,2,3], 'List':[[{u'value': u'0', u'label': u'Forum Thread Size'}, {u'value': u'0', u'label': u'Unique Commenters'}, {u'value': u'0', u'label': u'Likes and Votes'}], [{u'value': u'1', u'label': u'Forum Thread Size'}, {u'value': u'1', u'label': u'Unique Commenters'}, {u'value': u'1', u'label': u'Engagement'}, {u'value': u'0', u'label': u'Likes and Votes'}],[]]})
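
The same long-then-wide reshaping can also be sketched with DataFrame.explode and DataFrame.pivot, which may be easier to follow on recent pandas versions. This is a hedged alternative, not the answer's exact method: it assumes pandas 1.1 or later (for a list-valued index in pivot) and that each (Row, ID, label) combination appears at most once, so no aggregation function is needed. The names long, wide and result are illustrative.

```python
import pandas as pd

# Same shape as the input data above.
df = pd.DataFrame({
    "Row": [1, 2, 3],
    "ID": [1, 2, 3],
    "List": [
        [{"value": "0", "label": "Forum Thread Size"},
         {"value": "0", "label": "Unique Commenters"},
         {"value": "0", "label": "Likes and Votes"}],
        [{"value": "1", "label": "Forum Thread Size"},
         {"value": "1", "label": "Unique Commenters"},
         {"value": "1", "label": "Engagement"},
         {"value": "0", "label": "Likes and Votes"}],
        [],
    ],
})

# One row per dict; an empty list explodes to a single NaN row, which is dropped here.
long = df.explode("List").dropna(subset=["List"]).reset_index(drop=True)

# Pull label and value out of each dict into their own columns.
long["label"] = [d["label"] for d in long["List"]]
long["value"] = [d["value"] for d in long["List"]]

# Long to wide: one column per label, one row per (Row, ID) pair.
wide = (
    long.pivot(index=["Row", "ID"], columns="label", values="value")
        .rename_axis(columns=None)
        .reset_index()
)

# Left-merge back onto Row/ID so rows whose list was empty reappear as all-NaN.
result = df[["Row", "ID"]].merge(wide, on=["Row", "ID"], how="left")
print(result)
```

Using pivot instead of pivot_table avoids summing string values; if a label could repeat within one ID, pivot would raise an error and an explicit aggregation would be needed instead.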