Convert a nested dictionary with tuples as keys to a dataframe

I have the following dictionary:

user_dict = {'user1': {'id1': {('word1', 'word2'): 0.99, ('word3', 'word4'): 0.16},
                       'id2': {('word5', 'word6'): 0.73, ('word7', 'word8'): 0.69}},
             'user2': {'id3': {('word9', 'word10'): 0.59, ('word11', 'word12'): 0.13},
                       'id4': {('word13', 'word14'): 0.41, ('word14', 'word15'): 0.74}}}

For my purposes, I want to convert the nested dictionary into a pandas DataFrame of this form:

  user  |  id  |  w1   |  w2   | score
---------------------------------------
  user1 |  id1 | word1 | word2 | 0.99
        |      | word3 | word4 | 0.16
        |  id2 | word5 | word6 | 0.73   and so on.

I have tried several approaches, and this is my current solution:

df = pd.Series({(i,j): user_dict[i][j]
                      for i in user_dict.keys()
                      for j in user_dict[i].keys()}).rename_axis(['user', 'id']).reset_index(name='Col3')

So the output is:

 user  |  id  |                        Col3
 -------------------------------------------------------------------
 user1 |  id1 | {('word1', 'word2'): 0.99, ('word3', 'word4'): 0.16}
 user1 |  id2 | {('word5', 'word6'): 0.73, ('word7', 'word8'): 0.69}    and so on.

Can anyone tell me what I am doing wrong with the last column?

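You can use a nested list comprehension (or generator expression) that goes one level deeper than your attempt, so that each innermost (tuple, score) pair becomes its own row:

import pandas as pd

# Walk the three levels (user -> id -> word pair) and emit one row per score;
# *k2 unpacks the (w1, w2) tuple key into two separate columns.
df = pd.DataFrame(([k0, k1, *k2, d2]
                   for k0, d0 in user_dict.items()
                   for k1, d1 in d0.items()
                   for k2, d2 in d1.items()
                   ), columns=['user', 'id', 'w1', 'w2', 'score'])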

Output:

    user   id      w1      w2  score
0  user1  id1   word1   word2   0.99
1  user1  id1   word3   word4   0.16
2  user1  id2   word5   word6   0.73
3  user1  id2   word7   word8   0.69
4  user2  id3   word9  word10   0.59
5  user2  id3  word11  word12   0.13
6  user2  id4  word13  word14   0.41
7  user2  id4  word14  word15   0.74
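
If you want the display to look more like the layout in the question, where repeated user and id values are left blank, one option is to move those two columns into a MultiIndex. A minimal sketch, assuming the same user_dict as in the question; the names rows and indexed are only for illustration:

```python
import pandas as pd

user_dict = {'user1': {'id1': {('word1', 'word2'): 0.99, ('word3', 'word4'): 0.16},
                       'id2': {('word5', 'word6'): 0.73, ('word7', 'word8'): 0.69}},
             'user2': {'id3': {('word9', 'word10'): 0.59, ('word11', 'word12'): 0.13},
                       'id4': {('word13', 'word14'): 0.41, ('word14', 'word15'): 0.74}}}

# One row per (user, id, w1, w2, score); *words unpacks the tuple key.
rows = [(user, id_, *words, score)
        for user, ids in user_dict.items()
        for id_, pairs in ids.items()
        for words, score in pairs.items()]

df = pd.DataFrame(rows, columns=['user', 'id', 'w1', 'w2', 'score'])

# Moving user and id into the index makes pandas print each repeated
# label only once per group (the default "sparsified" MultiIndex display).
indexed = df.set_index(['user', 'id'])
print(indexed)
```

The data is the same either way; set_index only changes how the frame is labelled and printed, and reset_index() brings the flat columns back.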

Or, with fewer loops:

>>> pd.concat({k: pd.DataFrame(v) for k, v in user_dict.items()}).melt(ignore_index=False).dropna()

                    variable  value
user1 word1  word2       id1   0.99
      word3  word4       id1   0.16
      word5  word6       id2   0.73
      word7  word8       id2   0.69
user2 word9  word10      id3   0.59
      word11 word12      id3   0.13
      word13 word14      id4   0.41
      word14 word15      id4   0.74
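
The melted result above keeps user, w1 and w2 in an unnamed index and calls the other columns variable and value. If you need the exact column layout from the question, something like the following should work. This is a sketch rather than the only way to do it: it assumes pandas 1.1 or later (for ignore_index in melt), and the name flat is just for illustration:

```python
import pandas as pd

user_dict = {'user1': {'id1': {('word1', 'word2'): 0.99, ('word3', 'word4'): 0.16},
                       'id2': {('word5', 'word6'): 0.73, ('word7', 'word8'): 0.69}},
             'user2': {'id3': {('word9', 'word10'): 0.59, ('word11', 'word12'): 0.13},
                       'id4': {('word13', 'word14'): 0.41, ('word14', 'word15'): 0.74}}}

flat = (
    pd.concat({k: pd.DataFrame(v) for k, v in user_dict.items()})
    # Name the melted columns directly instead of 'variable' / 'value'.
    .melt(ignore_index=False, var_name='id', value_name='score')
    # Drop the NaN cells created for (word pair, id) combinations that do not exist.
    .dropna()
    # Label the three index levels, then turn them into ordinary columns.
    .rename_axis(['user', 'w1', 'w2'])
    .reset_index()
    [['user', 'id', 'w1', 'w2', 'score']]
)
print(flat)
```

Note that this route builds a wider intermediate frame full of NaN before dropping them, so for large dictionaries the plain comprehension may be the lighter option.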