Converting the results of a pickle file (JSON) to a DataFrame

Question

I am reading the contents of a pickle file like this:

import pickle
data = pickle.load(open("name_ethnicities.pkl", "rb"))

What it returns looks like JSON, as shown below:

  {'t creavalle': [{'scores': [{'ethnicity': 'Asian', 'score': '0.01'},
         {'ethnicity': 'GreaterAfrican', 'score': '0.00'},
         {'ethnicity': 'GreaterEuropean', 'score': '0.99'}],
        'best': 'GreaterEuropean'},
       {'scores': [{'ethnicity': 'British', 'score': '0.99'},
         {'ethnicity': 'Jewish', 'score': '0.00'},
         {'ethnicity': 'WestEuropean', 'score': '0.00'},
         {'ethnicity': 'EastEuropean', 'score': '0.00'}],
        'best': 'British'}],
      'uyŏng yi': [{'scores': [{'ethnicity': 'Asian', 'score': '1.00'},
         {'ethnicity': 'GreaterAfrican', 'score': '0.00'},
         {'ethnicity': 'GreaterEuropean', 'score': '0.00'}],
        'best': 'Asian'},
       {'scores': [{'ethnicity': 'IndianSubContinent', 'score': '0.00'},
         {'ethnicity': 'GreaterEastAsian', 'score': '1.00'}],
        'best': 'GreaterEastAsian'},
       {'scores': [{'ethnicity': 'Japanese', 'score': '0.00'},
         {'ethnicity': 'EastAsian', 'score': '1.00'}],
        'best': 'EastAsian'}],
      'temple orme': [{'scores': [{'ethnicity': 'Asian', 'score': '0.00'},
         {'ethnicity': 'GreaterAfrican', 'score': '0.00'},
         {'ethnicity': 'GreaterEuropean', 'score': '1.00'}],
        'best': 'GreaterEuropean'},
       {'scores': [{'ethnicity': 'British', 'score': '1.00'},
         {'ethnicity': 'Jewish', 'score': '0.00'},
         {'ethnicity': 'WestEuropean', 'score': '0.00'},
         {'ethnicity': 'EastEuropean', 'score': '0.00'}],
        'best': 'British'}]
     }

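I am trying to build a DataFrame from this that contains each name and the "best" result from each category, separated by commas. For the data above, the DataFrame would look like this: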

name               ethnicity
t creavalle        GreaterEuropean, British
uyŏng yi           Asian, GreaterEastAsian, EastAsian
temple orme        GreaterEuropean, British

And so on...

I tried pd.read_json, but it did not work for me. Any suggestions on how to do this?
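A likely reason pd.read_json does not help is that it parses JSON from a path, URL, or file-like object. pickle.load has already returned an ordinary Python dict, and the single quotes in the printout point to this. The sketch below shows the loaded shape. It makes two assumptions: it uses a cut-down copy of the data above, and it round-trips that copy in memory with pickle.dumps/pickle.loads instead of reading name_ethnicities.pkl. Each name maps to a list of dicts, and each dict has a "best" key.

```python
import pickle

# Cut-down copy of the question's data (assumption: the real file has the same shape).
sample = {
    "t creavalle": [
        {"scores": [{"ethnicity": "GreaterEuropean", "score": "0.99"}],
         "best": "GreaterEuropean"},
        {"scores": [{"ethnicity": "British", "score": "0.99"}],
         "best": "British"},
    ],
}

# In-memory round trip standing in for pickle.load(open("name_ethnicities.pkl", "rb")).
data = pickle.loads(pickle.dumps(sample))

print(type(data).__name__)                       # dict
print(type(data["t creavalle"]).__name__)        # list
print([r["best"] for r in data["t creavalle"]])  # ['GreaterEuropean', 'British']
```

Because the object is already a dict, you can pick out the fields you want and pass them to pd.DataFrame directly, without any JSON parsing step.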

Answer 1

Try this:

import pandas as pd

df = pd.DataFrame([(k, ", ".join([x["best"] for x in v])) for k, v in data.items()],
                  columns=["name", "ethnicity"])
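Here is a self-contained, runnable version of the same one-liner for trying it without the pickle file. It assumes a trimmed copy of the question's data that keeps only the "best" keys, because the "scores" lists play no part in the result.

```python
import pandas as pd

# Trimmed copy of the question's data: only the "best" keys are kept
# (assumption for the demo; the real dicts also carry a "scores" list).
data = {
    "t creavalle": [{"best": "GreaterEuropean"}, {"best": "British"}],
    "uyŏng yi": [{"best": "Asian"}, {"best": "GreaterEastAsian"}, {"best": "EastAsian"}],
    "temple orme": [{"best": "GreaterEuropean"}, {"best": "British"}],
}

# One (name, "best1, best2, ...") tuple per dictionary entry.
rows = [(k, ", ".join(x["best"] for x in v)) for k, v in data.items()]
df = pd.DataFrame(rows, columns=["name", "ethnicity"])
print(df)
```

Python 3.7+ dicts keep insertion order, so the rows come out in the same order as the keys in the pickle.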

Explanation:

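data.items() yields (key, value) pairs, so "for k, v in data.items()" walks over each name (k) together with its list of results (v).
Look at [(k, v) for k, v in data.items()]: each name is paired with its own list of results.
Passing those pairs straight to pd.DataFrame([(k, v) for k, v in data.items()]) already gives the right name column, but the second column holds the raw lists of dicts, which is not what you want.
You only want the "best" value from each dict in the list. [x["best"] for x in v] extracts those values, and ", ".join(...) turns the list into one comma-separated string, giving: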

          name                           ethnicity
0  t creavalle            GreaterEuropean, British
1     uyŏng yi  Asian, GreaterEastAsian, EastAsian
2  temple orme            GreaterEuropean, British
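To see each step of the explanation on its own, here is a minimal sketch. It uses a hypothetical two-name subset of the data, with only the "best" keys kept.

```python
import pandas as pd

# Hypothetical trimmed data with the same shape as the question's dict.
data = {
    "t creavalle": [{"best": "GreaterEuropean"}, {"best": "British"}],
    "temple orme": [{"best": "GreaterEuropean"}, {"best": "British"}],
}

# Step 1: pair each name with its raw value; the second column holds lists of dicts.
raw = pd.DataFrame([(k, v) for k, v in data.items()], columns=["name", "results"])
print(type(raw.loc[0, "results"]))  # <class 'list'>

# Step 2: pull the "best" value out of every dict in each list.
bests = [[x["best"] for x in v] for v in data.values()]
print(bests)  # [['GreaterEuropean', 'British'], ['GreaterEuropean', 'British']]

# Step 3: join each list into one comma-separated string.
joined = [", ".join(b) for b in bests]
print(joined)  # ['GreaterEuropean, British', 'GreaterEuropean, British']
```

The answer's one-liner fuses steps 2 and 3 inside a single list comprehension over data.items(), so the name and its joined string are built together as one tuple.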
