Converting results of pickle file (json) to dataframe

I am reading from a pickle file as follows:

import pickle
data = pickle.load(open("name_ethnicities.pkl", "rb"))

It returns something that looks like a JSON file, as shown below:

  {'t creavalle': [{'scores': [{'ethnicity': 'Asian', 'score': '0.01'},
         {'ethnicity': 'GreaterAfrican', 'score': '0.00'},
         {'ethnicity': 'GreaterEuropean', 'score': '0.99'}],
        'best': 'GreaterEuropean'},
       {'scores': [{'ethnicity': 'British', 'score': '0.99'},
         {'ethnicity': 'Jewish', 'score': '0.00'},
         {'ethnicity': 'WestEuropean', 'score': '0.00'},
         {'ethnicity': 'EastEuropean', 'score': '0.00'}],
        'best': 'British'}],
      'uyŏng yi': [{'scores': [{'ethnicity': 'Asian', 'score': '1.00'},
         {'ethnicity': 'GreaterAfrican', 'score': '0.00'},
         {'ethnicity': 'GreaterEuropean', 'score': '0.00'}],
        'best': 'Asian'},
       {'scores': [{'ethnicity': 'IndianSubContinent', 'score': '0.00'},
         {'ethnicity': 'GreaterEastAsian', 'score': '1.00'}],
        'best': 'GreaterEastAsian'},
       {'scores': [{'ethnicity': 'Japanese', 'score': '0.00'},
         {'ethnicity': 'EastAsian', 'score': '1.00'}],
        'best': 'EastAsian'}],
      'temple orme': [{'scores': [{'ethnicity': 'Asian', 'score': '0.00'},
         {'ethnicity': 'GreaterAfrican', 'score': '0.00'},
         {'ethnicity': 'GreaterEuropean', 'score': '1.00'}],
        'best': 'GreaterEuropean'},
       {'scores': [{'ethnicity': 'British', 'score': '1.00'},
         {'ethnicity': 'Jewish', 'score': '0.00'},
         {'ethnicity': 'WestEuropean', 'score': '0.00'},
         {'ethnicity': 'EastEuropean', 'score': '0.00'}],
        'best': 'British'}]
     }

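I am trying to build a dataframe from this that contains the name and the "best" result for each category, separated by commas. So for the above, the dataframe would look like this: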

name               ethnicity
t creavalle        GreaterEuropean, British
uyong yi           Asian, GreaterEastAsian, EastAsian
temple orme        GreaterEuropean, British

and so on...

I tried pd.read_json, but it did not work for me. Any suggestions for how to solve this?

Try this:

pd.DataFrame([(k, ", ".join([x["best"] for x in v])) for k, v in data.items()], 
             columns=["name", "ethnicity"])

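Explanation:

  • data.items() yields (key, value) pairs; in the comprehension, k is the name and v is the list of result dicts for that name.
  • See the output of [(k, v) for k, v in data.items()] for an example.
  • You can see that the names and values are now aligned, but the second column is not yet what you want: pd.DataFrame([(k, v) for k, v in data.items()])
  • You want to select the best result for each ethnicity category, which you can do with [x["best"] for x in v]; joining that list with ", ".join(...) gives the result: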
          name                           ethnicity
0  t creavalle            GreaterEuropean, British
1     uyŏng yi  Asian, GreaterEastAsian, EastAsian
2  temple orme            GreaterEuropean, British
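
The steps in the explanation can be tried without the pickle file. The following is a minimal sketch, assuming a small stand-in dict built from the names and "best" values in the question; the "scores" lists are left empty here (an abbreviation, not the real data) because only the "best" key is used, and the variable names pairs, rows and df are just illustrative.

```python
import pandas as pd

# Abbreviated stand-in for the unpickled dict from the question.
# Assumption: "scores" is emptied for brevity, since only "best" is read.
data = {
    "t creavalle": [{"scores": [], "best": "GreaterEuropean"},
                    {"scores": [], "best": "British"}],
    "uyŏng yi": [{"scores": [], "best": "Asian"},
                 {"scores": [], "best": "GreaterEastAsian"},
                 {"scores": [], "best": "EastAsian"}],
    "temple orme": [{"scores": [], "best": "GreaterEuropean"},
                    {"scores": [], "best": "British"}],
}

# Step 1: data.items() yields (name, list_of_result_dicts) pairs.
pairs = list(data.items())

# Step 2: for each name, collect the "best" label of every category
# and join the labels into one comma-separated string.
rows = [(name, ", ".join(entry["best"] for entry in results))
        for name, results in pairs]

# Step 3: build the dataframe with the two requested columns.
df = pd.DataFrame(rows, columns=["name", "ethnicity"])
print(df)
```

Splitting the one-liner into named steps makes it easier to inspect the intermediate values (for example, print(pairs[0])) if the real data turns out to be shaped differently.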