Converting the results of a pickle file (json) to a DataFrame
I am reading from a pickle file like this:
data=pickle.load(open("name_ethnicities.pkl", "rb"))
What it returns looks like a JSON file, as shown below:
{'t creavalle': [{'scores': [{'ethnicity': 'Asian', 'score': '0.01'},
{'ethnicity': 'GreaterAfrican', 'score': '0.00'},
{'ethnicity': 'GreaterEuropean', 'score': '0.99'}],
'best': 'GreaterEuropean'},
{'scores': [{'ethnicity': 'British', 'score': '0.99'},
{'ethnicity': 'Jewish', 'score': '0.00'},
{'ethnicity': 'WestEuropean', 'score': '0.00'},
{'ethnicity': 'EastEuropean', 'score': '0.00'}],
'best': 'British'}],
'uyŏng yi': [{'scores': [{'ethnicity': 'Asian', 'score': '1.00'},
{'ethnicity': 'GreaterAfrican', 'score': '0.00'},
{'ethnicity': 'GreaterEuropean', 'score': '0.00'}],
'best': 'Asian'},
{'scores': [{'ethnicity': 'IndianSubContinent', 'score': '0.00'},
{'ethnicity': 'GreaterEastAsian', 'score': '1.00'}],
'best': 'GreaterEastAsian'},
{'scores': [{'ethnicity': 'Japanese', 'score': '0.00'},
{'ethnicity': 'EastAsian', 'score': '1.00'}],
'best': 'EastAsian'}],
'temple orme': [{'scores': [{'ethnicity': 'Asian', 'score': '0.00'},
{'ethnicity': 'GreaterAfrican', 'score': '0.00'},
{'ethnicity': 'GreaterEuropean', 'score': '1.00'}],
'best': 'GreaterEuropean'},
{'scores': [{'ethnicity': 'British', 'score': '1.00'},
{'ethnicity': 'Jewish', 'score': '0.00'},
{'ethnicity': 'WestEuropean', 'score': '0.00'},
{'ethnicity': 'EastEuropean', 'score': '0.00'}],
'best': 'British'}]
}
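I am trying to build a DataFrame from this that holds each name together with the "best" result from every category, comma-separated. So for the data above, the DataFrame would look like this: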
name ethnicity
t creavalle GreaterEuropean, British
uyŏng yi Asian, GreaterEastAsian, EastAsian
temple orme GreaterEuropean, British
and so on...
I tried pd.read_json, but it didn't work for me. Any suggestions for how to solve this?
Try this:
pd.DataFrame([(k, ", ".join([x["best"] for x in v])) for k, v in data.items()],
columns=["name", "ethnicity"])
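Explanation:
- data.items() yields (key, value) pairs: k is the name and v is the list of per-category results for that name.
- To see what those pairs look like, inspect the output of
[(k, v) for k, v in data.items()]
- Passing the pairs to a DataFrame lines the names up with their values, but the second column still holds the raw lists, which is not what you want:
pd.DataFrame([(k, v) for k, v in data.items()])
- To keep only the best result from each category, use
[x["best"] for x in v]
and join the labels with ", " to get the result: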
name ethnicity
0 t creavalle GreaterEuropean, British
1 uyŏng yi Asian, GreaterEastAsian, EastAsian
2 temple orme GreaterEuropean, British
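For reference, here is a self-contained sketch of the steps above. The dictionary is a trimmed stand-in for the unpickled data (the "scores" lists are shortened), and the helper name best_labels is made up for illustration. As a side note, pd.read_json probably failed because it expects JSON text, a path, or a file-like object, whereas pickle.load already returns a plain Python dict.

```python
import pandas as pd

# Trimmed stand-in for the dict returned by pickle.load (assumption:
# the real data has the same shape, just with longer "scores" lists).
data = {
    "t creavalle": [
        {"scores": [{"ethnicity": "GreaterEuropean", "score": "0.99"}],
         "best": "GreaterEuropean"},
        {"scores": [{"ethnicity": "British", "score": "0.99"}],
         "best": "British"},
    ],
    "uyŏng yi": [
        {"scores": [{"ethnicity": "Asian", "score": "1.00"}],
         "best": "Asian"},
        {"scores": [{"ethnicity": "GreaterEastAsian", "score": "1.00"}],
         "best": "GreaterEastAsian"},
        {"scores": [{"ethnicity": "EastAsian", "score": "1.00"}],
         "best": "EastAsian"},
    ],
}


def best_labels(data):
    """Return one row per name with its "best" labels joined by ", "."""
    rows = [
        # levels is the list of per-category results for this name
        (name, ", ".join(level["best"] for level in levels))
        for name, levels in data.items()
    ]
    return pd.DataFrame(rows, columns=["name", "ethnicity"])


df = best_labels(data)
print(df)
```

If some entries might lack a "best" key, level.get("best", "") would avoid a KeyError, at the cost of leaving empty labels in the joined string.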