How to read a json-dictionary type file with pandas?
I have a very long json like this: http://pastebin.com/gzhHEYGy
I want to load it into a pandas DataFrame in order to work with it, so following the documentation I do the following:
df = pd.read_json('/user/file.json')
print df
I get this traceback:
File "/Users/user/PycharmProjects/PAN-pruebas/json_2_dataframe.py", line 6, in <module>
df = pd.read_json('/Users/user/Downloads/54db3923f033e1dd6a82222aa2604ab9.json')
File "/usr/local/lib/python2.7/site-packages/pandas/io/json.py", line 198, in read_json
date_unit).parse()
File "/usr/local/lib/python2.7/site-packages/pandas/io/json.py", line 266, in parse
self._parse_no_numpy()
File "/usr/local/lib/python2.7/site-packages/pandas/io/json.py", line 483, in _parse_no_numpy
loads(json, precise_float=self.precise_float), dtype=None)
File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 203, in __init__
mgr = self._init_dict(data, index, columns, dtype=dtype)
File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 327, in _init_dict
dtype=dtype)
File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 4620, in _arrays_to_mgr
index = extract_index(arrays)
File "/usr/local/lib/python2.7/site-packages/pandas/core/frame.py", line 4668, in extract_index
raise ValueError('arrays must all be same length')
ValueError: arrays must all be same length
Then, from a previous question, I found that I need to do something like this:
d = dict( A = np.array([1,2]), B = np.array([1,2,3,4]) )
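But I don't understand how I should get the contents as numpy arrays. How can I preserve the lengths of the arrays in a big file like this? Thanks in advance.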
The json method doesn't work because the json file is not in the format it expects. Since we can easily load a json as a dict, let's try it this way:
import pandas as pd
import json
import os
os.chdir('/Users/nicolas/Downloads')
# Read the json file into a dict
with open('json_example.json') as json_data:
    data = json.load(json_data)
# Use the from_dict constructor. Note that the 'orient' parameter
# is not left at its default value (otherwise it raises the same error as before).
# Transpose the resulting df and set the 'index' column as its index to get this result
pd.DataFrame.from_dict(data, orient='index').T.set_index('index')
Output:
data columns
index
311210177061863424 [25-34\n, FEMALE, @bikewa absolutely the best.... age
310912785183813632 [25-34\n, FEMALE, Photo: I love the Burke-Gilm... gender
311290293871849472 [25-34\n, FEMALE, Photo: Inhaled! #fitfoodie h... text
309386414548717569 [25-34\n, FEMALE, Facebook Is Making The Most ... None
312327801187495936 [25-34\n, FEMALE, Still upset about this >&... None
312249421079400449 [25-34\n, FEMALE, @JoeM_PM_UK @JonAntoine I've... None
308692673194246145 [25-34\n, FEMALE, @Social_Freedom_ actually, t... None
308995226633129984 [25-34\n, FEMALE, @seattleweekly that's more t... None
308660851219501056 [25-34\n, FEMALE, @adamholdenbache I noticed 1... None
308658690528014337 [25-34\n, FEMALE, @CEM_Social I am waiting pat... None
309719798001070080 [25-34\n, FEMALE, Going to be watching Faceboo... None
312349448049152002 [25-34\n, FEMALE, @anikamarketer I applied for... None
312325152698404864 [25-34\n, FEMALE, @_chrisrojas_ wow, that's so... None
310546490844135425 [25-34\n, FEMALE, Photo: Feeling like a bit of... None
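The same behaviour can be reproduced on a small scale. This is only a minimal sketch: the dict `data` below is a hypothetical stand-in whose keys ('index', 'data', 'columns') and unequal lengths mimic what the output above suggests, not the actual pastebin file.

```python
import pandas as pd

# Hypothetical stand-in for the loaded json: the lists have unequal lengths
data = {
    "index": [101, 102, 103],
    "data": [
        ["25-34", "FEMALE", "first tweet"],
        ["25-34", "FEMALE", "second tweet"],
        ["25-34", "FEMALE", "third tweet"],
    ],
    "columns": ["age", "gender"],
}

# With the default orient='columns' every key becomes a column, so the
# lists must all have the same length; here that raises a ValueError
try:
    pd.DataFrame.from_dict(data)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# With orient='index' every key becomes a row and shorter rows are padded
# with missing values; transposing turns the keys back into columns
df = pd.DataFrame.from_dict(data, orient="index").T.set_index("index")
print(df)
```

Note that the 'columns' entry ends up as an ordinary, padded column here, exactly as in the output above; the labels it holds are not applied to the 'data' lists.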
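The pandas module, rather than the json module, should be the answer:
pandas itself is able to read json through read_json, so the root of the problem must be that you did not read the json with the correct orientation.
You must pass exactly the same orient parameter that was used to create the json variable in the first place.
For example: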
df_json = df.to_json(orient='split')
Then:
read_to_json = pd.read_json(df_json, orient='split')
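As a rough illustration of that round trip, here is a small sketch. The DataFrame contents and the name `restored` are made up for the example, and the json string is wrapped in `io.StringIO` on the assumption that a recent pandas version is in use, where passing a literal json string to read_json is deprecated.

```python
import io

import pandas as pd

# Hypothetical frame, only used to demonstrate the round trip
df = pd.DataFrame(
    {"age": ["25-34", "18-24"], "gender": ["FEMALE", "MALE"]},
    index=["a", "b"],
)

# Serialise with orient='split': separate 'columns', 'index' and 'data' keys
df_json = df.to_json(orient="split")

# Read it back with the same orient so the layout is interpreted correctly
restored = pd.read_json(io.StringIO(df_json), orient="split")
print(restored)
```

This only helps when the json was produced by to_json (or follows one of its layouts); a file with arbitrary structure may still need to be loaded as a dict first.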