From gzip to JSON to DataFrame to CSV

I am trying to get some data from an open API:

https://data.brreg.no/enhetsregisteret/api/enheter/lastned

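But I am having a hard time understanding the different types of objects and the order of the conversions. Is it strings or bytes, BytesIO or StringIO, decode('utf-8') or decode('unicode'), and so on?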

So far:

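import gzip
import io
import urllib.request

url_get = 'https://data.brreg.no/enhetsregisteret/api/enheter/lastned'

with urllib.request.urlopen(url_get) as response:
    # Charset declared by the server, falling back to UTF-8
    encoding = response.info().get_param('charset', 'utf8')
    # Read the whole response body into an in-memory, file-like object of bytes
    compressed_file = io.BytesIO(response.read())
    decompressed_file = gzip.GzipFile(fileobj=compressed_file)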

This is where I am stuck. What should the next line of code be?

json_str = json.loads(decompressed_file.read().decode('utf-8'))

My workaround: if I write it out as a JSON file, then read it back in and convert it to a df, it works:

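import json
from pandas import json_normalize  # top-level import available in pandas >= 1.0

f_path = 'brreg.json'

# Write the decompressed bytes to disk unchanged
with open(f_path, 'wb') as f:
    f.write(decompressed_file.read())

# Read the same file back in as UTF-8 text and parse it
with open(f_path, encoding='utf-8') as fin:
    d = json.load(fin)

df = json_normalize(d)

with open('brreg_2.csv', 'w', encoding='utf-8', newline='') as fout:
    fout.write(df.to_csv())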
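
The detour through a file on disk may not be needed. A minimal sketch, assuming the download is a single gzip-compressed JSON document encoded as UTF-8: json.load accepts a binary file-like object, so the GzipFile can be handed to it directly. To stay runnable offline, the sketch builds a small stand-in payload; the records and their field names are invented for illustration and are not the real API schema.

```python
import gzip
import io
import json

# Hypothetical stand-in for response.read(): a gzip-compressed JSON array.
# The records and field names are invented for illustration only.
sample_records = [
    {"id": "1", "name": "Example AS", "address": {"city": "Oslo"}},
    {"id": "2", "name": "Blåbær ANS", "address": {"city": "Bergen"}},
]
raw_bytes = gzip.compress(
    json.dumps(sample_records, ensure_ascii=False).encode("utf-8")
)

# Wrap the compressed bytes in a file-like object, let GzipFile decompress
# on the fly, and let json.load read and decode straight from it.
with gzip.GzipFile(fileobj=io.BytesIO(raw_bytes)) as decompressed_file:
    d = json.load(decompressed_file)

print(type(d).__name__, len(d))
```

The resulting d is an ordinary list of dicts, so it can go to json_normalize in the same way as in the snippet above.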

I have found many posts about this, but I am still confused. The first one explains it well, but I still need some spoon-feeding.

How can I create a GzipFile instance from the “file-like object” that urllib.urlopen() returns?

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

For me, decompressing the file with the decompress function rather than the GzipFile class works fine, though I am not yet sure why...

import urllib.request
import gzip
import io
import json

url_get = 'https://data.brreg.no/enhetsregisteret/api/enheter/lastned'


with urllib.request.urlopen(url_get) as response:
    encoding = response.info().get_param('charset', 'utf8')
    compressed_file = io.BytesIO(response.read())
    decompressed_file = gzip.decompress(compressed_file.read())
    json_str = json.loads(decompressed_file.decode('utf-8'))
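
As a rough illustration of which type each step produces, here is a sketch that runs offline on a made-up payload (the record is invented, not real API data). It also suggests that both routes should yield the same bytes: gzip.decompress works on a bytes object, while GzipFile wraps a file-like object, so the BytesIO wrapper is only needed for the second route.

```python
import gzip
import io
import json

# Made-up payload standing in for the downloaded body (an assumption).
body = gzip.compress(
    json.dumps([{"navn": "Blåbær AS"}], ensure_ascii=False).encode("utf-8")
)

# Route 1: gzip.decompress takes bytes and returns bytes.
via_function = gzip.decompress(body)

# Route 2: GzipFile needs a file-like object, hence the BytesIO wrapper.
via_class = gzip.GzipFile(fileobj=io.BytesIO(body)).read()

text = via_function.decode("utf-8")  # bytes -> str
data = json.loads(text)              # str -> Python list of dicts

for name, value in [("body", body), ("via_function", via_function),
                    ("text", text), ("data", data)]:
    print(name, type(value).__name__)
```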

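EDIT: in fact, the following also works fine for me, and it appears to be your exact code... (Further edit: it turns out this is not your exact code, because your last line is outside the with block, which means response is no longer open when it is needed. See the comment thread.)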

import urllib.request
import gzip
import io
import json

url_get = 'https://data.brreg.no/enhetsregisteret/api/enheter/lastned'


with urllib.request.urlopen(url_get) as response:
    encoding = response.info().get_param('charset', 'utf8')
    compressed_file = io.BytesIO(response.read())
    decompressed_file = gzip.GzipFile(fileobj=compressed_file)
    json_str = json.loads(decompressed_file.read().decode('utf-8'))
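
The effect of reading outside the with block can be reproduced offline. This is a sketch under an assumption: a BytesIO used as a context manager stands in for the HTTP response, and the GzipFile wraps that stream itself rather than a copy of its contents. GzipFile decompresses lazily, so a read that happens after the underlying stream has been closed fails.

```python
import gzip
import io
import json

payload = gzip.compress(json.dumps({"ok": True}).encode("utf-8"))

# BytesIO plays the role of the response here (an assumption, to avoid
# network access); like a response, it is closed when the with block exits.
with io.BytesIO(payload) as fake_response:
    decompressed_file = gzip.GzipFile(fileobj=fake_response)
    # Reading inside the block: the underlying stream is still open.
    inside = json.loads(decompressed_file.read().decode("utf-8"))

with io.BytesIO(payload) as fake_response:
    late_file = gzip.GzipFile(fileobj=fake_response)

# Reading outside the block: the underlying stream is already closed.
try:
    late_file.read()
    outcome = "read succeeded"
except ValueError as exc:
    outcome = f"ValueError: {exc}"

print(inside)
print(outcome)
```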