From gzip to JSON to DataFrame to CSV
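I'm trying to get some data from an open API:
https://data.brreg.no/enhetsregisteret/api/enheter/lastned
but I'm having trouble understanding the different object types and the order of the conversions. Is it strings to bytes, BytesIO or StringIO, decode('utf-8') or decode('unicode'), and so on?
So far: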
import gzip
import io
import json
import urllib.request

url_get = 'https://data.brreg.no/enhetsregisteret/api/enheter/lastned'
with urllib.request.urlopen(url_get) as response:
    encoding = response.info().get_param('charset', 'utf8')
    compressed_file = io.BytesIO(response.read())
    decompressed_file = gzip.GzipFile(fileobj=compressed_file)
This is where I'm stuck. How should the next line of code be written?
json_str = json.loads(decompressed_file.read().decode('utf-8'))
My workaround is to write it out as a JSON file, then read it back in and convert it to a DataFrame. That works:
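import json
from pandas import json_normalize  # older pandas: from pandas.io.json import json_normalize

f_path = 'brreg.json'  # assumed to be the same file that is written just below
with open(f_path, 'wb') as f:
    f.write(decompressed_file.read())  # decompressed bytes, written unchanged
with open(f_path, encoding='utf-8') as fin:
    d = json.load(fin)
df = json_normalize(d)
df.to_csv('brreg_2.csv', encoding='utf-8')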
I've found a lot of posts about this, but I'm still confused. The first one explains it well, but I still need some spoon-feeding.
How can I create a GzipFile instance from the “file-like object” that urllib.urlopen() returns?
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
For me, using the decompress function instead of the GzipFile class to decompress the file works fine, but I'm not sure why yet...
import urllib.request
import gzip
import io
import json

url_get = 'https://data.brreg.no/enhetsregisteret/api/enheter/lastned'
with urllib.request.urlopen(url_get) as response:
    encoding = response.info().get_param('charset', 'utf8')
    compressed_file = io.BytesIO(response.read())
    # gzip.decompress takes bytes and returns the decompressed bytes
    decompressed_file = gzip.decompress(compressed_file.read())
    json_str = json.loads(decompressed_file.decode('utf-8'))
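To make the order of conversions concrete, here is a minimal sketch that runs without the network. The `payload` variable is a made-up stand-in for `response.read()`, and the record in it is invented for illustration. It walks three routes from gzip bytes to Python objects; which one suits the real API best is a judgment call. Note that `'unicode'` is not a codec name in Python 3, so `decode('unicode')` raises `LookupError`.

```python
import gzip
import io
import json

# Hypothetical stand-in for response.read(): gzip-compressed, UTF-8 encoded JSON.
records = [{"navn": "Blåbær AS", "organisasjonsnummer": "000000000"}]
payload = gzip.compress(json.dumps(records, ensure_ascii=False).encode("utf-8"))

# Route 1: bytes -> gzip.decompress -> bytes -> .decode -> str -> json.loads
raw_bytes = gzip.decompress(payload)   # still bytes, now uncompressed
text = raw_bytes.decode("utf-8")       # bytes become str here
via_decompress = json.loads(text)      # str becomes list/dict here

# Route 2: GzipFile needs a file-like object, so the bytes are wrapped in BytesIO.
with gzip.GzipFile(fileobj=io.BytesIO(payload)) as gz:
    via_gzipfile = json.loads(gz.read().decode("utf-8"))

# Route 3: gzip.open in text mode decodes while reading, so json.load gets str.
with gzip.open(io.BytesIO(payload), "rt", encoding="utf-8") as fh:
    via_text_mode = json.load(fh)
```

In all three routes the data stays `bytes` until the single decode step; `StringIO` is never needed because nothing here has to treat a `str` as a file.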
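EDIT: in fact, the following also works fine for me, and it appears to be your exact code...
(Further edit: it turns out this is not your exact code, because your last line is outside the with block, which means response is no longer open when it is needed. See the comment thread.)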
import urllib.request
import gzip
import io
import json

url_get = 'https://data.brreg.no/enhetsregisteret/api/enheter/lastned'
with urllib.request.urlopen(url_get) as response:
    encoding = response.info().get_param('charset', 'utf8')
    compressed_file = io.BytesIO(response.read())
    decompressed_file = gzip.GzipFile(fileobj=compressed_file)
    # read while still inside the with block
    json_str = json.loads(decompressed_file.read().decode('utf-8'))
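The point about the with block can be illustrated offline. This is only a sketch under assumptions: an in-memory `io.BytesIO` named `source` stands in for `response`, and a real HTTP response may differ in its details. What it shows is that `gzip.GzipFile` decompresses lazily, so the underlying file object must still be open when `read()` is called, whereas `gzip.decompress` works on bytes that have already been read.

```python
import gzip
import io
import json

# Hypothetical stand-in for the downloaded gzip bytes.
payload = gzip.compress(json.dumps({"ok": True}).encode("utf-8"))

# Creating the GzipFile inside the block does not read anything yet.
with io.BytesIO(payload) as source:          # stand-in for `response`
    lazy = gzip.GzipFile(fileobj=source)

# Reading after the block has closed `source` fails.
try:
    lazy.read()
    outcome = "read succeeded"
except ValueError as exc:
    outcome = f"read failed: {exc}"

# Reading inside the block works, because `source` is still open.
with io.BytesIO(payload) as source:
    with gzip.GzipFile(fileobj=source) as gz:
        data = json.loads(gz.read().decode("utf-8"))
```

Keeping the `read()` call inside the with block, or decompressing bytes that were fully read beforehand, avoids the problem either way.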