Reading a JSON file with multiple dictionaries

Question

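I created a JSON file containing tweets that I streamed. The file holds multiple dictionaries, one per tweet. When I try to read the file, I get this error: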

json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 3419)

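That position is where a new record/tweet/dictionary starts. How can I solve this? I looked for similar answers, but they were not relevant to my problem. How can I read this file? Am I storing it the wrong way?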

Here is the JSON file:

{"created_at": "Thu Jul 18 12:06:44 +0000 2019", "id": 1151825627051257856, "id_str": "1151825627051257856", "text": "@godhoonbey @cuttingedge2019 Unparalleled greed for power to loot on display in Karnataka in history of India. Did\u2026 ", "display_text_range": [29, 140], "source": "<a href=\"" rel=\"nofollow\">Twitter for Android</a>", "truncated": true, "in_reply_to_status_id": 1151797702419787778, "in_reply_to_status_id_str": "1151797702419787778", "in_reply_to_user_id": 840249609368797186,
.
.
.
.
"lang": "en", "timestamp_ms": "1563451604031"
}
{
    # another tweet content
}

Answer 1

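So, your file is not quite valid JSON. A JSON document must contain a single top-level value, and your file holds several objects back to back.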

You need to wrap it in [ and ] so that it becomes one big list, and put a comma between the documents to separate them.
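On the storing side, one way to end up with such a list is to collect the tweets and write them with a single json.dump call. This is only a minimal sketch: the `tweets` list and the temporary file path are hypothetical stand-ins, not taken from the question's streaming code.

```python
import json
import os
import tempfile

# Hypothetical stand-ins for streamed tweets; real ones carry many more fields.
tweets = [
    {"id": 1, "text": "first tweet", "lang": "en"},
    {"id": 2, "text": "second tweet", "lang": "en"},
]

# Temporary location used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "tweets.json")

# Dumping a list emits the surrounding brackets and the commas between documents.
with open(path, "w", encoding="utf-8") as f:
    json.dump(tweets, f)

# The file now holds one JSON value, so a single json.load reads it back.
with open(path, encoding="utf-8") as f:
    loaded = json.load(f)
```

The trade-off is that all tweets must be held in memory until the file is written, which may not suit a long-running stream.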

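If (and only if) each document is on its own line (which I believe is the case, since the error is at line 2 column 1), you can parse the file line by line with json.loads, like this: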

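import json


def parse_data(filename):
    # Yield one parsed dictionary per non-empty line of the file.
    with open(filename, 'r', encoding='utf-8') as f:
        for line in f:
            if line.strip():
                yield json.loads(line)


# filename is the path to your tweets file.
data = list(parse_data(filename))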
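If the documents turn out not to sit one per line (for example, objects spread over several lines), parsing line by line will fail. A possible fallback, which is not part of the original answer, is the standard library's json.JSONDecoder.raw_decode: it parses one value starting at a given index and returns the index where that value ended. The helper name `iter_concatenated_json` and the sample text are made up for illustration.

```python
import json


def iter_concatenated_json(text):
    """Yield each top-level JSON value found back to back in text."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so skip it by hand.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # Returns the parsed value and the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


# Made-up sample: the second object spans several lines.
sample = '{"id": 1, "lang": "en"}\n{\n  "id": 2,\n  "lang": "fr"\n}\n'
tweets = list(iter_concatenated_json(sample))
```

This sketch reads the whole text into memory first, so it assumes the file is small enough for that.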

However, you really should wrap it in one big list, as I originally suggested, so that the file is valid JSON.
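An existing file could be converted once rather than edited by hand. The following is a rough sketch that assumes one document per line; the function name `lines_to_json_array` and the file names are hypothetical.

```python
import json
import os
import tempfile


def lines_to_json_array(src_path, dst_path):
    """Read one JSON document per line and write them out as a JSON array."""
    with open(src_path, encoding="utf-8") as src:
        docs = [json.loads(line) for line in src if line.strip()]
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(docs, dst)
    return len(docs)


# Small demonstration with made-up data in a temporary directory.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "tweets_raw.txt")
dst_path = os.path.join(workdir, "tweets.json")
with open(src_path, "w", encoding="utf-8") as f:
    f.write('{"id": 1, "lang": "en"}\n{"id": 2, "lang": "en"}\n')

count = lines_to_json_array(src_path, dst_path)

# The converted file can be read with a single json.load call.
with open(dst_path, encoding="utf-8") as f:
    converted = json.load(f)
```

After the conversion, any standard JSON parser can read the output file in one call.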
