JSON decoder stops after roughly 1000 iterations

Question

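I wrote a small program that iterates over 1500 JSON files, parses them, and inserts the records into a MySQL database. After roughly 1000 iterations, however, my Python script stops with the following error: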

Traceback (most recent call last):
  File "/home/ubuntu/myprogram/main.py", line 107, in <module>
    exec(open("/home/ubuntu/myprogram/subprogramthatscalled.py").read())
  File "<string>", line 32, in <module>
  File "/usr/lib/python3.10/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

I double-checked all the JSON files and found no errors in them.

This is the code I am using:

path = "/home/ubuntu/mypath"
path_list = os.listdir(path)

for file in path_list:
        if file.startswith("stuff") and file.endswith(".json"):
                each_file = path + file
                json_file = open(each_file)
                json_eingelesen = json.load(json_file)
                json_objects = len(json_eingelesen['data']['children'])
                for k in range(json_objects):
                        keys = ""
                        values = ""
                        for (k, v) in json_eingelesen['data']['children'][k]['data'].items():
                                keys += str(k) + ", "
                                #here was some little code to prepare kinds of variables
                                values += "'" + str(v) + "', "
                        keys = keys[:-2]
                        values = values[:-2]
                        # I know that SQL injection is easy here, but my Raspberry Pi is not
                        # reachable from outside my network; I'll secure that later.
                        query = "INSERT INTO reddit_subreddit_posts (%s) VALUES (%s)" % (keys, values)
                        queries.execute(query)
                        mydb.commit()
                os.remove(each_file)

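Now the interesting part: when I restart the script, it runs without errors.

So my question is: is there any limit that prevents the Python JSON decoder, or any other part of my code, from iterating over the rest of the JSON files?

These are the initial bytes in hex: 7b 22 6b 69 6e 64 22 3a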

Answer 1

You need to wrap the line json.load(json_file) in a try ... except block to catch these errors.

The data read through json_file is probably not a valid JSON string in one of your files.
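
A minimal sketch of that idea follows. The helper name load_json_or_none is hypothetical and not part of the original script. It catches json.JSONDecodeError and prints the file name, size, and first bytes, so the offending file can be identified. An empty file, for instance, produces exactly the "Expecting value: line 1 column 1 (char 0)" message.

```python
import json
import os


def load_json_or_none(file_path):
    """Return the parsed JSON, or None if the file does not contain valid JSON.

    Hypothetical helper for illustration; the name is an assumption.
    """
    try:
        # The with-block closes the file even if parsing fails.
        with open(file_path, encoding="utf-8") as json_file:
            return json.load(json_file)
    except json.JSONDecodeError as err:
        # Report which file failed, how big it is, and how it starts.
        size = os.path.getsize(file_path)
        with open(file_path, "rb") as raw:
            head = raw.read(16)
        print(f"Skipping {file_path}: {err} (size={size} bytes, first bytes={head!r})")
        return None
```

In the loop from the question, a file for which this returns None could be skipped with continue rather than deleted, so that it stays available for inspection.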

Answer 2

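You may not see the error in the file because it is invisible.

An error at char 0 is most likely a BOM character.

That is the byte order mark, a marker that some text editors insert when saving UTF-8 files.

You can either edit the file to remove the BOM, or set the encoding to 'utf-8-sig' when opening the file.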

json_file = open(each_file, encoding='utf-8-sig')
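
To check whether a BOM is actually present, and to see the effect of 'utf-8-sig', something like the sketch below could be used. The helper starts_with_bom and the temporary test file are assumptions made for illustration. A UTF-8 BOM is the byte sequence ef bb bf, so a file whose first byte is 7b does not start with one.

```python
import codecs
import json
import os
import tempfile


def starts_with_bom(file_path):
    """Return True if the file begins with the UTF-8 BOM (bytes EF BB BF)."""
    with open(file_path, "rb") as raw:
        return raw.read(3) == codecs.BOM_UTF8


# Write a small JSON document preceded by a UTF-8 BOM into a temporary file.
with tempfile.NamedTemporaryFile("wb", suffix=".json", delete=False) as tmp:
    tmp.write(codecs.BOM_UTF8 + b'{"kind": "Listing"}')
    bom_path = tmp.name

try:
    has_bom = starts_with_bom(bom_path)
    # 'utf-8-sig' drops a leading BOM if present and otherwise behaves like 'utf-8'.
    with open(bom_path, encoding="utf-8-sig") as json_file:
        parsed = json.load(json_file)
finally:
    os.remove(bom_path)

print(has_bom, parsed)
```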


python

mysql

json