Opening a JSON-format file stored as a .txt file

I have been assigned to read several .txt files that are actually JSON files from Twitter, but I get an error when I try to load a file with the json package.

    import json
    import pandas as pd

    with open(files_path+'/tweets.json.2019-01-15.txt') as f:
        string = f.read()
        data = json.loads(string)  # raises JSONDecodeError
    tweet_df = pd.DataFrame(data)
    print(tweet_df)

The error I get is:

      File "C:\ProgramData\Anaconda3\envs\HW1\lib\json\decoder.py", line 340, in decode
        raise JSONDecodeError("Extra data", s, end)
    json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 9762)

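I tried opening the other files and got the same result: the error is always at line 2, column 1. The first two lines of the file look like this: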

{"created_at":"Mon Jan 14 21:59:12 +0000 2019","id":1084932973353467904,"id_str":"1084932973353467904","text":...,"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"iw"},"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[],"user_mentions":[{"screen_name":"oren_haz","name":"\u05d0\u05d5\u05e8\u05df \u05d7\u05d6\u05df","id":3185038236,"id_str":"3185038236","indices":[3,12]}],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"iw","timestamp_ms":"1547503152584"}
{"created_at":"Mon Jan 14 21:59:34 +0000 2019","id":1084933066898968576,"id_str":"1084933066898968576","text":"...,"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"iw"},"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[],"user_mentions":[{"screen_name":"dudiamsalem","name":"\u05d3\u05d5\u05d3\u05d9 \u05d0\u05de\u05e1\u05dc\u05dd\u2066\ud83c\uddee\ud83c\uddf1\u2069\u2066","id":3221813461,"id_str":"3221813461","indices":[3,15]}],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"iw","timestamp_ms":"1547503174887"}

Thanks for your help.

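This is not a single JSON file. It is a series of separate JSON documents, one per line, which is why the parser reports "Extra data" at the start of line 2. Instead of reading everything with string=f.read(), loop over the file and parse each line separately, for example: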

    for line in f:
        data = json.loads(line)
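
To make the difference concrete, here is a minimal sketch of the per-line approach that also collects the parsed records. The helper name `parse_json_lines` and the two sample records are made up for illustration; they are not from the original files.

```python
import json

def parse_json_lines(text):
    """Parse text holding one JSON document per line into a list of objects."""
    records = []
    for line in text.split("\n"):
        line = line.strip()
        if not line:  # skip blank lines, e.g. after a trailing newline
            continue
        records.append(json.loads(line))
    return records

# Two made-up records standing in for the contents of a tweet file.
sample = '{"id": 1, "lang": "iw"}\n{"id": 2, "lang": "en"}\n'

try:
    json.loads(sample)  # parsing the whole text at once fails
except json.JSONDecodeError as exc:
    whole_file_error = exc.msg  # the same "Extra data" message as in the question

records = parse_json_lines(sample)
print(whole_file_error)
print(records)
```

With a real file you would iterate over the open file handle, as in the loop above, rather than splitting a string.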

It looks like the text file has one JSON string per line, and each line should become one row of the DataFrame. You can build the df with:

    with open(files_path+'/tweets.json.2019-01-15.txt') as f:
        tweet_df = pd.DataFrame([json.loads(line) for line in f])
    print(tweet_df)
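
As a possible alternative, pandas can read this one-object-per-line layout directly through the `lines=True` option of `pd.read_json`. The sketch below uses a made-up two-record sample in a `StringIO` buffer; with the real data you would pass the file path instead. Note that `read_json` applies its own type inference (for example, it may parse date-like columns), so the result can differ slightly from the list-comprehension version.

```python
import io
import pandas as pd

# Made-up sample standing in for a tweet file; pass a file path in practice.
sample = io.StringIO('{"id": 1, "lang": "iw"}\n{"id": 2, "lang": "en"}\n')

# lines=True tells pandas to treat each line as a separate JSON object.
tweet_df = pd.read_json(sample, lines=True)
print(tweet_df)
```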