Opening a JSON-formatted file stored in a .txt file
I was assigned to read several .txt files that are actually JSON files from Twitter, but I get an error when I try to load a file with the json package.
import json
import pandas as pd

with open(files_path+'/tweets.json.2019-01-15.txt') as f:
    string = f.read()
    data = json.loads(string)  # raises JSONDecodeError here
tweet_df = pd.DataFrame(data)
print(tweet_df)
The error I get is:
File "C:\ProgramData\Anaconda3\envs\HW1\lib\json\decoder.py", line 340, in decode
raise JSONDecodeError("Extra data", s, end)
json.decoder.JSONDecodeError: Extra data: line 2 column 1 (char 9762)
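I tried opening the other files and got the same result: the error is always at line 2, column 1. The first two lines of the file look like this: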
{"created_at":"Mon Jan 14 21:59:12 +0000 2019","id":1084932973353467904,"id_str":"1084932973353467904","text":...,"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"iw"},"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[],"user_mentions":[{"screen_name":"oren_haz","name":"\u05d0\u05d5\u05e8\u05df \u05d7\u05d6\u05df","id":3185038236,"id_str":"3185038236","indices":[3,12]}],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"iw","timestamp_ms":"1547503152584"}
{"created_at":"Mon Jan 14 21:59:34 +0000 2019","id":1084933066898968576,"id_str":"1084933066898968576","text":"...,"symbols":[]},"favorited":false,"retweeted":false,"possibly_sensitive":false,"filter_level":"low","lang":"iw"},"is_quote_status":false,"quote_count":0,"reply_count":0,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[],"urls":[],"user_mentions":[{"screen_name":"dudiamsalem","name":"\u05d3\u05d5\u05d3\u05d9 \u05d0\u05de\u05e1\u05dc\u05dd\u2066\ud83c\uddee\ud83c\uddf1\u2069\u2066","id":3221813461,"id_str":"3221813461","indices":[3,15]}],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"iw","timestamp_ms":"1547503174887"}
Thanks for your help.
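This is not a single JSON file. It is a series of separate JSON documents, one per line, which is why json.loads reports "Extra data" as soon as it reaches line 2. Instead of using string=f.read(), you need to loop over the file and parse each line separately, for example: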
for line in f:
    data = json.loads(line)
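The loop above can be sketched as a small helper. This is only a minimal sketch, assuming one JSON document per line. The name parse_json_lines is hypothetical, and skipping blank lines is an assumption about the files rather than something the question confirms.

```python
import json


def parse_json_lines(lines):
    """Parse an iterable of text lines, one JSON document per line.

    Hypothetical helper: blank lines are skipped (an assumption about the
    input), and a malformed line raises ValueError naming its line number.
    """
    records = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate empty lines, e.g. a trailing newline
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: {exc}") from exc
    return records


# Two documents, one per line, mimicking the layout of the tweet files.
sample = '{"id": 1, "lang": "iw"}\n\n{"id": 2, "lang": "iw"}\n'
records = parse_json_lines(sample.splitlines())
```

Because an open text file yields its lines when iterated, the file object f from the question could be passed to the helper directly in place of sample.splitlines().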
It looks like the text file has one JSON string per line, and each line should become one row of the DataFrame. You can build the df with:
with open(files_path+'/tweets.json.2019-01-15.txt') as f:
    tweet_df = pd.DataFrame([json.loads(line) for line in f])
print(tweet_df)
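Since the question mentions several such files, the same idea might be extended to all of them at once. The following is a rough sketch, not a tested solution for the real data: the helper name load_tweet_files, the glob pattern, the UTF-8 encoding, and the tiny stand-in files written to a temporary directory are all assumptions made for illustration.

```python
import glob
import json
import os
import tempfile


def load_tweet_files(pattern):
    """Collect one dict per non-empty line from every file matching pattern."""
    records = []
    for path in sorted(glob.glob(pattern)):
        # UTF-8 is an assumption; the question does not state the encoding.
        with open(path, encoding="utf-8") as f:
            records.extend(json.loads(line) for line in f if line.strip())
    return records


# Self-contained demo: write two tiny stand-in files, then read them back.
with tempfile.TemporaryDirectory() as tmp_dir:
    for day, ids in (("2019-01-15", [1, 2]), ("2019-01-16", [3])):
        file_path = os.path.join(tmp_dir, f"tweets.json.{day}.txt")
        with open(file_path, "w", encoding="utf-8") as f:
            for tweet_id in ids:
                f.write(json.dumps({"id": tweet_id, "lang": "iw"}) + "\n")
    all_records = load_tweet_files(os.path.join(tmp_dir, "tweets.json.*.txt"))

# all_records is a list of dicts, the same shape pd.DataFrame accepts above.
```

The resulting list can be handed to pd.DataFrame just like the list comprehension in the answer. pandas can also read this one-document-per-line layout itself via read_json with lines=True, which may be simpler when only a single file is involved.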