How do I read Twitter data saved in a txt file?
I collected a large number of tweets in Python using the Twitter Streaming API.
All of the tweets were dumped into a single txt file:
{"created_at":"Sun Jul 03 15:23:11 +0000 2016","id":749624538015621120,"id_str":"749624538015621120","text":"Et hop un petit muldo dor\u00e9-indigo #enroutepourlaG2","source":"\u003ca href=\"http:\/\/twitter.com\" rel=\"nofollow\"\u003eTwitter Web Client\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":3050686557,"id_str":"3050686557","name":"Heresia","screen_name":"Air_Et_Zia","location":null,"url":null,"description":"Joueur de Dofus depuis 6 ans. Essentiellement ax\u00e9 PvP. Actuellement sur #Amayiro !","protected":false,"verified":false,"followers_count":296,"friends_count":30,"listed_count":0,"favourites_count":23,"statuses_count":216,"created_at":"Sat Feb 21 20:45:02 +0000 2015","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"fr","contributors_enabled":false,"is_translator":false,"profile_background_color":"000000","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"9266CC","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"000000","profile_text_color":"000000","profile_use_background_image":false,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/569237837581545472\/e_OJaGOl_normal.png","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/569237837581545472\/e_OJaGOl_normal.png","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"enroutepourlaG2","indices":[34,50]}],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"fr","timestamp_ms":"1467559391870"}
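- How do I read the text of individual tweets out of the hundreds of tweets in the file?
- Can I save tweets that contain a particular hashtag or word to another text file? For example, tweets containing the word "Indigo" would be stored in another text file.
The only solution I can think of is to use regular expressions. Is there a better solution in Python?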
Since what you have is valid JSON, you can use a JSON parser:
import json
# Raw string, so the JSON escapes (\u00e9, \", \/) reach the parser unchanged.
# The literal must stay on one line: a line break inside a JSON key or string makes json.loads fail.
json_string = r'''{"created_at":"Sun Jul 03 15:23:11 +0000 2016","id":749624538015621120,"id_str":"749624538015621120","text":"Et hop un petit muldo dor\u00e9-indigo #enroutepourlaG2","source":"\u003ca href=\"http:\/\/twitter.com\" rel=\"nofollow\"\u003eTwitter Web Client\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":3050686557,"id_str":"3050686557","name":"Heresia","screen_name":"Air_Et_Zia","location":null,"url":null,"description":"Joueur de Dofus depuis 6 ans. Essentiellement ax\u00e9 PvP. Actuellement sur #Amayiro !","protected":false,"verified":false,"followers_count":296,"friends_count":30,"listed_count":0,"favourites_count":23,"statuses_count":216,"created_at":"Sat Feb 21 20:45:02 +0000 2015","utc_offset":null,"time_zone":null,"geo_enabled":false,"lang":"fr","contributors_enabled":false,"is_translator":false,"profile_background_color":"000000","profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png","profile_background_tile":false,"profile_link_color":"9266CC","profile_sidebar_border_color":"000000","profile_sidebar_fill_color":"000000","profile_text_color":"000000","profile_use_background_image":false,"profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/569237837581545472\/e_OJaGOl_normal.png","profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/569237837581545472\/e_OJaGOl_normal.png","default_profile":false,"default_profile_image":false,"following":null,"follow_request_sent":null,"notifications":null},"geo":null,"coordinates":null,"place":null,"contributors":null,"is_quote_status":false,"retweet_count":0,"favorite_count":0,"entities":{"hashtags":[{"text":"enroutepourlaG2","indices":[34,50]}],"urls":[],"user_mentions":[],"symbols":[]},"favorited":false,"retweeted":false,"filter_level":"low","lang":"fr","timestamp_ms":"1467559391870"}'''
# Parse the JSON text into a Python dict.
twit = json.loads(json_string)
# json.dumps re-encodes the value as JSON, which is why the output keeps the quotes and \u00e9.
print(json.dumps(twit["text"]))  # or any string manipulation here
Output:
"Et hop un petit muldo dor\u00e9-indigo #enroutepourlaG2"
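The same parser call can be applied to the whole dump. The following is a minimal sketch, assuming the file holds one tweet per line; that layout, the file name `tweets.txt`, and the helper names `iter_tweets` and `print_texts` are assumptions for illustration, not something the question confirms. Lines that are blank or not valid JSON are skipped rather than treated as errors.

```python
import json


def iter_tweets(lines):
    """Yield one dict per line that holds a JSON object; skip everything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank line between tweets
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue  # truncated or otherwise invalid line
        if isinstance(tweet, dict):
            yield tweet


def print_texts(path="tweets.txt"):
    """Print the text of every tweet in the file (hypothetical file name)."""
    with open(path, encoding="utf-8") as handle:
        for tweet in iter_tweets(handle):
            # Not every message in a stream dump is necessarily a tweet,
            # so only print entries that actually carry a "text" key.
            if "text" in tweet:
                print(tweet["text"])
```

Because `iter_tweets` is a generator, the file is read one line at a time instead of being loaded into memory at once, which may matter for a large dump.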
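Also, keep this piece of advice in mind:
Regular expressions are for string matching. Never use them to parse (X)HTML, JSON, XML, CSV, or any other format that has a parser. Use a parser instead.
You will save a lot of time.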
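The second question, saving tweets that contain a given word to another file, can also be handled without regular expressions. This is only a sketch of one possible approach: the helper names `mentions` and `save_matching` and the output layout (one JSON object per line) are assumptions for illustration. It does a case-insensitive substring search in the tweet's `text` and in the hashtags under `entities`, both of which appear in the sample tweet.

```python
import json


def mentions(tweet, word):
    """Return True if `word` occurs in the tweet text or in one of its hashtags."""
    needle = word.lower()
    if needle in (tweet.get("text") or "").lower():
        return True
    hashtags = (tweet.get("entities") or {}).get("hashtags") or []
    return any(needle in (tag.get("text") or "").lower() for tag in hashtags)


def save_matching(tweets, word, out_path):
    """Write every matching tweet to out_path, one JSON object per line.

    Returns the number of tweets written.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for tweet in tweets:
            if mentions(tweet, word):
                out.write(json.dumps(tweet) + "\n")
                count += 1
    return count


# Two hand-written stand-ins for parsed tweets (not real data beyond the sample text):
sample = [
    {"text": "Et hop un petit muldo dor\u00e9-indigo #enroutepourlaG2",
     "entities": {"hashtags": [{"text": "enroutepourlaG2"}]}},
    {"text": "nothing to see here", "entities": {"hashtags": []}},
]
matches = [t for t in sample if mentions(t, "Indigo")]
```

Given any iterable of parsed tweet dicts (for instance the results of `json.loads` on each line of the dump), a call such as `save_matching(tweets, "Indigo", "indigo_tweets.txt")` would write the matching tweets to a second file; the file name is only an example.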