Tweepy Streaming filter fields
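I have this Python code, which uses Tweepy and the Streaming API to retrieve data from Twitter and stops once it has found 1000 results (i.e. the data for 1000 tweets).
It works well, but the problem is that when I try to run it in PyCharm, part of the results gets cut off. Since the code returns all the data for each tweet (ID, text, author, etc.), it may be generating too much data and the software crashes. So I would like to modify the code to get only certain fields of the tweet data (for example, I only need the tweet's text, author and date).
Any suggestions are appreciated.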
# Import the necessary package to process data in JSON format
try:
    import json
except ImportError:
    import simplejson as json

# Import the necessary methods from the "twitter" library
from twitter import Twitter, OAuth, TwitterHTTPError, TwitterStream

# Variables that contain the user credentials to access the Twitter API
ACCESS_TOKEN = ''
ACCESS_SECRET = ''
CONSUMER_KEY = ''
CONSUMER_SECRET = ''
oauth = OAuth(ACCESS_TOKEN, ACCESS_SECRET, CONSUMER_KEY, CONSUMER_SECRET)

# Initiate the connection to the Twitter Streaming API
twitter_stream = TwitterStream(auth=oauth)

# Get a sample of the public data flowing through Twitter
# iterator = twitter_stream.statuses.sample()  # plain Twitter streaming
# Twitter streaming filtered by a search track and a language.
# To set other search parameters, see
# https://dev.twitter.com/streaming/overview/request-parameters
iterator = twitter_stream.statuses.filter(track="Euro2016", language="en")

# Print each tweet in the stream to the screen.
# Here we set it to stop after getting 1000 tweets.
# You don't have to set it to stop, but can continue running
# the Twitter API to collect data for days or even longer.
tweet_count = 1000  # how many results to return
for tweet in iterator:
    tweet_count -= 1
    # Twitter Python Tool wraps the data returned by Twitter
    # as a TwitterDictResponse object.
    # We convert it back to the JSON format to print/score
    print(json.dumps(tweet))
    # The command below will do pretty printing for JSON data, try it out
    # print(json.dumps(tweet, indent=4))
    if tweet_count <= 0:
        break
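I was able to run it in PyCharm for 1000 tweets without any problems. So try running it on another computer, or investigate whether there is a problem with your existing system.
The result is a Python dictionary, so you need to access the individual elements, as shown below: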
for tweet in iterator:
    tweet_count -= 1
    # Access the elements such as 'text', 'created_at', ...
    print(tweet['text'])
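To keep only the three fields the question asks for (text, author, date), something along these lines might work. This is a minimal sketch, not a tested drop-in: it assumes each item from the stream behaves like a plain dict, and that tweets follow the classic layout with `text`, `created_at` and a nested `user` dict containing `screen_name`. The helper names `extract_fields` and `collect` are made up for this example, and the sample data only stands in for the live `iterator`.

```python
import json
from itertools import islice


def extract_fields(tweet):
    """Return only the text, author and date of a tweet, or None.

    Assumption: 'text', 'created_at' and user['screen_name'] are the
    keys carrying the tweet text, date and author.
    """
    if "text" not in tweet:
        # The stream can also carry messages that are not tweets
        # (for example delete notices), which have no 'text' key.
        return None
    return {
        "text": tweet["text"],
        "author": tweet.get("user", {}).get("screen_name"),
        "date": tweet.get("created_at"),
    }


def collect(iterator, limit):
    """Return an iterator over the reduced form of the first `limit` tweets."""
    reduced = (extract_fields(item) for item in iterator)
    return islice((r for r in reduced if r is not None), limit)


if __name__ == "__main__":
    # Hypothetical sample data standing in for the live stream iterator.
    sample_stream = [
        {"id": 1, "text": "Kick-off!", "created_at": "Fri Jun 10 19:00:00 +0000 2016",
         "user": {"screen_name": "some_fan"}},
        {"delete": {"status": {"id": 1}}},
    ]
    for item in collect(sample_stream, 1000):
        print(json.dumps(item))
```

With the real stream you would pass `iterator` instead of `sample_stream`. Because only tweets that actually contain `text` are counted, the limit of 1000 applies to real tweets rather than to every message the stream delivers.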