tweepy api.user_timeline: count limited to 200
It seems that with tweepy I can only get 200 tweets using the user_timeline method.
    import pandas as pd

    # `api` is assumed to be an authenticated tweepy.API instance created elsewhere.

    class Twitter_User():
        def __init__(self, id, count=200):
            self.id = id
            self.count = count
            self.data = None

        def get_tweets(self):
            # A single call; this is where the 200-tweet cap applies
            store_tweets = api.user_timeline(self.id, count=self.count)
            simple_list = []
            for status in store_tweets:
                array = [status._json["text"].strip(),
                         status._json["favorite_count"],
                         status._json["created_at"],
                         status._json["retweet_count"],
                         [h["text"] for h in status._json["entities"]["hashtags"]]]
                simple_list.append(array)
            self.data = pd.DataFrame(simple_list, columns=["Text", "Like", "Created at", "Retweet", "Hashtags"])
            # Drop retweets, so the frame can have fewer rows than were fetched
            self.data = self.data[~self.data["Text"].str.startswith('RT')]
            return self.data

        def __repr__(self):
            id = api.get_user(self.id)
            return id.screen_name
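If I set self.count to a number greater than 200, I always get a DataFrame with 200 rows; if I pass a smaller number, I get the correct number of rows. Is there a limit, or do I have to use some other method?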
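According to the Twitter API docs, the maximum number of records you can retrieve from /statuses/user_timeline/ in one request is 200.
From the definition of the count parameter: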
Specifies the number of Tweets to try and retrieve, up to a maximum of 200 per distinct request. The value of count is best thought of as a limit to the number of Tweets to return because suspended or deleted content is removed after the count has been applied. We include retweets in the count, even if include_rts is not supplied. It is recommended you always send include_rts=1 when using this API method.
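The practical consequence of that definition is that count is an upper bound, not a guarantee. The toy model below makes no network calls; the `simulate_request` helper and its data are hypothetical and not part of tweepy or the Twitter API. It only mimics the order the documentation describes: the cap is applied first, and unavailable tweets are removed afterwards.

```python
MAX_PER_REQUEST = 200  # per-request cap quoted in the documentation above


def simulate_request(timeline, count):
    """Toy model of one user_timeline request (hypothetical helper).

    timeline: list of dicts with a boolean "deleted" key, newest first.
    The cap is applied first, then unavailable tweets are dropped.
    """
    capped = min(count, MAX_PER_REQUEST)
    window = timeline[:capped]
    return [t for t in window if not t["deleted"]]


# 500 made-up tweets; every 10th one (ids 0, 10, 20, ...) is marked deleted.
timeline = [{"id": i, "deleted": i % 10 == 0} for i in range(500)]

print(len(simulate_request(timeline, 50)))    # 45: fewer than the 50 requested
print(len(simulate_request(timeline, 1000)))  # 180: never more than 200
```

This is also why a request with count=200 can come back with fewer than 200 tweets, even before any retweet filtering of your own.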
And from the tweepy source code, api.py line 114:
    @property
    def user_timeline(self):
        """ :reference: https://dev.twitter.com/rest/reference/get/statuses/user_timeline
            :allowed_param:'id', 'user_id', 'screen_name', 'since_id', 'max_id', 'count', 'include_rts'
        """
        return bind_api(
            api=self,
            path='/statuses/user_timeline.json',
            payload_type='status', payload_list=True,
            allowed_param=['id', 'user_id', 'screen_name', 'since_id',
                           'max_id', 'count', 'include_rts']
        )
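A single request returns at most 200 tweets. However, you can make successive requests for older tweets. The maximum number of tweets you can retrieve from one timeline this way is 3,200.
You can do this with tweepy, but you need to use tweepy's Cursor to fetch those successive pages of tweets. The Cursor section of the tweepy documentation is a good place to get started.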
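To make "successive requests for older tweets" concrete, here is a minimal sketch of the bookkeeping involved, written against a stand-in fetch function so it runs without credentials. `fetch_timeline`, `fetch_page` and `make_fake_fetch` are hypothetical names introduced for illustration; only the `max_id` and `count` parameters come from the API described above.

```python
from types import SimpleNamespace


def fetch_timeline(fetch_page, limit=3200, page_size=200):
    """Collect up to `limit` tweets by repeatedly asking for older ones.

    fetch_page(count, max_id) is a hypothetical callable returning a list of
    tweets (objects with an integer `.id`), newest first, whose ids are
    <= max_id (or the newest tweets when max_id is None).
    """
    tweets = []
    max_id = None
    while len(tweets) < limit:
        count = min(page_size, limit - len(tweets))
        page = fetch_page(count, max_id)
        if not page:
            break  # no older tweets left
        tweets.extend(page)
        max_id = page[-1].id - 1  # next request starts just below the oldest id seen
    return tweets


def make_fake_fetch(total):
    """Stand-in for the real API: a timeline with ids total..1, capped at 200 per call."""
    ids = list(range(total, 0, -1))  # newest first

    def fetch_page(count, max_id):
        eligible = [i for i in ids if max_id is None or i <= max_id]
        return [SimpleNamespace(id=i) for i in eligible[:min(count, 200)]]

    return fetch_page


tweets = fetch_timeline(make_fake_fetch(450))
print(len(tweets))  # 450, gathered in three requests of 200, 200 and 50
```

With a real client, `fetch_page` would presumably wrap a call to `api.user_timeline` with `count` and, when it is not None, `max_id`. tweepy's Cursor does this bookkeeping for you, which is why the answers here recommend it.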
To get more than 200, you need to use a cursor on user_timeline and then iterate over the pages.
    import tweepy

    # Consumer keys and access tokens, used for OAuth
    consumer_key = ''
    consumer_secret = ''
    access_token = ''
    access_token_secret = ''

    # OAuth process, using the keys and tokens
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)

    # Creation of the actual interface, using authentication
    api = tweepy.API(auth)

    # Each page is a list of up to 200 statuses
    for page in tweepy.Cursor(api.user_timeline, id='id', count=200).pages():
        print(page)
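Use the tweepy Cursor. In the example below, MuniLima is the Twitter account. The lists start out empty and are filled by the for loop, which stores each tweet's created_at, favorite_count and text.

    # Lists are initially empty; the for loop fills them
    tweeteo = []
    likes = []
    time = []

    # MuniLima is the Twitter account; items(2870) stops after 2870 tweets
    for tuit in tweepy.Cursor(api.user_timeline, screen_name='MuniLima').items(2870):
        time.append(tuit.created_at)
        likes.append(tuit.favorite_count)
        tweeteo.append(tuit.text)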
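To tie this back to the question, the same loop can feed the columns the asker wanted. The sketch below is one possible shape, not the asker's or tweepy's own method: `statuses_to_rows` is a hypothetical helper, and the stand-in objects with made-up data only exist so the code runs without credentials. It assumes each status exposes its raw payload as `._json`, as in the question's code.

```python
from types import SimpleNamespace


def statuses_to_rows(statuses):
    """Turn status objects into row dicts, skipping retweets (hypothetical helper)."""
    rows = []
    for status in statuses:
        data = status._json
        text = data["text"].strip()
        if text.startswith("RT"):
            continue  # same filter the question applies to its DataFrame
        rows.append({
            "Text": text,
            "Like": data["favorite_count"],
            "Created at": data["created_at"],
            "Retweet": data["retweet_count"],
            "Hashtags": [h["text"] for h in data["entities"]["hashtags"]],
        })
    return rows


# Stand-in statuses with made-up data, used instead of a live Cursor.
fake = [
    SimpleNamespace(_json={"text": " hello #py ", "favorite_count": 3,
                           "created_at": "Mon Jan 01 00:00:00 +0000 2018",
                           "retweet_count": 1,
                           "entities": {"hashtags": [{"text": "py"}]}}),
    SimpleNamespace(_json={"text": "RT @someone: hi", "favorite_count": 0,
                           "created_at": "Mon Jan 01 00:01:00 +0000 2018",
                           "retweet_count": 9,
                           "entities": {"hashtags": []}}),
]

rows = statuses_to_rows(fake)
print(len(rows))  # 1: the retweet is skipped
```

In real use the iterable would be something like the `tweepy.Cursor(api.user_timeline, ...).items()` call shown above, and passing `rows` to `pd.DataFrame` should give a frame with the question's column names.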