tweepy api.user_timeline: count limited to 200

It seems that with tweepy I can only get 200 tweets using the user_timeline method.

import pandas as pd

# `api` is an authenticated tweepy.API object created elsewhere
class Twitter_User():
    def __init__(self, id, count=200):
        self.id = id
        self.count = count
        self.data = None

    def get_tweets(self):
        store_tweets = api.user_timeline(self.id, count=self.count)
        simple_list = []
        for status in store_tweets:
            array = [
                status._json["text"].strip(),
                status._json["favorite_count"],
                status._json["created_at"],
                status._json["retweet_count"],
                [h["text"] for h in status._json["entities"]["hashtags"]],
            ]
            simple_list.append(array)
        self.data = pd.DataFrame(simple_list, columns=["Text", "Like", "Created at", "Retweet", "Hashtags"])
        # Drop retweets
        self.data = self.data[~self.data["Text"].str.startswith('RT')]
        return self.data

    def __repr__(self):
        id = api.get_user(self.id)
        return id.screen_name

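If I set self.count to a number greater than 200, I always get a DataFrame with 200 rows; if I set it to a smaller number, I get the correct number of rows. I don't know: is there a limit, or do I have to use another method?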

According to the Twitter API docs, the maximum number of records you can retrieve from /statuses/user_timeline/ in a single request is 200.

From the definition of the count parameter:

Specifies the number of Tweets to try and retrieve, up to a maximum of 200 per distinct request. The value of count is best thought of as a limit to the number of Tweets to return because suspended or deleted content is removed after the count has been applied. We include retweets in the count, even if include_rts is not supplied. It is recommended you always send include_rts=1 when using this API method.
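Because of that 200-per-request ceiling, asking for more tweets means splitting the total across several requests. A minimal sketch of that arithmetic follows; `request_counts` is a hypothetical helper, not part of tweepy, and each value it produces is only an upper bound, since deleted or suspended tweets are removed after count is applied.

```python
from typing import List

MAX_PER_REQUEST = 200  # per-request ceiling from the count parameter docs

def request_counts(total: int, per_request: int = MAX_PER_REQUEST) -> List[int]:
    """Split a desired number of tweets into per-request `count` values.

    Hypothetical helper: each value is only an upper bound, because deleted or
    suspended tweets are removed after `count` is applied.
    """
    if total < 0:
        raise ValueError("total must be non-negative")
    full, rest = divmod(total, per_request)
    return [per_request] * full + ([rest] if rest else [])

print(request_counts(500))  # [200, 200, 100]
```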

And from the tweepy source code, line 114 of api.py:

@property
def user_timeline(self):
    """ :reference: https://dev.twitter.com/rest/reference/get/statuses/user_timeline
        :allowed_param:'id', 'user_id', 'screen_name', 'since_id', 'max_id', 'count', 'include_rts'
    """
    return bind_api(
        api=self,
        path='/statuses/user_timeline.json',
        payload_type='status', payload_list=True,
        allowed_param=['id', 'user_id', 'screen_name', 'since_id',
                       'max_id', 'count', 'include_rts']
    )

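You can only get up to 200 tweets per request. However, you can keep requesting older tweets in successive requests. The maximum number of tweets you can retrieve from a single timeline is 3,200. See the reference here.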
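Requesting older tweets one page at a time is typically done with the max_id parameter, which appears in the allowed_param list quoted above. The sketch below is a minimal illustration, assuming a hypothetical `fetch_page(max_id, count)` callable that wraps something like `api.user_timeline(screen_name=..., count=count, max_id=max_id)` (leaving out max_id on the first call) and returns tweets newest first as dicts with an `"id"` key. Since the Twitter docs describe max_id as inclusive, each follow-up request asks for ids just below the oldest one already seen. The sketch also stops at 3,200 on its own side.

```python
from typing import Callable, Dict, List, Optional

PAGE_SIZE = 200      # per-request maximum for count
TIMELINE_CAP = 3200  # most tweets a user timeline returns in total

# Hypothetical: fetch_page(max_id, count) -> up to `count` tweets with id <= max_id, newest first
FetchPage = Callable[[Optional[int], int], List[Dict]]

def fetch_timeline(fetch_page: FetchPage, limit: int = TIMELINE_CAP) -> List[Dict]:
    """Walk backwards through a timeline one page at a time using max_id."""
    limit = min(limit, TIMELINE_CAP)
    tweets: List[Dict] = []
    max_id: Optional[int] = None  # no max_id on the first request
    while len(tweets) < limit:
        page = fetch_page(max_id, PAGE_SIZE)
        if not page:  # nothing older is available
            break
        tweets.extend(page)
        # max_id is inclusive, so step just below the oldest id seen so far
        max_id = min(t["id"] for t in page) - 1
    return tweets[:limit]


# Usage with an in-memory stand-in for the API (ids 1..1000, newest first)
fake_timeline = [{"id": i} for i in range(1000, 0, -1)]

def fake_fetch(max_id, count):
    older = [t for t in fake_timeline if max_id is None or t["id"] <= max_id]
    return older[:count]

print(len(fetch_timeline(fake_fetch, limit=450)))  # 450
```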

You can do this with tweepy, but you need to use tweepy's Cursor to fetch these successive pages of tweets. Check out  to help you get started.

To get more than 200, you need to use a Cursor on user_timeline and then iterate over the pages.

import tweepy

# Consumer keys and access tokens, used for OAuth
consumer_key = ''
consumer_secret = ''
access_token = ''
access_token_secret = ''

# OAuth process, using the keys and tokens
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# Creation of the actual interface, using authentication
api = tweepy.API(auth)

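# Each page holds up to 200 statuses; the Cursor keeps requesting older pages
# (up to the 3,200-tweet timeline limit). 'twitter_handle' is a placeholder.
for page in tweepy.Cursor(api.user_timeline, screen_name='twitter_handle', count=200).pages():
    for status in page:
        print(status.text)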
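`.pages()` yields one list of statuses per request, while tweepy's Cursor also offers `.items(limit)`, which yields individual statuses and stops after `limit` of them. For the class in the question, that suggests replacing the `user_timeline` call with something like `tweepy.Cursor(api.user_timeline, screen_name=..., count=200).items(self.count)` (or `user_id=` if you store numeric ids); treat that as an untested adaptation. The pure-Python sketch below is not tweepy's implementation and `take_items` is a hypothetical name; it only shows the idea of flattening lazily fetched pages and stopping once enough items are collected, so later pages are never requested.

```python
from itertools import chain, islice
from typing import Iterable, List, TypeVar

T = TypeVar("T")

def take_items(pages: Iterable[List[T]], limit: int) -> List[T]:
    """Flatten an iterable of pages and keep only the first `limit` items.

    chain.from_iterable pulls pages lazily and islice stops as soon as
    `limit` items are produced, so later pages are never requested.
    """
    return list(islice(chain.from_iterable(pages), limit))


# Usage with a fake paginator that records which pages were requested
requested = []

def fake_pages(page_size=200, total_pages=16):
    for n in range(total_pages):
        requested.append(n)
        yield list(range(n * page_size, (n + 1) * page_size))

items = take_items(fake_pages(), 450)
print(len(items), requested)  # 450 [0, 1, 2]
```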

Using a tweepy Cursor: MuniLima is the Twitter account. The lists start out empty and are filled in the for loop, which stores each tweet's 'created_at', 'favorite_count' and 'text' values.

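# Empty lists, filled in by the for loop below
tweeteo = []  # tweet text
likes = []    # favorite_count
time = []     # created_at
# 'MuniLima' is the Twitter account; count=200 asks for the per-request maximum,
# and .items(2870) stops after 2,870 tweets
for tuit in tweepy.Cursor(api.user_timeline, screen_name='MuniLima', count=200).items(2870):
    time.append(tuit.created_at)
    likes.append(tuit.favorite_count)
    tweeteo.append(tuit.text)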