Getting back-to-back errors when using the "wait_on_rate_limit" parameter

To avoid rate limit errors, I used the wait_on_rate_limit parameter in this call:

api = tweepy.API(auth,wait_on_rate_limit=True,wait_on_rate_limit_notify=True)

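At first my program ran fine. When I exceeded the rate limit, I got the message:
"Rate limit reached. Sleeping for: 909". My program slept for that long and then went back to collecting data. At some point, though, I got several back-to-back errors.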

...
ConnectionResetError: [WinError 10054] An existing connection was forcibly closed by the remote host

During handling of the above exception, another exception occurred:

...
urllib3.exceptions.ProtocolError: ('Connection aborted.', 
ConnectionResetError(10054, 'An existing connection was forcibly closed by 
the remote host', None, 10054, None))

During handling of the above exception, another exception occurred:

...
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))

During handling of the above exception, another exception occurred:

...
tweepy.error.TweepError: Failed to send request: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))

My code:

for user in tweepy.Cursor(api.friends, id="twitter").items():
    friendsOfUser=user.screen_name
    ## Do something with friendsOfUser

Is there anything I can do?

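There is nothing you can do about the host closing the connection. If you are already waiting on the rate limit, I would bet you are being a bit aggressive with the API :) Try catching TweepError and explicitly waiting for a while before trying again.

You could try something like this: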

import time

...
try:
    for user in tweepy.Cursor(api.friends, id="twitter").items():
        friendsOfUser=user.screen_name
        ...
except tweepy.TweepError:
    time.sleep(120)  # sleep for 2 minutes; adjust the delay as needed

This worked for me:

    from time import sleep

    backoff_counter = 1
    while True:
        try:
            for user in tweepy.Cursor(api.friends, id="twitter").items():
                pass  # do something with user
            break  # finished without errors
        except tweepy.TweepError as e:
            print(e.reason)
            sleep(60 * backoff_counter)  # wait longer after each failure
            backoff_counter += 1

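Basically, when you hit an error, you sleep for a while and then try again. I use an incremental backoff so that the sleep is long enough for the connection to be re-established.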
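
If you want to reuse the same idea elsewhere, here is a minimal sketch of the incremental backoff as a standalone helper. The names `retry_with_backoff`, `fetch`, `retry_on`, `base_delay` and `max_attempts` are my own and not part of tweepy, and the attempt cap is an addition that the loop above does not have.

```python
import time


def retry_with_backoff(fetch, retry_on=(Exception,), base_delay=60,
                       max_attempts=5, sleep=time.sleep):
    """Call fetch() until it succeeds, sleeping longer after each failure.

    Hypothetical helper: the parameter names are illustrative only.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except retry_on as exc:
            if attempt == max_attempts:
                raise  # give up and surface the last error
            delay = base_delay * attempt  # 60s, 120s, 180s, ...
            print(f"Attempt {attempt} failed ({exc}); sleeping {delay}s")
            sleep(delay)
```

With tweepy 3.x you would presumably pass `retry_on=(tweepy.TweepError,)` and something like `lambda: list(tweepy.Cursor(api.friends, id="twitter").items())` as `fetch`. Note that, just like the loop above, every retry starts the cursor over from the beginning.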

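To avoid this, you can add a delay after each request. The endpoint my script uses only allows 15 requests per 15 minutes, so I make one request per minute and get as much data as possible out of each one.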

import time

for page in tweepy.Cursor(api.followers, screen_name=user_name, wait_on_rate_limit=True, count=200).pages():
    try:
        followers.extend(page)
        print("-->", len(followers))
        if len(followers) % 100 == 0:
            save_followers_to_csv(user_name, followers)
        time.sleep(60)  # one request per minute stays within 15 requests per 15 minutes
    except tweepy.TweepError as e:
        print("Going to sleep:", e)
        time.sleep(60)
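
The 60 second delay does not have to be hard-coded; it can be derived from the limit. Here is a rough sketch, assuming a fixed window like the 15 requests per 15 minutes mentioned above. `request_interval` and `paced` are hypothetical helpers of mine, not tweepy functions.

```python
import time


def request_interval(max_requests, window_seconds):
    """Seconds to wait between requests to stay within the limit."""
    return window_seconds / max_requests


def paced(iterable, interval, sleep=time.sleep):
    """Yield items from iterable, sleeping `interval` seconds after each one.

    The sleep runs when the consumer asks for the next item, so it also
    happens once after the last item.
    """
    for item in iterable:
        yield item
        sleep(interval)
```

You could then wrap the page iterator, for example `for page in paced(cursor.pages(), request_interval(15, 15 * 60)):`, where `cursor` is the `tweepy.Cursor` from the snippet above. This only spaces the requests out; it does not catch connection errors, so you would still want the try/except.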