Simple function to respect Twitter's V2 API rate limits?

Problem:

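Often we want to pull much more data at once than Twitter would like us to. It would be great to have a simple function to call between queries that checks whether you need to wait.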
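The between-queries check can be expressed as a pure helper that reads the rate-limit headers from the previous response and returns how long to sleep. This is a minimal sketch, assuming Twitter's `x-rate-limit-remaining` and `x-rate-limit-reset` headers (the latter a Unix timestamp in seconds), and reusing the 15-second buffer and the conservative "one request left" threshold from the code in the question; the name `seconds_until_allowed` is hypothetical.

```python
import time
from typing import Mapping, Optional


def seconds_until_allowed(headers: Mapping[str, str],
                          now: Optional[float] = None,
                          buffer: float = 15.0) -> float:
    """Return how many seconds to wait before sending the next request.

    Hypothetical helper: returns 0.0 when more than one request remains in
    the current window, or when the rate-limit headers are missing.
    """
    remaining = headers.get("x-rate-limit-remaining")
    reset = headers.get("x-rate-limit-reset")
    if remaining is None or reset is None:
        return 0.0
    # Conservative threshold: start waiting with one request still left.
    if int(remaining) > 1:
        return 0.0
    if now is None:
        now = time.time()
    # x-rate-limit-reset is the Unix time (seconds) when the window resets.
    return max(0.0, int(reset) + buffer - now)
```

Between queries you would call `time.sleep(seconds_until_allowed(response.headers))`; passing `now` explicitly keeps the function deterministic when testing.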

Question:

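What is a simple function that respects Twitter's API limits and ensures that any long-running query completes successfully, without harassing Twitter and without getting the querying user banned?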

Ideal answer:

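The ideal answer is a portable function that works in all situations. That is, it should finish (properly) no matter what, while respecting Twitter's API rate-limit rules.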

Caveat:

I have submitted a working answer of my own but I am unsure if there is a way to improve it.

I am developing a Python package to utilize Twitter's new V2 API, and I want to make sure I respect Twitter's rate limits as closely as possible.

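Below are the two functions I use to wait when needed. They check the API response headers for the number of remaining queries, and then fall back on the HTTP status codes listed in Twitter's documentation as a final safeguard. As far as I can tell, these three HTTP codes (429, 500 and 503) are the only time-related errors; any other error should be raised so that the API user knows they did something wrong.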
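The decision rule described above can be isolated in a small function, which keeps the "wait or raise" policy separate from the waiting itself. This is a sketch assuming, as the question does, that 429, 500 and 503 are the only codes worth retrying; `Action` and `classify_status` are hypothetical names, not part of osometweet.

```python
from enum import Enum


class Action(Enum):
    OK = "ok"                # 2xx: use the response
    WAIT_FOR_RESET = "reset" # 429: wait until x-rate-limit-reset, then retry
    BACK_OFF = "back_off"    # 500/503: pause briefly, then retry
    RAISE = "raise"          # anything else: the request itself is wrong


def classify_status(status_code: int) -> Action:
    """Map an HTTP status code to what a rate-limit-aware client should do."""
    if 200 <= status_code < 300:
        return Action.OK
    if status_code == 429:
        return Action.WAIT_FOR_RESET
    if status_code in (500, 503):
        return Action.BACK_OFF
    return Action.RAISE
```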

```python
from datetime import datetime
from osometweet.utils import pause_until


def manage_rate_limits(make_request):
    """Manage Twitter V2 Rate Limits

    This function takes a zero-argument callable that sends the query and
    returns a `requests` response object, e.g.
    `lambda: requests.get(url, headers=headers, params=params)`.
    It uses the headers["x-rate-limit-remaining"] and
    headers["x-rate-limit-reset"] headers to manage Twitter's most common,
    time-dependent HTTP errors, re-sending the request after each wait.
    (A single response object cannot be re-sent, so taking only the
    response would allow waiting but not retrying.)

    """
    buffer_wait_time = 15  # extra seconds to wait past the reset time

    while True:
        response = make_request()

        # Too many requests error: wait until the rate-limit window
        # resets (plus a buffer), then loop around and retry.
        if response.status_code == 429:
            resume_time = datetime.fromtimestamp(
                int(response.headers["x-rate-limit-reset"]) + buffer_wait_time
            )
            print(f"Too many requests. Waiting on Twitter.\n\tResume Time: {resume_time}")
            pause_until(resume_time)
            continue

        # Twitter internal server error (500) or service unavailable (503).
        # These can usually be solved by waiting 30 seconds and retrying.
        if response.status_code in (500, 503):
            resume_time = datetime.fromtimestamp(datetime.now().timestamp() + 30)
            print(f"Twitter error {response.status_code}. Giving Twitter a break...\n\tResume Time: {resume_time}")
            pause_until(resume_time)
            continue

        # If we get this far with an error code, we've done something
        # wrong and should exit.
        if not response.ok:
            raise Exception(
                "Request returned an error: {} {}".format(
                    response.status_code, response.text
                )
            )

        # Success. If this window is (almost) used up, wait for the reset
        # so the *next* request does not hit a 429. The headers are read
        # with .get() because some responses may not include them.
        remaining = response.headers.get("x-rate-limit-remaining")
        reset = response.headers.get("x-rate-limit-reset")
        if remaining is not None and reset is not None and int(remaining) <= 1:
            resume_time = datetime.fromtimestamp(int(reset) + buffer_wait_time)
            print(f"One request from being rate limited. Waiting on Twitter.\n\tResume Time: {resume_time}")
            pause_until(resume_time)

        return response
```
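Another way to make retries possible is a decorator around the function that issues the request, so it can simply be called again after each wait. This is a hypothetical sketch (`retry_on_rate_limit` and its parameters are not an osometweet API); it reuses the 15-second buffer and 30-second back-off from the question's code and adds a cap on attempts that the question's code does not have. `sleep` and `now` are injectable so the logic can be exercised without actually waiting.

```python
import functools
import time


def retry_on_rate_limit(max_attempts=5, buffer=15, backoff=30,
                        sleep=time.sleep, now=time.time):
    """Re-call the decorated request function on 429, 500 or 503.

    The decorated function must return an object with `status_code` and
    `headers` attributes, such as a `requests.Response`.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for _ in range(max_attempts):
                response = func(*args, **kwargs)
                status = response.status_code
                if status == 429:
                    # Rate limited: wait for the window reset plus a buffer.
                    reset = int(response.headers["x-rate-limit-reset"])
                    sleep(max(0, reset + buffer - now()))
                elif status in (500, 503):
                    # Transient server trouble: back off, then retry.
                    sleep(backoff)
                elif 200 <= status < 300:
                    return response
                else:
                    # Any other code means the request itself is wrong.
                    raise RuntimeError(f"Request returned an error: {status}")
            raise RuntimeError(f"Gave up after {max_attempts} attempts")
        return wrapper
    return decorator
```

Applying `@retry_on_rate_limit()` to a function such as `get_tweets()` that returns `requests.get(url, headers=headers, params=params)` would make every call retry on 429/500/503 and raise on anything else.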

Here is the pause_until function:

```python
import time as pytime
from datetime import datetime
from time import sleep


def pause_until(time):
    """ Pause your program until a specific end time. 'time' is either
    a valid datetime object or unix timestamp in seconds (i.e. seconds
    since Unix epoch) """
    end = time

    # Convert datetime to unix timestamp. On Python 3, .timestamp()
    # converts an aware datetime via its tzinfo and treats a naive one
    # as local time (which is what datetime.fromtimestamp() returns),
    # so no manual timezone offset is needed.
    if isinstance(time, datetime):
        end = time.timestamp()

    # Type check
    if not isinstance(end, (int, float)):
        raise TypeError('The time parameter is not a number or datetime object')

    # Now we wait
    while True:
        now = pytime.time()
        diff = end - now

        # Time is up!
        if diff <= 0:
            break
        else:
            # 'logarithmic' sleeping to minimize loop iterations
            sleep(diff / 2)
```
This seems to work well, but I'm unsure whether there are edge cases that would break it, or whether there is a more elegant/simple way to do this.
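On the simpler-alternative question: since Python 3.5 (PEP 475), `time.sleep()` sleeps at least the requested number of seconds even if a signal interrupts it (unless the signal handler raises), so halving the sleep is not required to reach the deadline. A minimal sketch of a simpler `pause_until`, assuming Python 3 only and keeping the original's "datetime or Unix timestamp" contract:

```python
import time
from datetime import datetime
from typing import Union


def pause_until(end: Union[datetime, float]) -> None:
    """Sleep until `end`, a datetime or a Unix timestamp in seconds.

    Naive datetimes are treated as local time and aware ones are converted
    via their tzinfo, which is what datetime.timestamp() does in Python 3.
    """
    if isinstance(end, datetime):
        end = end.timestamp()
    # Re-check after each sleep in case the wall clock was adjusted while
    # we slept; in the common case the loop body sleeps only once.
    while True:
        remaining = end - time.time()
        if remaining <= 0:
            return
        time.sleep(remaining)
```

The loop exists only to guard against wall-clock adjustments during the sleep; a deadline already in the past returns immediately.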