尊重 Twitter 的 V2 API 速率限制的简单功能?
Simple function to respect Twitter's V2 API rate limits?
问题:
通常我们希望一次性提取比 Twitter 希望我们提取的数据多得多的数据。在每个查询之间,如果有一个简单的函数可以调用来检查您是否需要等待,那就太好了。
问题:
尊重 Twitter 的 API 限制并确保任何长 运行 查询将成功完成而不会骚扰 Twitter 并确保查询用户不会被封号了?
理想答案:
最理想的答案是应该适用于所有情况的可移植函数。也就是说,无论如何都要(正确地)完成,并遵守 Twitter 的 API 速率限制规则。
警告
I have submitted a working answer of my own but I am unsure if there is a way to improve it.
我正在开发 Python package to utilize Twitter's new V2 API。我想确保我尽可能遵守 Twitter 的速率限制。
下面是两个用来在需要时等待的函数。他们检查 API 调用响应 headers 以获取剩余查询,然后还依赖 Twitter 提供的 here HTTP 代码作为最终备份。据我所知,这三个 HTTP 代码是唯一的 time-related 错误,其他 应该 向 API 用户提出问题以通知他们任何他们做错了。
from datetime import datetime
from osometweet.utils import pause_until
def manage_rate_limits(response):
"""Manage Twitter V2 Rate Limits
This method takes in a `requests` response object after querying
Twitter and uses the headers["x-rate-limit-remaining"] and
headers["x-rate-limit-reset"] headers objects to manage Twitter's
most common, time-dependent HTTP errors.
"""
while True:
# Get number of requests left with our tokens
remaining_requests = int(response.headers["x-rate-limit-remaining"])
# If that number is one, we get the reset-time
# and wait until then, plus 15 seconds.
# The regular 429 exception is caught below as well,
# however, we want to program defensively, where possible.
if remaining_requests == 1:
buffer_wait_time = 15
resume_time = datetime.fromtimestamp( int(response.headers["x-rate-limit-reset"]) + buffer_wait_time )
print(f"One request from being rate limited. Waiting on Twitter.\n\tResume Time: {resume_time}")
pause_until(resume_time)
# Explicitly checking for time dependent errors.
# Most of these errors can be solved simply by waiting
# a little while and pinging Twitter again - so that's what we do.
if response.status_code != 200:
# Too many requests error
if response.status_code == 429:
buffer_wait_time = 15
resume_time = datetime.fromtimestamp( int(response.headers["x-rate-limit-reset"]) + buffer_wait_time )
print(f"Too many requests. Waiting on Twitter.\n\tResume Time: {resume_time}")
pause_until(resume_time)
# Twitter internal server error
elif response.status_code == 500:
# Twitter needs a break, so we wait 30 seconds
resume_time = datetime.now().timestamp() + 30
print(f"Internal server error @ Twitter. Giving Twitter a break...\n\tResume Time: {resume_time}")
pause_until(resume_time)
# Twitter service unavailable error
elif response.status_code == 503:
# Twitter needs a break, so we wait 30 seconds
resume_time = datetime.now().timestamp() + 30
print(f"Twitter service unavailable. Giving Twitter a break...\n\tResume Time: {resume_time}")
pause_until(resume_time)
# If we get this far, we've done something wrong and should exit
raise Exception(
"Request returned an error: {} {}".format(
response.status_code, response.text
)
)
# Each time we get a 200 response, exit the function and return the response object
if response.ok:
return response
这里是 pause_until
函数。
def pause_until(time):
""" Pause your program until a specific end time. 'time' is either
a valid datetime object or unix timestamp in seconds (i.e. seconds
since Unix epoch) """
end = time
# Convert datetime to unix timestamp and adjust for locality
if isinstance(time, datetime):
# If we're on Python 3 and the user specified a timezone,
# convert to UTC and get tje timestamp.
if sys.version_info[0] >= 3 and time.tzinfo:
end = time.astimezone(timezone.utc).timestamp()
else:
zoneDiff = pytime.time() - (datetime.now() - datetime(1970, 1, 1)).total_seconds()
end = (time - datetime(1970, 1, 1)).total_seconds() + zoneDiff
# Type check
if not isinstance(end, (int, float)):
raise Exception('The time parameter is not a number or datetime object')
# Now we wait
while True:
now = pytime.time()
diff = end - now
#
# Time is up!
#
if diff <= 0:
break
else:
# 'logarithmic' sleeping to minimize loop iterations
sleep(diff / 2)
这似乎工作得很好,但我不确定是否有 edge-cases 会破坏它,或者是否有更多 elegant/simple 的方法来做到这一点。
问题:
通常我们希望一次性提取比 Twitter 希望我们提取的数据多得多的数据。在每个查询之间,如果有一个简单的函数可以调用来检查您是否需要等待,那就太好了。
问题:
尊重 Twitter 的 API 限制并确保任何长 运行 查询将成功完成而不会骚扰 Twitter 并确保查询用户不会被封号了?
理想答案:
最理想的答案是应该适用于所有情况的可移植函数。也就是说,无论如何都要(正确地)完成,并遵守 Twitter 的 API 速率限制规则。
警告
I have submitted a working answer of my own but I am unsure if there is a way to improve it.
我正在开发 Python package to utilize Twitter's new V2 API。我想确保我尽可能遵守 Twitter 的速率限制。
下面是两个用来在需要时等待的函数。他们检查 API 调用响应 headers 以获取剩余查询,然后还依赖 Twitter 提供的 here HTTP 代码作为最终备份。据我所知,这三个 HTTP 代码是唯一的 time-related 错误,其他 应该 向 API 用户提出问题以通知他们任何他们做错了。
from datetime import datetime
from osometweet.utils import pause_until
def manage_rate_limits(response):
"""Manage Twitter V2 Rate Limits
This method takes in a `requests` response object after querying
Twitter and uses the headers["x-rate-limit-remaining"] and
headers["x-rate-limit-reset"] headers objects to manage Twitter's
most common, time-dependent HTTP errors.
"""
while True:
# Get number of requests left with our tokens
remaining_requests = int(response.headers["x-rate-limit-remaining"])
# If that number is one, we get the reset-time
# and wait until then, plus 15 seconds.
# The regular 429 exception is caught below as well,
# however, we want to program defensively, where possible.
if remaining_requests == 1:
buffer_wait_time = 15
resume_time = datetime.fromtimestamp( int(response.headers["x-rate-limit-reset"]) + buffer_wait_time )
print(f"One request from being rate limited. Waiting on Twitter.\n\tResume Time: {resume_time}")
pause_until(resume_time)
# Explicitly checking for time dependent errors.
# Most of these errors can be solved simply by waiting
# a little while and pinging Twitter again - so that's what we do.
if response.status_code != 200:
# Too many requests error
if response.status_code == 429:
buffer_wait_time = 15
resume_time = datetime.fromtimestamp( int(response.headers["x-rate-limit-reset"]) + buffer_wait_time )
print(f"Too many requests. Waiting on Twitter.\n\tResume Time: {resume_time}")
pause_until(resume_time)
# Twitter internal server error
elif response.status_code == 500:
# Twitter needs a break, so we wait 30 seconds
resume_time = datetime.now().timestamp() + 30
print(f"Internal server error @ Twitter. Giving Twitter a break...\n\tResume Time: {resume_time}")
pause_until(resume_time)
# Twitter service unavailable error
elif response.status_code == 503:
# Twitter needs a break, so we wait 30 seconds
resume_time = datetime.now().timestamp() + 30
print(f"Twitter service unavailable. Giving Twitter a break...\n\tResume Time: {resume_time}")
pause_until(resume_time)
# If we get this far, we've done something wrong and should exit
raise Exception(
"Request returned an error: {} {}".format(
response.status_code, response.text
)
)
# Each time we get a 200 response, exit the function and return the response object
if response.ok:
return response
这里是 pause_until
函数。
def pause_until(time):
""" Pause your program until a specific end time. 'time' is either
a valid datetime object or unix timestamp in seconds (i.e. seconds
since Unix epoch) """
end = time
# Convert datetime to unix timestamp and adjust for locality
if isinstance(time, datetime):
# If we're on Python 3 and the user specified a timezone,
# convert to UTC and get tje timestamp.
if sys.version_info[0] >= 3 and time.tzinfo:
end = time.astimezone(timezone.utc).timestamp()
else:
zoneDiff = pytime.time() - (datetime.now() - datetime(1970, 1, 1)).total_seconds()
end = (time - datetime(1970, 1, 1)).total_seconds() + zoneDiff
# Type check
if not isinstance(end, (int, float)):
raise Exception('The time parameter is not a number or datetime object')
# Now we wait
while True:
now = pytime.time()
diff = end - now
#
# Time is up!
#
if diff <= 0:
break
else:
# 'logarithmic' sleeping to minimize loop iterations
sleep(diff / 2)
这似乎工作得很好,但我不确定是否有 edge-cases 会破坏它,或者是否有更多 elegant/simple 的方法来做到这一点。