Make Python Async Requests Faster
I'm writing a get method that takes an array of ids and then makes a request for each id. The array can have 500+ ids, and right now the requests take 20+ minutes. I've tried a few different async approaches, such as aiohttp and async, but none of them made the requests any faster. Here is my code:
async def get(self):
    self.set_header("Access-Control-Allow-Origin", "*")
    story_list = []
    duplicates = []
    loop = asyncio.get_event_loop()
    ids = loop.run_in_executor(None, requests.get, 'https://hacker-news.firebaseio.com/v0/newstories.json?print=pretty')
    response = await ids
    response_data = response.json()
    print(response.text)
    for url in response_data:
        if url not in duplicates:
            duplicates.append(url)
            stories = loop.run_in_executor(None, requests.get, "https://hacker-news.firebaseio.com/v0/item/{}.json?print=pretty".format(url))
            data = await stories
            if data.status_code == 200 and len(data.text) > 5:
                print(data.status_code)
                print(data.text)
                story_list.append(data.json())
Is there a way to use multithreading to make the requests faster?
The main problem here is that the code isn't really asynchronous.
Once you have the list of URLs, you fetch them one at a time and then wait for each response.
A better idea is to filter out the duplicates (using a set) before queuing all of the URLs in the executor, then wait for all of them to finish, e.g.:
# Needs: import asyncio, import requests,
# and from concurrent.futures import ThreadPoolExecutor
async def get(self):
    self.set_header("Access-Control-Allow-Origin", "*")
    stories = []
    loop = asyncio.get_event_loop()
    # Single executor to share resources
    executor = ThreadPoolExecutor()
    # Get the initial set of ids
    response = await loop.run_in_executor(executor, requests.get, 'https://hacker-news.firebaseio.com/v0/newstories.json?print=pretty')
    response_data = response.json()
    print(response.text)
    # Putting them in a set will remove duplicates
    urls = set(response_data)
    # Build the set of futures (returned by run_in_executor) and wait for them all to complete
    responses = await asyncio.gather(*[
        loop.run_in_executor(
            executor, requests.get,
            "https://hacker-news.firebaseio.com/v0/item/{}.json?print=pretty".format(url)
        ) for url in urls
    ])
    # Process the responses
    for response in responses:
        if response.status_code == 200 and len(response.text) > 5:
            print(response.status_code)
            print(response.text)
            stories.append(response.json())
    return stories
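Since the question mentions trying aiohttp without success, the same fan-out pattern also works without threads when the HTTP calls themselves are async. Below is a minimal sketch of that approach, assuming aiohttp is installed; the fetch_story and get_stories helper names and the semaphore limit of 50 are illustrative choices, not part of the answer above:

import asyncio
import aiohttp

BASE = "https://hacker-news.firebaseio.com/v0"

async def fetch_story(session, semaphore, url):
    # The semaphore caps concurrent requests so 500+ ids
    # don't all open connections at once.
    async with semaphore:
        async with session.get(url) as response:
            if response.status == 200:
                return await response.json()
            return None

async def get_stories():
    semaphore = asyncio.Semaphore(50)
    async with aiohttp.ClientSession() as session:
        # Get the initial list of ids, deduplicated with a set
        async with session.get(BASE + "/newstories.json") as response:
            ids = set(await response.json())
        # Issue all the item requests up front and await them together
        tasks = [
            fetch_story(session, semaphore, "{}/item/{}.json".format(BASE, story_id))
            for story_id in ids
        ]
        results = await asyncio.gather(*tasks)
    # Drop failed or empty responses
    return [story for story in results if story]

Either way, the key change is the same: start all the requests up front and await them together with asyncio.gather, rather than awaiting each one inside the loop.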