Using Python 3.7+ to make 100k API calls, making 100 in parallel using asyncio

Question

What is the best approach to make 100k API calls using asyncio async/await with Python 3.7+? The idea is to keep 100 tasks running in parallel at all times.

What should be avoided:
1. Starting work on all 100k tasks at once.
2. Waiting for all 100 parallel tasks to finish before scheduling a new batch of 100.

The example below illustrates the first approach, which is not what is needed.

import aiohttp
import asyncio

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    urls = [
            'http://python.org',
            'https://google.com',
            'http://yifei.me'
        ]
    tasks = []
    async with aiohttp.ClientSession() as session:
        for url in urls:
            tasks.append(fetch(session, url))
        htmls = await asyncio.gather(*tasks)
        for html in htmls:
            print(html[:100])

if __name__ == '__main__':
    # asyncio.run() (Python 3.7+) creates the event loop, runs main(), and closes the loop.
    asyncio.run(main())
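
For comparison, the second approach to avoid (fixed batches of 100) could be sketched roughly as below. This is an illustration only and not code from the original post: `fake_fetch` is a hypothetical stand-in that sleeps instead of making an HTTP request, and `fetch_in_batches` and the `example.com` URLs are made-up names. The point is that each `gather()` call returns only when the slowest call in the batch has finished, so the number of calls in flight drops well below 100 toward the end of every batch.

```python
import asyncio

async def fake_fetch(url):
    # Hypothetical stand-in for a real HTTP request.
    await asyncio.sleep(0.01)
    return f"body of {url}"

async def fetch_in_batches(urls, batch_size=100):
    results = []
    for start in range(0, len(urls), batch_size):
        batch = urls[start:start + batch_size]
        # gather() returns only when every call in this batch is done,
        # so one slow call delays the start of the whole next batch.
        results.extend(await asyncio.gather(*(fake_fetch(u) for u in batch)))
    return results

if __name__ == '__main__':
    urls = [f"http://example.com/{i}" for i in range(250)]
    print(len(asyncio.run(fetch_in_batches(urls))))
```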

Answer 1

Use a semaphore. Semaphores are used to limit the number of concurrent operations. Python's asyncio comes with its own async version of a semaphore.

import aiohttp
import asyncio

async def fetch(session, url, sema):
    async with sema, session.get(url) as response:
        return await response.text()

async def main():
    urls = [
            'http://python.org',
            'https://google.com',
            'http://yifei.me',
            'other urls...'
        ]
    tasks = []
    # At most 100 fetch() calls can hold the semaphore (and be in flight) at once.
    sema = asyncio.BoundedSemaphore(value=100)
    async with aiohttp.ClientSession() as session:
        for url in urls:
            tasks.append(fetch(session, url, sema))
        htmls = await asyncio.gather(*tasks)
        for html in htmls:
            print(html[:100])

if __name__ == '__main__':
    # asyncio.run() (Python 3.7+) creates the event loop, runs main(), and closes the loop.
    asyncio.run(main())
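
Note that with the semaphore, `asyncio.gather()` still wraps all of the coroutines in tasks up front; the semaphore only limits how many of them are making a request at any moment. If creating 100k task objects is itself a concern, one possible alternative is a fixed pool of worker tasks pulling from an `asyncio.Queue`. The sketch below is an assumption about how that might look, not part of the answer above: `bounded_map` and `fake_fetch` are hypothetical names, and `fake_fetch` sleeps instead of performing a real request.

```python
import asyncio

async def bounded_map(func, items, limit=100):
    """Apply the async function func to every item, at most `limit` at a time."""
    queue = asyncio.Queue()
    for index, item in enumerate(items):
        queue.put_nowait((index, item))
    results = [None] * queue.qsize()

    async def worker():
        # Each worker takes the next item as soon as its previous call
        # finishes, so the pool stays busy until the queue is empty.
        while True:
            try:
                index, item = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            results[index] = await func(item)

    # Only `limit` tasks are ever created, regardless of how many items exist.
    await asyncio.gather(*(worker() for _ in range(limit)))
    return results

async def fake_fetch(url):
    # Hypothetical stand-in for a real HTTP request.
    await asyncio.sleep(0.01)
    return url.upper()

if __name__ == '__main__':
    urls = [f"http://example.com/{i}" for i in range(1000)]
    pages = asyncio.run(bounded_map(fake_fetch, urls, limit=100))
    print(len(pages))
```

With aiohttp, `func` could be a small wrapper such as `lambda url: fetch(session, url)` around the two-argument `fetch` from the question. This sketch does no error handling: an exception in one call propagates out of `gather()`, so real code would likely catch and record failures inside `worker()`.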

python

python-asyncio

python-3.7