Asynchronous Download of Files

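This downloads updated fasta files (protein sequences) from a database. Using asyncio, I got this to run faster than with requests, but I'm not convinced the downloads are actually happening asynchronously.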

import os
import aiohttp
import aiofiles
import asyncio

folder = '~/base/fastas/proteomes/'

upos = {'UP000005640': 'Human_Homo_sapien',
        'UP000002254': 'Dog_Boxer_Canis_Lupus_familiaris',
        'UP000002311': 'Yeast_Saccharomyces_cerevisiae',
        'UP000000589': 'Mouse_Mus_musculus',
        'UP000006718': 'Monkey_Rhesus_macaque_Macaca_mulatta',
        'UP000009130': 'Monkey_Cynomolgus_Macaca_fascicularis',
        'UP000002494': 'Rat_Rattus_norvegicus',
        'UP000000625': 'Escherichia_coli',
        }

#https://www.uniprot.org/uniprot/?query=proteome:UP000005640&format=fasta Example link
startline = r'https://www.uniprot.org/uniprot/?query=proteome:'
endline = r'&format=fasta&include=False' #include is true to include isoforms, make false for only canonical sequences

async def fetch(session, link, folderlocation, name):
    async with session.get(link, timeout=0) as response:
        try:
            file = await aiofiles.open(folderlocation, mode='w')
            await file.write(await response.text())  # aiofiles' write is a coroutine and must be awaited
            await file.close()
            print(name, 'ended')
        except FileNotFoundError:
            loc = ''.join((r'/'.join((folderlocation.split('/')[:-1])), '/'))
            command = ' '.join(('mkdir -p', loc))
            os.system(command)
            file = await aiofiles.open(folderlocation, mode='w')
            await file.write(await response.text())  # aiofiles' write is a coroutine and must be awaited
            await file.close()
            print(name, 'ended')

async def rfunc():
    async with aiohttp.ClientSession() as session:
        for upo, name in upos.items():
            print(name, 'started')
            link = ''.join((startline, upo, endline))
            folderlocation =''.join((folder, name, '.fasta'))
            await fetch(session, link, folderlocation, name)

loop = asyncio.get_event_loop()
loop.run_until_complete(rfunc())

Output from a run:

In [5]: runfile('~/base/Fasta Proteome Updater.py')
Human_Homo_sapien started
Human_Homo_sapien ended
Dog_Boxer_Canis_Lupus_familiaris started
Dog_Boxer_Canis_Lupus_familiaris ended
Yeast_Saccharomyces_cerevisiae started
Yeast_Saccharomyces_cerevisiae ended
Mouse_Mus_musculus started
Mouse_Mus_musculus ended
Monkey_Rhesus_macaque_Macaca_mulatta started
Monkey_Rhesus_macaque_Macaca_mulatta ended
Monkey_Cynomolgus_Macaca_fascicularis started
Monkey_Cynomolgus_Macaca_fascicularis ended
Rat_Rattus_norvegicus started
Rat_Rattus_norvegicus ended
Escherichia_coli started
Escherichia_coli ended

The printed output seems to indicate that the downloads happen one at a time. Is something wrong here?

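You are looping over the items and awaiting each download before starting the next, so a new request is only sent once the previous one has finished. To make them run concurrently, you need to schedule all the downloads at once, for example with asyncio.gather.

Your code could then look like this: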

async def rfunc():
    async with aiohttp.ClientSession() as session:
        # Schedule every download at once; gather waits until all of them finish.
        await asyncio.gather(
            *[
                fetch(
                    session,
                    ''.join((startline, upo, endline)),
                    ''.join((folder, name, '.fasta')),
                    name,
                )
                for upo, name in upos.items()
            ]
        )


loop = asyncio.get_event_loop()
loop.run_until_complete(rfunc())
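
To see the difference between awaiting inside a loop and using asyncio.gather without touching the network, here is a minimal sketch. It assumes asyncio.sleep is a reasonable stand-in for a slow request; fake_download, run_sequential, run_concurrent and timed are hypothetical names made up for this illustration and are not part of the code above.

```python
import asyncio
import time


async def fake_download(name, delay):
    # Stand-in for a network request (assumption): sleeping yields to the event loop.
    await asyncio.sleep(delay)
    return name


async def run_sequential(names, delay):
    # Awaiting inside the loop: each download finishes before the next one starts.
    return [await fake_download(name, delay) for name in names]


async def run_concurrent(names, delay):
    # gather schedules all coroutines at once and returns results in input order.
    return await asyncio.gather(*(fake_download(name, delay) for name in names))


def timed(coro):
    # Run a coroutine to completion and return (result, elapsed seconds).
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    names = ["Human", "Dog", "Yeast", "Mouse"]
    _, sequential_time = timed(run_sequential(names, 0.2))
    _, concurrent_time = timed(run_concurrent(names, 0.2))
    print(f"sequential: {sequential_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```

With four simulated downloads, the sequential version should take roughly four times the delay, while the gather version should take roughly one delay. Real downloads will probably not scale this cleanly, since the server and your bandwidth also limit throughput.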