Fastest way to apply an async function to a pandas DataFrame

A pandas DataFrame has an apply method for applying a synchronous function, for example:

import numpy as np
import pandas as pd

def fun(x):
    return x * 2

df = pd.DataFrame(np.arange(10), columns=['old'])

df['new'] = df['old'].apply(fun)

If an async function fun2 has to be applied instead, what is the fastest way to do something similar to the following?

import asyncio
import numpy as np
import pandas as pd

async def fun2(x):
    return x * 2

async def main():
    df = pd.DataFrame(np.arange(10), columns=['old'])
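    df['new'] = 0
    for i in range(len(df)):
        # Assign through df.loc in one step; chained df['new'].iloc[i] = ...
        # may write to a temporary copy instead of df itself.
        df.loc[df.index[i], 'new'] = await fun2(df['old'].iloc[i])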
    print(df)

asyncio.run(main())

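Try asyncio.gather and overwrite the whole column once it completes. gather runs the coroutines concurrently and returns the results in the same order as its inputs, so they line up with the rows: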

import asyncio
import numpy as np
import pandas as pd


async def fun2(x):
    return x * 2


async def main():
    df = pd.DataFrame(np.arange(10), columns=['old'])
    df['new'] = await asyncio.gather(*[fun2(v) for v in df['old']])
    print(df)


asyncio.run(main())

Output:

   old  new
0    0    0
1    1    2
2    2    4
3    3    6
4    4    8
5    5   10
6    6   12
7    7   14
8    8   16
9    9   18
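
If fun2 does real I/O (HTTP requests, database queries), launching one coroutine per row all at once can overwhelm the remote side. A minimal sketch of the same gather approach with a cap on concurrency follows. The helper names apply_async and bounded, the limit parameter, and the asyncio.sleep call standing in for I/O are assumptions made for illustration, not part of the original question or answer.

```python
import asyncio
import numpy as np
import pandas as pd


async def fun2(x):
    # Stand-in for real I/O; asyncio.sleep is only a placeholder.
    await asyncio.sleep(0.01)
    return x * 2


async def apply_async(series, func, limit=5):
    """Hypothetical helper: apply coroutine function `func` to each value
    of `series`, with at most `limit` calls in flight at a time."""
    sem = asyncio.Semaphore(limit)

    async def bounded(value):
        # The semaphore blocks here once `limit` calls are already running.
        async with sem:
            return await func(value)

    # gather preserves input order, so results align with series.index.
    results = await asyncio.gather(*(bounded(v) for v in series))
    return pd.Series(results, index=series.index)


async def main():
    df = pd.DataFrame(np.arange(10), columns=['old'])
    df['new'] = await apply_async(df['old'], fun2, limit=3)
    return df


df = asyncio.run(main())
print(df)
```

Returning a Series that reuses the original index keeps the assignment aligned even when the DataFrame does not have a default 0..n-1 index. Note that the speedup only appears when the function actually awaits something; for pure computation such as x * 2, a plain apply or a vectorized expression is likely faster.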