Embarrassingly parallel problem in Python
I have 634 *.npy files, each containing a 2-D numpy array of shape (8194, 76). I want to apply STL decomposition to each column five times, at different frequencies. So what I do is:
for file in files:
    for column in columns:
        for freq in frequencies:
            res = STL(file[:, column], period=freq)
            decomposed = np.vstack((res.trend, res.seasonal, res.resid)).T
            np.save(out_path, decomposed)  # np.save needs a target path as its first argument
The final decomposed shape should be (8194, 1140). How can I parallelize this? The serial implementation would take more than two months to run.
You could do something like this:
from concurrent.futures import ProcessPoolExecutor

FILES = ["a", "b", "c", "d", "e", "f", "g", "h"]

def simulate_cpu_bound(file):
    2 ** 100000000  # CPU-heavy task
    # or just use time.sleep(n), where n is a number of seconds
    return file

if __name__ == '__main__':
    with ProcessPoolExecutor(8) as f:
        res = f.map(simulate_cpu_bound, FILES)
        res = list(res)
        print(res)