Parallel process Stata-Python

I am trying to implement multiprocessing for a Python function that is executed from a Stata .do file.

In Python, I can run a simple function that takes some time:

import multiprocessing as mp 
from timeit import default_timer as timer

def square(x):
    return x ** x

# Non-parallel
start = timer()
[square(x) for x in range(0,1000)]
print("Simple execution took {:.2f} seconds".format(timer()-start))

# Parallel version
pool = mp.Pool(mp.cpu_count())
start = timer()
pool.map(square, [x for x in range(0,1000)])
pool.close()  
print("Multiprocessing execution took {:.2f} seconds".format(timer()-start))

As soon as I try to run the same code inside a Stata .do file, it breaks and returns an error:

Example .do file:

python:
import multiprocessing as mp 
from timeit import default_timer as timer

def square(x):
    return x ** x

# Non-parallel
start = timer()
[square(x) for x in range(0,1000)]
print("Simple execution took {:.2f} seconds".format(timer()-start))

# Parallel version
pool = mp.Pool(mp.cpu_count())
start = timer()
pool.map(square, [x for x in range(0,1000)])
pool.close()  
print("Multiprocessing execution took {:.2f} seconds".format(timer()-start))
end

Is there any way to find out what causes the error message? Perhaps there is another way to do multiprocessing with Python inside the Stata environment.

Thanks to the Stata support team, I am able to answer my own question.

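On Windows, multiprocessing spawns new processes from scratch instead of forking. When running multiprocessing in an embedded environment such as Stata, you need to set the path to the Python interpreter so that it is used when starting child processes.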
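
As a rough illustration of the point above, the following sketch checks whether the platform spawns child processes by default and, if so, tells multiprocessing which interpreter to launch. The helper names uses_spawn_by_default and configure_child_interpreter are hypothetical and not part of Stata or of the multiprocessing module; sys.executable is only a stand-in for the path that Stata reports in r(execpath).

```python
import multiprocessing as mp
import sys


def uses_spawn_by_default():
    """Return True if this platform starts child processes with 'spawn'."""
    # The first entry of get_all_start_methods() is the platform default.
    return mp.get_all_start_methods()[0] == "spawn"


def configure_child_interpreter(python_exe):
    """Set the interpreter used for child processes when they are spawned.

    Returns True if the executable was set, False if nothing was changed.
    """
    if uses_spawn_by_default():
        mp.set_executable(python_exe)
        return True
    return False


# Assumption: outside Stata, sys.executable is a real Python interpreter.
# Inside Stata, pass the path from r(execpath) instead.
changed = configure_child_interpreter(sys.executable)
print("default start method:", mp.get_all_start_methods()[0],
      "| executable set:", changed)
```

On Linux the default is usually not spawn, so the helper leaves the configuration untouched there, which mirrors the Windows-only check in the .do file.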

The function must be defined in a separate file, here my_func.py:


def square(x):
    return x ** x
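
One likely reason for the separate file (this is an interpretation, not something the Stata support reply states): multiprocessing hands a function to its workers by pickling it, and pickle stores only the module name and the function name, so each worker has to be able to import that module on its own. The sketch below shows this with json.dumps standing in for my_func.square.

```python
import json
import pickle

# Pickling a module-level function records where to find it, not its code.
payload = pickle.dumps(json.dumps)

# The payload names the module and the function ...
print(b"json" in payload, b"dumps" in payload)

# ... and unpickling imports the module and looks the name up again.
restored = pickle.loads(payload)
print(restored is json.dumps)
```

A function typed directly into the python: block lives in the embedded interpreter's __main__, which a freshly spawned worker presumably cannot import, whereas my_func.py can be imported as long as it is on the Python path.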

.do file:

python query
di r(execpath)

python:
import multiprocessing as mp
from timeit import default_timer as timer
import platform 
from my_func import square

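# On Windows, child processes are spawned, so point multiprocessing
# at the Python executable reported by "python query".
if platform.system() == "Windows":
    mp.set_executable("`r(execpath)'")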

# Non-parallel
start = timer()
[square(x) for x in range(0,1000)]
print("Simple execution took {:.2f} seconds".format(timer()-start))

# Parallel version
if __name__ == '__main__':
    pool = mp.Pool(mp.cpu_count())
    start = timer()
    pool.map(square, [x for x in range(0,1000)])
    pool.close()
    print("Multiprocessing execution took {:.2f} seconds".format(timer()-start))

end