Parallel process Stata-Python

Question

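I am trying to use multiprocessing for a Python function that is executed from a Stata .do file.

In plain Python, I can run a simple function that takes some time to execute: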

import multiprocessing as mp 
from timeit import default_timer as timer

def square(x):
    return x ** x

# Non-parallel
start = timer()
[square(x) for x in range(0,1000)]
print("Simple execution took {:.2f} seconds".format(timer()-start))

# Parallel version
pool = mp.Pool(mp.cpu_count())
start = timer()
pool.map(square, [x for x in range(0,1000)])
pool.close()  
print("Multiprocessing execution took {:.2f} seconds".format(timer()-start))

As soon as I run the same code inside a Stata .do file, it breaks and returns an error:

Example .do file:

python:
import multiprocessing as mp 
from timeit import default_timer as timer

def square(x):
    return x ** x

# Non-parallel
start = timer()
[square(x) for x in range(0,1000)]
print("Simple execution took {:.2f} seconds".format(timer()-start))

# Parallel version
pool = mp.Pool(mp.cpu_count())
start = timer()
pool.map(square, [x for x in range(0,1000)])
pool.close()  
print("Multiprocessing execution took {:.2f} seconds".format(timer()-start))
end

Is there a way to find out what causes the error message? Alternatively, is there another way to do multiprocessing with Python inside the Stata environment?

Answer 1

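Thanks to the Stata support team, I was able to find an answer.

On Windows, multiprocessing spawns new processes from scratch rather than forking. When multiprocessing runs in an embedded environment such as Stata, you need to set the path of the Python interpreter that will be used to start the child processes.

The function must be defined in a separate file, here my_func.py: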
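
To make the point above concrete, here is a minimal sketch of how one might inspect the available start methods and point multiprocessing at a specific interpreter. The helper names `supported_start_methods` and `point_children_at` are hypothetical and only for illustration. The sketch uses `sys.executable` as the interpreter path, which is an assumption that holds for a standalone Python session; inside an embedding host such as Stata, `sys.executable` may not be the Python interpreter, which is why the answer takes the path from `python query` instead.

```python
import multiprocessing as mp
import platform
import sys


def supported_start_methods():
    """Return the process start methods available on this platform."""
    return mp.get_all_start_methods()


def point_children_at(interpreter_path):
    """Tell multiprocessing which executable to launch for child processes.

    Hypothetical helper: it only wraps mp.set_executable, which matters
    when children are started with "spawn" (the only method on Windows).
    """
    mp.set_executable(interpreter_path)
    return interpreter_path


if __name__ == "__main__":
    print("Platform:", platform.system())
    print("Start methods:", supported_start_methods())
    # Assumption: in a standalone session sys.executable is the interpreter.
    print("Children will use:", point_children_at(sys.executable))
```

On Windows the list of start methods contains only "spawn", so every worker is a freshly launched interpreter that has to be found on disk.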

def square(x):
    return x ** x
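
A likely reason for the separate file is that multiprocessing sends work to the child processes with pickle, and pickle stores a function as a reference to its module and name, not as code. A spawned child therefore has to be able to import the function. The sketch below illustrates this with `math.sqrt` standing in for an importable function; the helper names `pickled_reference` and `can_pickle` are hypothetical. That code typed inside a Stata `python:` block cannot be re-imported by the child is an assumption based on the answer, not something verified here.

```python
import math
import pickle


def pickled_reference(func):
    """Pickle a function; the result holds its module and name, not its body."""
    return pickle.dumps(func)


def can_pickle(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


if __name__ == "__main__":
    # The payload names the module and the function so the receiver can import it.
    print(pickled_reference(math.sqrt))
    # A lambda has no importable name, so it cannot be sent to a worker.
    print(can_pickle(math.sqrt), can_pickle(lambda x: x))
```

Because `square` lives in my_func.py, each worker can resolve it with an ordinary `from my_func import square`.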

.do file:

python query
di r(execpath)

python:
import multiprocessing as mp
from timeit import default_timer as timer
import platform 
from my_func import square

if platform.platform().find("Windows") >= 0:
        mp.set_executable("`r(execpath)'")

# Non-parallel
start = timer()
[square(x) for x in range(0,1000)]
print("Simple execution took {:.2f} seconds".format(timer()-start))

# Parallel version
if __name__ == '__main__':
        pool = mp.Pool(mp.cpu_count())
        start = timer()
        pool.map(square, [x for x in range(0,1000)])
        pool.close()
        print("Multiprocessing execution took {:.2f} seconds".format(timer()-start))

end
