Parallel processing in Stata with Python
I am trying to implement multiprocessing for a Python function that is executed in a Stata .do file.
In plain Python, I can run a simple function that takes some time:
import multiprocessing as mp
from timeit import default_timer as timer
def square(x):
    return x ** x
# Non-parallel
start = timer()
[square(x) for x in range(0,1000)]
print("Simple execution took {:.2f} seconds".format(timer()-start))
# Parallel version
pool = mp.Pool(mp.cpu_count())
start = timer()
pool.map(square, [x for x in range(0,1000)])
pool.close()
print("Multiprocessing execution took {:.2f} seconds".format(timer()-start))
As soon as I try to run the same code inside a Stata .do file, it breaks and returns an error.
Example .do file:
python:
import multiprocessing as mp
from timeit import default_timer as timer
def square(x):
    return x ** x
# Non-parallel
start = timer()
[square(x) for x in range(0,1000)]
print("Simple execution took {:.2f} seconds".format(timer()-start))
# Parallel version
pool = mp.Pool(mp.cpu_count())
start = timer()
pool.map(square, [x for x in range(0,1000)])
pool.close()
print("Multiprocessing execution took {:.2f} seconds".format(timer()-start))
end
Is there a way to find out what causes the error message? Or perhaps there is another way to do multiprocessing with Python inside the Stata environment.
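Thanks to the Stata support team, I was able to find an answer.
On Windows, multiprocessing spawns new processes from scratch rather than forking. When running multiprocessing in an embedded environment such as Stata, you need to set the path to the Python interpreter that is used to start the child processes.
The function must be defined in a separate file, here my_func.py: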
def square(x):
    return x ** x
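A likely reason the function has to live in its own file (my reading, the answer itself does not spell this out): with spawned workers, the function is sent to each child as a pickled reference, meaning a module name plus a function name, and the child has to import that module itself. A minimal sketch of that behaviour in plain Python, using math.sqrt as a stand-in for an importable function and a hypothetical helper is_picklable:

```python
import math
import pickle


def is_picklable(func):
    """Return True if func can be pickled, i.e. shipped to a worker process."""
    try:
        pickle.dumps(func)
        return True
    except (pickle.PicklingError, AttributeError):
        return False


# A function that lives in an importable module is pickled as a reference:
# the payload holds the module and function names, not the function body.
payload = pickle.dumps(math.sqrt)
print(b"math" in payload, b"sqrt" in payload)

# A lambda has no importable name, so it cannot be sent to a worker.
print(is_picklable(lambda x: x ** x))
```

A function typed directly into the python: block of a do-file has no module file that a freshly started child interpreter could import, which would explain why moving square into my_func.py helps.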
.do file:
python query
di r(execpath)
python:
import multiprocessing as mp
from timeit import default_timer as timer
import platform
from my_func import square
# On Windows, tell multiprocessing which Python interpreter to launch for child processes
if platform.platform().find("Windows") >= 0:
    mp.set_executable("`r(execpath)'")
# Non-parallel
start = timer()
[square(x) for x in range(0,1000)]
print("Simple execution took {:.2f} seconds".format(timer()-start))
# Parallel version
if __name__ == '__main__':
    pool = mp.Pool(mp.cpu_count())
    start = timer()
    pool.map(square, [x for x in range(0,1000)])
    pool.close()
    print("Multiprocessing execution took {:.2f} seconds".format(timer()-start))
end
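For trying the same idea outside Stata, here is a rough sketch in plain Python. It requests the "spawn" start method explicitly (the one Windows uses) and optionally sets the interpreter path, which is the role `r(execpath)' plays in the do-file. The helper names spawn_context and run_demo are made up for illustration, and sys.executable is only a stand-in: in an embedded host it may not point at a Python interpreter at all, which is the problem the fix above addresses.

```python
import math
import multiprocessing as mp
import sys


def spawn_context(interpreter=None):
    """Return a 'spawn' context, optionally pointed at an explicit interpreter."""
    ctx = mp.get_context("spawn")
    if interpreter is not None:
        # Path of the Python executable used to start child processes.
        ctx.set_executable(interpreter)
    return ctx


def run_demo():
    """Map an importable function over a small list with two spawned workers."""
    ctx = spawn_context(sys.executable)
    with ctx.Pool(2) as pool:
        return pool.map(math.sqrt, [1, 4, 9])


if __name__ == "__main__":
    # The guard matters under spawn: children re-import the main module,
    # and unguarded pool creation would run again in every child.
    print(run_demo())
```

Using a context object instead of the module-level functions keeps the start method choice local to this code, so it does not change the default for anything else running in the same session.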