Return value from Stata Do-file to Python

Question

I can successfully call a Stata Do-file from Python, but what is the best way to get a local macro from Stata into Python? I intend to run the Do-file in a loop in Python.

What I have so far:

Python:

import subprocess

InputParams = [' -3','0',' -3','0',' -3','0']

# /e makes it run quietly, i.e., Stata doesn't open a window
cmd = [r'C:\Program Files (x86)\Stata14\StataMP-64.exe','/e','do',dofile] + InputParams
subprocess.call(cmd,shell=True)

In Stata I run a regression and end up with a local macro containing the mean squared error, e.g.

local MSE = 0.0045

What is the best way to return the local macro to Python? Write it to a file? I could not find anything about writing macros to a file.

Bonus question: if I put InputParams = ['-3', '0'] in Python (I removed the space in front of the minus three), Stata throws the error /3 invalid name. Why?

Edit

Adding the Stata Do-file. This is not the actual script, just a representation of what I do in the real script.

quietly {

capture log close
clear all
cls
version 14.2
set more off
cd "<path here>"
local datestamp: di %tdCCYY-NN-DD daily("$S_DATE","DMY")
local timestamp = subinstr("$S_TIME",":","-",2)
log using "Logs\log_`datestamp'_`timestamp'_UTC.log"
set matsize 10000

use "<dataset path here>"

gen date = dofc(TimeVar)
encode ID, generate(uuid)

xtset uuid date

gen double DepVarLagSum = 0
gen double IndVar1LagMax = 0
gen double IndVar2LagMax = 0

local DepVar1LagStart = `1' // INPUT PARAMS GO HERE
local DepVar1LagEnd = `2'
local IndVar1LagStart = `3' 
local IndVar1LagEnd = `4'
local IndVar2LagStart = `5'
local IndVar2LagEnd = `6'

** number of folds for cross validation
scalar kfold = 5
set seed 42
gen byte randint = runiformint(1,kfold) // integer fold labels 1..kfold

** thanks to Álvaro A. Gutiérrez-Vargas for the matrix operations
matrix results = J(kfold,4,.)
matrix colnames results = "R2_fold" "MSE_fold" "R2_hold" "MSE_hold"
matrix rownames results = "1" "2" "3" "4" "5"

local MSE = 0

** rolling sum, thanks to Nick Cox for the algorithm
forval k = `DepVar1LagStart'(1)`DepVar1LagEnd' {
    if `k' < 0 {
        local k1 = -(`k')
        replace DepVarLagSum = DepVarLagSum + L`k1'.DepVar
    }
    else replace DepVarLagSum = DepVarLagSum + F`k'.DepVar
}

** rolling max, thanks to Nick Cox for the algorithm
local IndVar1_arg IndVar1 
forval k = `IndVar1LagStart'(1)`IndVar1LagEnd' {
    if `k' <= 0 {
        local k1 = -(`k')
        local IndVar1_arg `IndVar1_arg', L`k1'.IndVar1
    }    
}

local IndVar2_arg IndVar2 
forval k = `IndVar2LagStart'(1)`IndVar2LagEnd' {
    if `k' <= 0 {
        local k1 = -(`k')
        local IndVar2_arg `IndVar2_arg', L`k1'.IndVar2
    }    
}

gen resid_squared = .

forval b = 1(1)`=kfold' {
    ** perform regression on 4/5 parts
    xtreg c.DepVarLagSum ///
    c.IndVar1LagMax ///
    c.IndVar2LagMax ///
    if randint != `b' ///
    , fe vce(cluster uuid)

    ** store results
    matrix results[`b',1] = e(r2)
    matrix results[`b',2] = e(rmse)*e(rmse) // to get MSE
    
    ** test set
    predict predDepVarLagSum if randint == `b', xb
    predict residDepVarLagSum if randint == `b', residuals
   

    ** get R-squared
    corr DepVarLagSum predDepVarLagSum if randint == `b'
    matrix results[`b',3] = r(rho)^2
 
    ** calculate squared residuals
    replace resid_squared = residDepVarLagSum*residDepVarLagSum
    summarize resid_squared if randint == `b'
    matrix results[`b',4] = r(mean)

    drop predDepVarLagSum
    drop residDepVarLagSum
}

** average the fold results once all folds are done
mat U = J(rowsof(results),1,1)
mat sum = U'*results
mat mean_results = sum/rowsof(results)

local MSE = mean_results[1,4]
}

I want to send MSE back to Python.

Sorry if I missed some small typos; I cannot copy the code directly from the machine where I am running Stata.

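The idea is to supply input parameters that determine the lag periods, run the regression on the resulting variables, compute the average test-set mean squared error, and feed it back to Python.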

Edit 2

I added more items to the InputParams list to reflect the number of inputs the Stata Do-file expects.

Answer 1

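Better integration between Python and Stata is available in Stata 16.1, but a practical solution for earlier versions is to write the Stata matrix holding the results to disk (here I am using an Excel file) and then read it from Python. Here is an example of code lines that you can put at the end of your dofile to write the desired matrix.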

clear all
version 14.1
matrix M = J(5,2,999)
matrix colnames M = "col1"  "col2" 
matrix rownames M ="1" "2" "3" "4" "5"
global route = "C:\Users\route_to_your_working_directory"
putexcel set "${route}\M.xlsx", sheet("M")  replace
putexcel A1 = matrix(M)   , names
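
For the Python side, here is a minimal sketch of the same write-to-disk idea, with one swap: it reads a plain-text file rather than the Excel file above, so it needs only the standard library. Everything named here is an assumption for illustration: the file name mse.txt, the helpers read_mse and run_dofile, the runner parameter, and the three Stata file commands shown in the comment, which would have to be added at the end of the do-file (inside the same scope where the local MSE is defined).

```python
import subprocess
from pathlib import Path

# Assumed addition at the end of the do-file (not in the original script):
#   file open fh using "mse.txt", write text replace
#   file write fh "`MSE'" _n
#   file close fh
# The file is created in Stata's working directory (the one set by cd).

# Path taken from the question; adjust to your installation.
STATA_EXE = r"C:\Program Files (x86)\Stata14\StataMP-64.exe"


def read_mse(path):
    """Return the number stored on the first line of a plain-text file."""
    first_line = Path(path).read_text().strip().splitlines()[0]
    return float(first_line)


def run_dofile(dofile, params, result_file, runner=subprocess.call):
    """Run the do-file once and return the value it wrote to result_file.

    params must already be strings formatted the way Stata accepts them
    (the question uses ' -3' with a leading space).
    """
    result_file = Path(result_file)
    # Remove any stale result so a failed run cannot return an old value.
    if result_file.exists():
        result_file.unlink()
    # Same command layout as in the question.
    cmd = [STATA_EXE, "/e", "do", str(dofile)] + list(params)
    runner(cmd)
    return read_mse(result_file)
```

Looping is then a matter of calling run_dofile once per parameter list, for example run_dofile(dofile, [' -3', '0', ' -3', '0', ' -3', '0'], r'<path here>\mse.txt'), and collecting the returned values. If the run fails and no file is written, read_mse raises an error instead of silently reusing an earlier result.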
