How to assign the maximum amount of strings to macro automatically?
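My question title may be a bit vague.
Previously, I wanted to "acquire complete list of subdirs" and then read the files in these subdirs into Stata.
Thanks to @Roberto Ferrer's helpful suggestions, I almost got there. But then I ran into another problem: because I have so many separate files, the local macro seems to hit its length limit. After the command local n: word count `filelist',
Stata returns the error message: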
macro substitution results in line that is too long.
The line resulting from substituting macros would be longer than allowed. The maximum allowed length is 645,216 characters, which is calculated on the basis of set maxvar. You can change that in Stata/SE and Stata/MP. What follows is relevant only if you are using Stata/SE or Stata/MP.
The maximum line length is defined as 16 more than the maximum macro length, which is currently 645,200 characters. Each unit increase in set maxvar increases the length maximums by 129. The maximum value of set maxvar is 32,767. Thus, the maximum line length may be set up to 4,227,159 characters if you set maxvar to its largest value.
r(920);
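When I reduce the number of subdirs to 5, Stata works fine. Since there are about 100 subdirs, I would have to repeat the operation 20 times. That is manageable, but I still wonder whether I can fully automate the process: more specifically, "exhaust" the maximum allowed macro length, import the files, and then move on to the next group of subdirs.
Below you can find my code: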
//====================================
//=== read and clean projects data ===
//====================================
version 14
set linesize 80
set more off
clear
macro drop _all
set linesize 200
cd G:\Data_backup\Soufang_data
*----------------------------------
* Read all files within directory
*----------------------------------
* Import the first worksheets 1:"项目首页" 2:"项目概况" 3:"成交详情"
* worksheet1
filelist, directory("G:\Data_backup\Soufang_data") pattern(*.xlsx)
* pattern(*.xlsx) prevents importing other file types (.doc or .dta)
gen tag = substr(reverse(dirname),1,6) == "esuoh/"
keep if tag==1
gen path = dirname+"\"+filename
qui valuesof path if tag==1
local filelist = r(values)
split dirname, parse("\" "/")
ren dirname4 citylist
drop dirname1-dirname3 dirname5
qui valuesof citylist if tag==1
local city = r(values)
local count = 1
local n:word count `filelist'
forval i = 1/`n' {
    local file : word `i' of `filelist'
    local cityname: word `i' of `city'
    ** don't add xlsx after `file', suffix has been added
    ** write "`file'" rather than `file', I don't know why but it works
    qui import excel using "`file'",clear
    cap qui sxpose,clear
    cap qui drop in 1/1
    gen city = "`cityname'"
    if `count'==1 {
        save house.dta,replace emptyok
    }
    else {
        qui append using house
        qui save house.dta,replace emptyok
    }
    local ++count
}
Thanks.
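You do not need to store the whole list of files in a macro. filelist creates a dataset of the files you want to work with. Just save it and reload it for each file you want to process. You are also using a very inefficient method of appending datasets. As the appended dataset grows, reloading and saving it becomes very expensive and can slow the whole process down considerably.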
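For reference, the error message quoted in the question implies that the limit can also be raised rather than worked around. A minimal sketch, assuming Stata/SE or Stata/MP and that no data need to stay in memory; it only postpones the problem, since the file list can still outgrow the larger limit:

```stata
* Show the current settings behind the limit in the error message
display c(maxvar)
display c(macrolen)

* Assumption: maxvar can be changed only while no data are in memory
clear
* 32767 is the largest value named in the error message
set maxvar 32767

* Maximum macro length after the change
display c(macrolen)
```

By the arithmetic in the message, the maximum line length is 16 more than the maximum macro length, so the largest line length of 4,227,159 characters corresponds to a macro length of 4,227,143 characters.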
Here is a sketch of how to process your Excel files:
* Build the list of workbooks once and keep it on disk
filelist, directory(".") pattern(*.xlsx)
save "myfiles.dta", replace
local n = _N
* Import each workbook into its own temporary file
forval i = 1/`n' {
    use in `i' using "myfiles.dta", clear
    local f = dirname + "/" + filename
    qui import excel using "`f'", clear
    tempfile res`i'
    save "`res`i''"
}
* Append all temporary files in a single pass
clear
forval i = 1/`n' {
    append using "`res`i''"
}
save "final.dta", replace
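The question's code also keeps only the files in each city's house folder and records the city name. The sketch above could be adapted along these lines. This assumes, as the split on dirname in the question suggests, that each workbook sits in <root>/<city>/house, and that filelist and sxpose are installed as in the question. The variable city and the output name house.dta follow the question; the path handling is an illustration and has not been run against the original data.

```stata
* List the workbooks once; the list lives in a dataset, not in a macro
filelist, directory(".") pattern(*.xlsx)

* Normalise path separators, then keep only files in a "house" folder
replace dirname = subinstr(dirname, "\", "/", .)
keep if substr(dirname, -6, 6) == "/house"

* Assumption: the city is the folder directly above "house"
gen city = substr(dirname, 1, length(dirname) - 6)
replace city = substr(city, strrpos(city, "/") + 1, .)
save "myfiles.dta", replace

local n = _N
forvalues i = 1/`n' {
    * Load one row of the file list and copy what is needed into locals
    use in `i' using "myfiles.dta", clear
    local f = dirname + "/" + filename
    local cityname = city

    * Same cleaning steps as in the question
    quietly import excel using "`f'", clear
    capture quietly sxpose, clear
    capture quietly drop in 1
    gen city = "`cityname'"

    * One temporary file per workbook
    tempfile res`i'
    quietly save "`res`i''", emptyok
}

* Append everything in a single pass
clear
forvalues i = 1/`n' {
    append using "`res`i''"
}
save "house.dta", replace
```

Each macro here holds a single path or city name, so its length does not depend on the number of files. If a variable is string in some workbooks and numeric in others, append will stop with a type-mismatch error that has to be resolved first.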