Translating Stata code into R, but the results are different

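I am currently replicating a paper. The author used Stata, which I know very little about, so I have to translate the code into R. I have a question about the following code: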

use "${directory_data}/income_dataset.dta", clear
reghdfe log_income post61_sc post65_sc male std_score mean_score_class privatista non_bocciato  if tipo_scuola2==1 & laureato==1, vce(cluster liceo_anno) absorb(liceo anno_maturita prov_nasc abilita_anno liceo_anno major_2)
sum log_income if e(sample)
local mean=r(mean)
local mean=round(`mean',.01)
local sd=r(sd)
local sd=round(`sd',.01)
reghdfe log_income post61_sc post65_sc male std_score mean_score_class privatista non_bocciato if tipo_scuola2==1 & laureato==1, vce(cluster liceo_anno) absorb(prov_nasc prov_res_anno liceo_anno abilita_anno major_2)
sum log_income if e(sample)
local mean=r(mean)
local mean=round(`mean',.01)
local sd=r(sd)
local sd=round(`sd',.01)

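I assumed I would get the same mean and standard deviation from both regressions, because I subset the data on the same condition, tipo_scuola2==1 & laureato==1. However, the results differ.

My replication of the first regression is: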

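library(lfe)  # provides felm()
# Same subset as the Stata `if` condition, also dropping missing outcomes
income5.1 <- subset(income, tipo_scuola2 == 1 & laureato == 1 & !is.na(log_income))
income5.1m <- round(mean(income5.1$log_income), digits = 2)
income5.1s <- round(sd(income5.1$log_income), digits = 2)
# felm formula parts: covariates | fixed effects | instruments (none) | cluster variable
tb5.1 <- felm(log_income ~ post61_sc + post65_sc + male + std_score + mean_score_class + privatista + non_bocciato | liceo + anno_maturita + prov_nasc + abilita_anno + liceo_anno + major_2 | 0 | liceo_anno, data = income5.1)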

The following cannot be verified without seeing the data (you can create an example dataset with dataex from SSC: ssc install dataex):

Differences may arise because reghdfe drops singletons (groups with only one observation), while felm does not. So the samples you are using in R and Stata are not the same, which creates the differences you observe. Note also that, even when using the same data, reghdfe and felm use different methods for computing clustered standard errors. There are several deeper discussions of this issue on GitHub.
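
To check whether singletons explain the gap, one option is to drop them by hand in R before computing the mean and standard deviation. The sketch below is an approximation of that idea, not reghdfe's actual implementation: `drop_singletons` is a hypothetical helper, and the toy data frame and its column names are made up for illustration. It repeats the pass until nothing changes, because removing one singleton can turn another group into a singleton.

```r
# Hypothetical helper: repeatedly remove rows that are the only observation
# in any of the given fixed-effect groups, until no singleton is left.
drop_singletons <- function(data, fe_vars) {
  # Rows with a missing fixed-effect value cannot be used in the regression
  data <- data[complete.cases(data[, fe_vars, drop = FALSE]), , drop = FALSE]
  repeat {
    is_singleton <- rep(FALSE, nrow(data))
    for (v in fe_vars) {
      # Size of the group each row belongs to, for fixed effect v
      group_size <- ave(rep(1L, nrow(data)), data[[v]], FUN = length)
      is_singleton <- is_singleton | (group_size == 1L)
    }
    if (!any(is_singleton)) break
    data <- data[!is_singleton, , drop = FALSE]
  }
  data
}

# Toy data (made up): school "c" is a singleton; once it is dropped,
# year 3 is left with a single row, which is then dropped as well.
toy <- data.frame(
  y      = c(1, 2, 3, 4, 5, 6),
  school = c("a", "a", "b", "b", "c", "b"),
  year   = c(1, 1, 2, 2, 3, 3)
)
kept <- drop_singletons(toy, c("school", "year"))

mean(toy$y)   # 3.5 on the full sample
mean(kept$y)  # 2.5 once singletons are removed
```

For the first regression in the question, the call would be something like drop_singletons(income5.1, c("liceo", "anno_maturita", "prov_nasc", "abilita_anno", "liceo_anno", "major_2")), after also removing rows with missing values in any regressor. Note that the two reghdfe calls absorb different sets of fixed effects, so they may drop different singletons and end up with different e(sample), which would be enough to give different means and standard deviations. Going the other way, reghdfe has a keepsingletons option if you would rather make Stata keep those observations.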