反复记录线性回归结果
record linear regression results repeatly
如下例所示,我要实现的是运行多次回归,每次R记录did[=20的估计=]合一data.frame。
每次,我都会更改 "ifelse" 中的年份条件,即 ifelse(mydata$year >= 1993, 1, 0),因此每次我 运行 都会进行不同的回归。
mydata$time = ifelse(mydata$year >= 1994, 1, 0)
有人可以帮忙吗?我的基本代码如下(如果R返回错误,可以通过浏览器下载数据):
library(foreign)
mydata = read.dta("http://dss.princeton.edu/training/Panel101.dta")
mydata$time = ifelse(mydata$year >= 1994, 1, 0)
mydata$did = mydata$time * mydata$treated
mydata$treated = ifelse(mydata$country == "E" | mydata$country == "F" | mydata$country == "G", 1, 0)
didreg = lm(y ~ treated + time + did, data = mydata)
summary(didreg)
一般来说,如果您想每次使用不同的输入多次重复一个过程,您需要一个函数。以下函数将标量值 year_value
作为其输入,为回归创建局部变量并导出模型项 did
.
的估计值
foo <- function (year_value) {
## create local variables from `mydata`
y <- mydata$y
treated <- as.numeric(mydata$country %in% c("E", "F", "G")) ## use `%in%`
time <- as.numeric(mydata$year >= year_value) ## use `year_value`
did <- time * treated
## run regression using local variables
didreg <- lm(y ~ treated + time + did)
## return estimate for model term `did`
coef(summary(didreg))["did", ]
}
foo(1993)
# Estimate Std. Error t value Pr(>|t|)
#-2.784222e+09 1.504349e+09 -1.850782e+00 6.867661e-02
请注意,您的原始代码有几个地方可以改进。比如说,,并使用 as.numeric
而不是 ifelse
将布尔值强制转换为数字。
现在您需要类似循环的东西来在几个不同的 year_value
上迭代此函数。我会使用 lappy
.
## raw list of result from `lapply`
year_of_choice <- 1993:1994 ## taken for example
result <- lapply(year_of_choice, foo)
## rbind them into a matrix
data.frame(year = year_of_choice, do.call("rbind", result), check.names = FALSE)
# year Estimate Std. Error t value Pr(>|t|)
#1 1993 -2784221881 1504348732 -1.850782 0.06867661
#2 1994 -2519511630 1455676087 -1.730819 0.08815711
注意,不要选择1990年(变量year
的最小值),否则time
会是一个1的向量,和截距一样。生成的模型排名不足,您将收到 "subscript out of bounds" 错误。自 3.5.0 起的 R 版本有一个新的 complete
泛型函数参数 coef
。所以为了稳定我们可以使用
coef(summary(didreg), complete = TRUE)["did", ]
但是您应该看到 1990 年的所有 NA
或 NaN
。
这里是另一种选择,这里我们为所有年份创建一个矩阵,将其加入 mydata,收集到 long,通过分组嵌套,然后 运行 回归以提取估计值。请注意 "gt_et_**" 代表 "greater than or equal to.."
library(foreign)
library(dplyr)
library(tidyr)
library(purrr)
mydata = read.dta("http://dss.princeton.edu/training/Panel101.dta")
mtrx <- matrix(0, length(min(mydata$year):max(mydata$year)), length(min(mydata$year):max(mydata$year)))
mtrx[lower.tri(mtrx, diag = TRUE)] <- 1
df <- mtrx %>% as.data.frame() %>% mutate(year = min(mydata$year):max(mydata$year))
colnames(df) <- c(paste0("gt_et_", df$year), "year")
models <- df %>%
full_join(., mydata, by = "year") %>%
gather(mod, time, gt_et_1990:gt_et_1999) %>%
nest(-mod) %>%
mutate(data = map(data, ~mutate(.x, treated = ifelse(country == "E"|country == "F"|country == "G", 1, 0),
did = time * treated)),
mods = map(data, ~lm(y ~ treated + time + did, data = .x) %>% summary() %>% coef())) %>%
unnest(mods %>% map(broom::tidy)) %>%
filter(.rownames == "did") %>%
select(-.rownames)
models
#> mod Estimate Std..Error t.value Pr...t..
#> 1 gt_et_1991 -2309823993 2410140350 -0.95837738 0.34137018
#> 2 gt_et_1992 -2036098728 1780081308 -1.14382344 0.25682856
#> 3 gt_et_1993 -2784221881 1504348732 -1.85078222 0.06867661
#> 4 gt_et_1994 -2519511630 1455676087 -1.73081886 0.08815711
#> 5 gt_et_1995 -2357323806 1455203186 -1.61992760 0.11001662
#> 6 gt_et_1996 250180589 1511322882 0.16553749 0.86902697
#> 7 gt_et_1997 405842197 1619653548 0.25057346 0.80292231
#> 8 gt_et_1998 -75683039 1852314277 -0.04085864 0.96753194
#> 9 gt_et_1999 2951694230 2452126428 1.20372840 0.23299421
由 reprex 创建于 2018-09-01
包 (v0.2.0).
如下例所示,我要实现的是运行多次回归,每次R记录did[=20的估计=]合一data.frame。
每次,我都会更改 "ifelse" 中的年份条件,即 ifelse(mydata$year >= 1993, 1, 0),因此每次我 运行 都会进行不同的回归。
mydata$time = ifelse(mydata$year >= 1994, 1, 0)
有人可以帮忙吗?我的基本代码如下(如果R返回错误,可以通过浏览器下载数据):
library(foreign)
mydata = read.dta("http://dss.princeton.edu/training/Panel101.dta")
mydata$time = ifelse(mydata$year >= 1994, 1, 0)
mydata$did = mydata$time * mydata$treated
mydata$treated = ifelse(mydata$country == "E" | mydata$country == "F" | mydata$country == "G", 1, 0)
didreg = lm(y ~ treated + time + did, data = mydata)
summary(didreg)
一般来说,如果您想每次使用不同的输入多次重复一个过程,您需要一个函数。以下函数将标量值 year_value
作为其输入,为回归创建局部变量并导出模型项 did
.
foo <- function (year_value) {
## create local variables from `mydata`
y <- mydata$y
treated <- as.numeric(mydata$country %in% c("E", "F", "G")) ## use `%in%`
time <- as.numeric(mydata$year >= year_value) ## use `year_value`
did <- time * treated
## run regression using local variables
didreg <- lm(y ~ treated + time + did)
## return estimate for model term `did`
coef(summary(didreg))["did", ]
}
foo(1993)
# Estimate Std. Error t value Pr(>|t|)
#-2.784222e+09 1.504349e+09 -1.850782e+00 6.867661e-02
请注意,您的原始代码有几个地方可以改进。比如说,as.numeric
而不是 ifelse
将布尔值强制转换为数字。
现在您需要类似循环的东西来在几个不同的 year_value
上迭代此函数。我会使用 lappy
.
## raw list of result from `lapply`
year_of_choice <- 1993:1994 ## taken for example
result <- lapply(year_of_choice, foo)
## rbind them into a matrix
data.frame(year = year_of_choice, do.call("rbind", result), check.names = FALSE)
# year Estimate Std. Error t value Pr(>|t|)
#1 1993 -2784221881 1504348732 -1.850782 0.06867661
#2 1994 -2519511630 1455676087 -1.730819 0.08815711
注意,不要选择1990年(变量year
的最小值),否则time
会是一个1的向量,和截距一样。生成的模型排名不足,您将收到 "subscript out of bounds" 错误。自 3.5.0 起的 R 版本有一个新的 complete
泛型函数参数 coef
。所以为了稳定我们可以使用
coef(summary(didreg), complete = TRUE)["did", ]
但是您应该看到 1990 年的所有 NA
或 NaN
。
这里是另一种选择,这里我们为所有年份创建一个矩阵,将其加入 mydata,收集到 long,通过分组嵌套,然后 运行 回归以提取估计值。请注意 "gt_et_**" 代表 "greater than or equal to.."
library(foreign)
library(dplyr)
library(tidyr)
library(purrr)
mydata = read.dta("http://dss.princeton.edu/training/Panel101.dta")
mtrx <- matrix(0, length(min(mydata$year):max(mydata$year)), length(min(mydata$year):max(mydata$year)))
mtrx[lower.tri(mtrx, diag = TRUE)] <- 1
df <- mtrx %>% as.data.frame() %>% mutate(year = min(mydata$year):max(mydata$year))
colnames(df) <- c(paste0("gt_et_", df$year), "year")
models <- df %>%
full_join(., mydata, by = "year") %>%
gather(mod, time, gt_et_1990:gt_et_1999) %>%
nest(-mod) %>%
mutate(data = map(data, ~mutate(.x, treated = ifelse(country == "E"|country == "F"|country == "G", 1, 0),
did = time * treated)),
mods = map(data, ~lm(y ~ treated + time + did, data = .x) %>% summary() %>% coef())) %>%
unnest(mods %>% map(broom::tidy)) %>%
filter(.rownames == "did") %>%
select(-.rownames)
models
#> mod Estimate Std..Error t.value Pr...t..
#> 1 gt_et_1991 -2309823993 2410140350 -0.95837738 0.34137018
#> 2 gt_et_1992 -2036098728 1780081308 -1.14382344 0.25682856
#> 3 gt_et_1993 -2784221881 1504348732 -1.85078222 0.06867661
#> 4 gt_et_1994 -2519511630 1455676087 -1.73081886 0.08815711
#> 5 gt_et_1995 -2357323806 1455203186 -1.61992760 0.11001662
#> 6 gt_et_1996 250180589 1511322882 0.16553749 0.86902697
#> 7 gt_et_1997 405842197 1619653548 0.25057346 0.80292231
#> 8 gt_et_1998 -75683039 1852314277 -0.04085864 0.96753194
#> 9 gt_et_1999 2951694230 2452126428 1.20372840 0.23299421
由 reprex 创建于 2018-09-01 包 (v0.2.0).