Plot rolling coefficients and color them based on p-value
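This is a bit tricky! I am running rolling-window regressions and collecting the coefficients from every window. My goal is to plot how the coefficients fluctuate over time. In addition, I want the points colored differently depending on whether the coefficient is statistically significant (say, at the 95% level) or not.
What I have so far: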
library(plm)
coeff <- NULL
for (e in 1:39) {  # 44 years per country; 44 - 6 + 1 = 39 six-year windows
  # Build the windowed panel: rows e to e+5 for each country
  paneldata <- pdata.frame(
    rbind(
      subset(LaggedPannel, Country == "A")[e:(e + 5), ],
      subset(LaggedPannel, Country == "B")[e:(e + 5), ]),
    index = c("Country", "Year"))
  # Pooled panel regression; [2, 1] is the estimate on lag(Y, 1)
  est <- coef(summary(plm(Y ~ lag(Y, 1), data = paneldata, model = "pooling")))[2, 1]
  coeff <- c(coeff, est)  # store coefficients
}
plot(coeff, type = "b", col = "red")
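The resulting plot:
Suppose, for example, that the second and fourth coefficients (the dots in the plot) are not statistically significant; those points should then be colored green.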
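The coloring rule itself can be a single vectorized `ifelse()` on the p-values. A minimal sketch with made-up numbers (the coefficients and p-values below are hypothetical, chosen so that the second and fourth are not significant at the 5% level), keeping red for significant points and using green for the rest:

```r
# Hypothetical coefficients and p-values for five windows
coeff    <- c(0.82, 0.41, 0.77, 0.35, 0.90)
p_values <- c(0.001, 0.320, 0.004, 0.150, 0.020)

# Red if significant at 5%, green otherwise
point_col <- ifelse(p_values < 0.05, "red", "green")
point_col
# "red" "green" "red" "green" "red"

# Grey connecting line, colored points drawn on top
plot(coeff, type = "l", col = "grey", xlab = "Window", ylab = "coeff")
points(coeff, col = point_col, pch = 19)
```

The only extra ingredient compared with the question's loop is a p-value per window, stored alongside each coefficient.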
Data (LaggedPannel):
Age1 Age2 Age3
Australia-1973 261.156 255.699 249.954
Australia-1974 261.305 255.394 251.470
Australia-1975 258.160 253.543 250.538
Australia-1976 262.504 258.066 254.720
Australia-1977 240.086 260.846 258.418
Australia-1978 228.774 238.871 259.449
USA-1973 4100.257 4104.028 4107.409
USA-1974 4135.435 4118.422 4120.286
USA-1975 4171.648 4164.065 4134.525
USA-1976 4208.236 4187.196 4171.167
USA-1977 4240.832 4211.655 4189.650
USA-1978 4286.923 4255.092 4229.701
Here is some simulated data.
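library(tidyverse)
library(broom)
# Simulate one data set: y = a + B*x + noise
simfun <- function(a = 0.1, B = 0.05, n = 200, x.sd = 1, e.sd = 1) {
  x <- rnorm(n, mean = 0, sd = x.sd) + runif(n)
  e <- rnorm(n, mean = 0, sd = e.sd)
  y <- a + B * x + e
  data.frame(x, y)
}
# Fit the regression and return its coefficient table as a tibble
statfun <- function(d) {
  lm(y ~ x, data = d) %>% tidy()
}
# 50 simulated fits, keeping only the slope on x
simdata <- map(seq(50), ~ statfun(simfun())) %>%
  enframe() %>%
  unnest(value) %>%
  filter(term == "x")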
The following flags which coefficients are "significant" (p < 0.05):
simdata <- simdata %>%
  mutate(Significance = factor(p.value < 0.05))
If you want to use base plot, you can do this:
plot(simdata$estimate,
     col = ifelse(simdata$p.value < 0.05, "blue", "red"),  # blue = significant
     ylab = "coeff")
lines(simdata$estimate)
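The base-graphics version draws no legend, so nothing on the plot says what blue and red mean. A small sketch that adds one with `legend()`; the `estimate` and `p.value` vectors here are made-up stand-ins for `simdata$estimate` and `simdata$p.value`, not results from the simulation above:

```r
set.seed(42)
# Made-up stand-ins for simdata$estimate and simdata$p.value
estimate <- rnorm(50, mean = 0.05, sd = 0.07)
p.value  <- runif(50)

sig <- p.value < 0.05  # TRUE where the coefficient is significant at 5%

# Grey line through all estimates, colored points on top
plot(estimate, type = "l", col = "grey", xlab = "Index", ylab = "coeff")
points(estimate, pch = 19, col = ifelse(sig, "blue", "red"))

# Legend explaining the color coding
legend("topright", legend = c("p < 0.05", "p >= 0.05"),
       col = c("blue", "red"), pch = 19, bty = "n")
```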
Or, with ggplot2, you can do:
ggplot(simdata, aes(name, estimate)) + geom_line() + geom_point(aes(color = Significance), shape = 1) +
labs(x = "Index", y = "coeff") + theme_bw()
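Storing the p-values in an extra vector, and then coloring each point by comparing its p-value with the 0.05 significance level, also solves the problem. Specifically: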
library(plm)
coeff <- NULL
P_values <- NULL
for (e in 1:39) {  # 44 years total for each country
  # Windowed panel: rows e to e+5 for each country
  paneldata <- pdata.frame(
    rbind(
      subset(LaggedPannel, Country == "A")[e:(e + 5), ],
      subset(LaggedPannel, Country == "B")[e:(e + 5), ]),
    index = c("Country", "Year"))
  # Fit once, then read both the estimate and its p-value from row 2
  ct <- coef(summary(plm(Y ~ lag(Y, 1), data = paneldata, model = "pooling")))
  coeff <- c(coeff, ct[2, 1])        # coefficient on lag(Y, 1)
  P_values <- c(P_values, ct[2, 4])  # its p-value
}
plot(coeff, type = "b", col = "red")  # previous plot
# New plot: blue if significant at 5%, red otherwise
plot(coeff, col = ifelse(P_values <= 0.05, "blue", "red"), ylab = "coef", type = "b")
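The only problem with this answer is that it gets tedious when several variables are involved, because you then need a separate empty vector for each one, and so on. It is not a fast approach, but it does work.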
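One way around the one-vector-per-variable problem is to keep the whole coefficient table from every window and stack the tables into a single long data frame, so each regressor comes along automatically and nothing is grown with `c()` inside a loop. This is a minimal sketch under stated assumptions, not the answer's method: it uses base `lm()` on simulated data, where the columns `y`, `x1`, `x2` and the 44-row series are hypothetical stand-ins for the panel regression, and it reads the estimates and p-values from `summary(fit)$coefficients`.

```r
set.seed(1)

# Hypothetical data: 44 periods, two regressors
n  <- 44
df <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
df$y <- 0.5 * df$x1 - 0.2 * df$x2 + rnorm(n)

width  <- 6                        # rows per window
starts <- seq_len(n - width + 1)   # 39 window start rows

# One small data frame per window, one row per term
per_window <- lapply(starts, function(e) {
  fit <- lm(y ~ x1 + x2, data = df[e:(e + width - 1), ])
  ct  <- summary(fit)$coefficients  # Estimate, Std. Error, t value, Pr(>|t|)
  data.frame(window   = e,
             term     = rownames(ct),
             estimate = ct[, "Estimate"],
             p.value  = ct[, "Pr(>|t|)"],
             row.names = NULL)
})

# Stack into one long data frame
results <- do.call(rbind, per_window)

# One panel per slope; blue if significant at 5%, red otherwise
slopes <- setdiff(unique(results$term), "(Intercept)")
op <- par(mfrow = c(length(slopes), 1))
for (tm in slopes) {
  d <- results[results$term == tm, ]
  plot(d$window, d$estimate, type = "l", col = "grey",
       xlab = "Window", ylab = tm)
  points(d$window, d$estimate, pch = 19,
         col = ifelse(d$p.value < 0.05, "blue", "red"))
}
par(op)
```

Because `results` has one row per window and term, adding a regressor only changes the model formula. The same long table could also be handed to ggplot2 and split with `facet_wrap(~ term)`.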