Logistic regression on pooled data over several years
I have a database of N firms, with yearly observations of several variables (numeric and binary) for each firm.
For example:
df <- data.frame(
"Year" = c(2010,2010,2010,2011,2011,2011,2012,2012,2012),
"Firm" = c("A","B","C","A","B","C","A","B","C"),
"Holding" = c(TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,TRUE,FALSE),
"Male CEO" = c(TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,FALSE))
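I want to examine the association between explanatory variables at time T (e.g. Revenue 2010, Holding 2010) and the dependent variable at T+1 (e.g. Male CEO 2011), not just for two years but across all the years in my sample. I plan to use logistic regression.
My question is not which model to use, but how to tell my model to take the dependent variable from the year after the explanatory variables, one year at a time. The way I model it now uses all observations, regardless of year.
Do you have any suggestions on how to solve this?
Thanks
Edit:
For clarity, copied directly from my R file, this is how I currently set up my model.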
CSR_Contracting <- lag(df$Policy_Executive_Compensation_ESG_Performance)
mod1 <- glm(df$CSR_Contracting ~
+ df$ESG_Score
+ df$Environmental_Pillar_Score
+ df$Social_Pillar_Score
+ df$Board_Cultural_Diversity_Percent_Score
+ df$Board_Gender_Diversity_Percent_Score
+ df$Policy_Board_Diversity
+ df$CSR_Sustainability_External_Audit
+ df$ROA
+ df$Size
+ df$PTB_Ratio
+ df$Leverage
+ df$CSR_Sustainability_Committee
+ df$Independent_Board_Members
+ df$CEO_Chairman_Separation
+ df$Chairman_is_ex_CEO
+ df$Policy_Equal_Voting_Right
+ df$Shareholders_Approval_Stock_Comp_Plan
+ df$Policy_Executive_Retention
+ df$Compensation_Improvement_Tools
+ df$Executive_Compensation_Policy
+ df$CEO_Compensation_Link_to_TSR
+ df$Executive_Compensation_LT_Objectives
+ df$Shareholders_Vote_on_Executive_Pay
+ df$Veto_Power_or_Golden_share
+ df$SOE,
family = binomial, maxit=100)
I'm not sure whether this is what you mean, but based on your sample data it can be done as follows.
library(tidyverse)
df = tibble(
Year = c (2010,2010,2010,2011,2011,2011,2012,2012,2012),
Firm = c ("A","B","C","A","B","C","A","B","C"),
Holding = c (TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,TRUE,FALSE),
Male.CEO = c (TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,FALSE)) %>%
mutate(Male.CEO.1 = lead(Male.CEO,3)) %>%
filter(Year != 2012)
df
# # A tibble: 6 x 5
# Year Firm Holding Male.CEO Male.CEO.1
# <dbl> <chr> <lgl> <lgl> <lgl>
# 1 2010 A TRUE TRUE TRUE
# 2 2010 B FALSE TRUE FALSE
# 3 2010 C FALSE TRUE TRUE
# 4 2011 A FALSE TRUE TRUE
# 5 2011 B TRUE FALSE FALSE
# 6 2011 C FALSE TRUE FALSE
glm(Male.CEO.1~Firm+ Holding, data=df)
#
# Call: glm(formula = Male.CEO.1 ~ Firm + Holding, data = df)
#
# Coefficients:
# (Intercept) FirmB FirmC HoldingTRUE
# 1.000e+00 -1.000e+00 -5.000e-01 1.249e-16
#
# Degrees of Freedom: 5 Total (i.e. Null); 2 Residual
# Null Deviance: 1.5
# Residual Deviance: 0.5 AIC: 12.12
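Note that to obtain Male.CEO.1 at T+1, I used lead(Male.CEO, 3). However, this only makes sense if every year has exactly the same number of observations (3 in this example) and the rows appear in the same firm order within each year.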
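If the panel is unbalanced (some firms missing in some years), a fixed offset such as lead(Male.CEO, 3) would pair rows from different firms. A possible alternative, sketched here under the assumption that Firm and Year together identify a row, is to join each firm-year to the same firm's value in the following year. The names next_year, df_lead and fit are illustrative only. Note also that the glm call above omits family, so it fits a Gaussian model by default; the sketch passes family = binomial to get a logistic regression.

```r
library(dplyr)

# Sample data from the question (column renamed to the syntactic Male.CEO)
df <- data.frame(
  Year     = c(2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2012),
  Firm     = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
  Holding  = c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE),
  Male.CEO = c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

# Outcome observed in year T+1, relabelled as year T so it can be
# matched to the explanatory variables of year T for the same firm
next_year <- df %>%
  transmute(Firm, Year = Year - 1, Male.CEO.1 = Male.CEO)

# inner_join keeps only firm-years that have an observation in the next year,
# so the last year and any gaps drop out automatically
df_lead <- inner_join(df, next_year, by = c("Firm", "Year"))

# Logistic regression of next year's outcome on this year's predictor
fit <- glm(Male.CEO.1 ~ Holding, data = df_lead, family = binomial)
```

With only six usable rows the fitted coefficients are not meaningful; the point is the data layout. On the real data, the other explanatory variables can be added to the right-hand side of the formula by column name, with data = df_lead supplying them.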