Logistic regression on pooled data over several years

I have a database of N firms, with yearly observations of several variables (numeric and binary) for each firm.

For example:

df <- data.frame(
  "Year" = c(2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2012),
  "Firm" = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
  "Holding" = c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE),
  # data.frame() converts the name "Male CEO" to Male.CEO by default
  "Male CEO" = c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

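I want to examine the association between explanatory variables at time T (e.g. Revenue 2010, Holding 2010) and the dependent variable at T+1 (e.g. Male CEO 2011), not just for one pair of years but across all the years in my sample. I plan to do this with logistic regression.

My problem is not which model to use, but how to tell the model to take the dependent variable one year after the independent variables, one year at a time. The way I have modelled it now uses all observations, regardless of year.

Do you have any suggestions on how to solve this?

Thanks

Edit:

For clarity, here is how I currently set up the model, copied directly from my R file.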

CSR_Contracting <- lag(df$Policy_Executive_Compensation_ESG_Performance)

mod1 <- glm(df$CSR_Contracting ~
        + df$ESG_Score
        + df$Environmental_Pillar_Score
        + df$Social_Pillar_Score
        + df$Board_Cultural_Diversity_Percent_Score
        + df$Board_Gender_Diversity_Percent_Score
        + df$Policy_Board_Diversity
        + df$CSR_Sustainability_External_Audit
        + df$ROA
        + df$Size
        + df$PTB_Ratio
        + df$Leverage
        + df$CSR_Sustainability_Committee
        + df$Independent_Board_Members
        + df$CEO_Chairman_Separation
        + df$Chairman_is_ex_CEO
        + df$Policy_Equal_Voting_Right
        + df$Shareholders_Approval_Stock_Comp_Plan
        + df$Policy_Executive_Retention
        + df$Compensation_Improvement_Tools
        + df$Executive_Compensation_Policy
        + df$CEO_Compensation_Link_to_TSR
        + df$Executive_Compensation_LT_Objectives
        + df$Shareholders_Vote_on_Executive_Pay
        + df$Veto_Power_or_Golden_share
        + df$SOE,
         family = binomial, maxit=100)   

I'm not sure whether this is what you mean, but based on your sample data it can be done as follows.

library(tidyverse)
df = tibble(
  Year = c (2010,2010,2010,2011,2011,2011,2012,2012,2012),
  Firm = c ("A","B","C","A","B","C","A","B","C"),
  Holding = c (TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,TRUE,TRUE,FALSE),
  Male.CEO = c (TRUE,TRUE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,FALSE)) %>% 
  mutate(Male.CEO.1 = lead(Male.CEO,3)) %>% 
  filter(Year != 2012)

df
# # A tibble: 6 x 5
# Year Firm  Holding Male.CEO Male.CEO.1
# <dbl> <chr> <lgl>   <lgl>    <lgl>     
# 1  2010 A     TRUE    TRUE     TRUE      
# 2  2010 B     FALSE   TRUE     FALSE     
# 3  2010 C     FALSE   TRUE     TRUE      
# 4  2011 A     FALSE   TRUE     TRUE      
# 5  2011 B     TRUE    FALSE    FALSE     
# 6  2011 C     FALSE   TRUE     FALSE  

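# Note: without a family argument, glm() fits a Gaussian (linear) model.
# For a logistic regression, add family = binomial to the call.
# The output below comes from the call exactly as written here.
glm(Male.CEO.1 ~ Firm + Holding, data = df)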
# 
# Call:  glm(formula = Male.CEO.1 ~ Firm + Holding, data = df)
# 
# Coefficients:
#   (Intercept)        FirmB        FirmC  HoldingTRUE  
# 1.000e+00   -1.000e+00   -5.000e-01    1.249e-16  
# 
# Degrees of Freedom: 5 Total (i.e. Null);  2 Residual
# Null Deviance:        1.5 
# Residual Deviance: 0.5    AIC: 12.12

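Note that to get Male.CEO.1 at T+1, I used lead(Male.CEO, 3). However, this only makes sense if every year has exactly the same number of observations (3 in this case) and the rows are sorted by year, with the firms in the same order within each year.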
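If the number of observations per year is not constant, one possible alternative is to compute the lead within each firm instead of relying on a fixed row offset. The following is only a sketch under assumptions: it reuses the sample data above, the names `df_lead` and `Next.Year` are made up for illustration, and the model with `Holding` as the only predictor is just a placeholder for the real set of explanatory variables.

```r
library(dplyr)

# Same sample data as above
df <- tibble(
  Year = c(2010, 2010, 2010, 2011, 2011, 2011, 2012, 2012, 2012),
  Firm = c("A", "B", "C", "A", "B", "C", "A", "B", "C"),
  Holding = c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, FALSE),
  Male.CEO = c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)
)

df_lead <- df %>%
  group_by(Firm) %>%
  mutate(
    # Year of the firm's next observation (hypothetical helper column)
    Next.Year = lead(Year, order_by = Year),
    # Dependent variable taken from the firm's next observation
    Male.CEO.1 = lead(Male.CEO, order_by = Year)
  ) %>%
  ungroup() %>%
  # Keep only rows whose next observation is exactly one year later;
  # this also drops each firm's last year, where the lead is NA
  filter(Next.Year == Year + 1)

# Logistic regression: predictors at T, dependent variable at T+1
mod <- glm(Male.CEO.1 ~ Holding, family = binomial, data = df_lead)
```

Grouping by `Firm` means a firm that enters or leaves the sample does not shift the values of other firms, and the `Next.Year` filter avoids pairing years that are not consecutive. On the sample data this gives the same six rows and the same Male.CEO.1 values as the lead(Male.CEO, 3) approach.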