ClusterBootstrap::clusbootglm() taking long time to run

I am using the clusbootglm() function from the ClusterBootstrap package, and it is taking an unusually long time to run. The data frame has only 900 rows and 4 columns.

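library(ClusterBootstrap)

# Fit one clustered-bootstrap GLM (B = 900 resamples) and return the second coefficient (treat)
clusfunc <- function(df1) {
  mod1 <- clusbootglm(y ~ treat + u, data = df1, clusterid = group,
                      family = gaussian, B = 900)
  coef(mod1)[[2]]
}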

betasclustered <- replicate(1000, clusfunc(df1))

Here is the documentation for this function.

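Running one iteration of the function takes about a second. Running 1000, however, takes more than 1000 seconds. Do you have any suggestions? Should I write my own function instead of using clusbootglm()?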

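Instead of clusbootglm(), I can use the function below. I have tested it, and 1000 iterations take only a few seconds. It is still not clear to me why clusbootglm() took so long to run (over 45 minutes), but this is a good alternative.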

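library(dplyr)
library(tidyr)

# Draw 180 whole groups with replacement, refit the model, return the second coefficient (treat)
getclusteredsamplecoef <- function(df1) {
  resampled <- df1 %>%
    group_by(group) %>%
    nest(df1 = -group) %>%               # one row per group, its rows held in a list-column
    ungroup() %>%
    sample_n(180, replace = TRUE) %>%    # resample groups, not individual rows
    unnest(df1)
  model <- lm(y ~ treat + u, data = resampled)
  model$coefficients[[2]]
}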

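Note that this is not exactly the same as clusbootglm(), because I am fitting a linear model rather than a generalized linear model. That can be changed by using glm() or lm_robust() (from the estimatr package) in place of lm() inside the function.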
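As a rough illustration of the glm() swap, here is a minimal base-R sketch of the same idea: resample whole groups with replacement, then refit with glm(). The simulated data and the helper name cluster_boot_coef are assumptions made for this example only, not part of the original code. With family = gaussian the point estimates match what lm() would give on the same resample.

```r
# Simulated stand-in for df1 (assumed layout: 180 groups of 5 rows each).
set.seed(1)
df1 <- data.frame(
  group = rep(1:180, each = 5),
  treat = rep(rbinom(180, 1, 0.5), each = 5),  # treatment assigned per group
  u     = rnorm(900)
)
df1$y <- 1 + 0.5 * df1$treat + 0.3 * df1$u + rnorm(900)

# Hypothetical helper: draw whole groups with replacement and refit with glm().
cluster_boot_coef <- function(data, n_clusters, family = gaussian) {
  rows_by_group <- split(seq_len(nrow(data)), data$group)
  drawn <- sample(names(rows_by_group), n_clusters, replace = TRUE)
  resampled <- data[unlist(rows_by_group[drawn], use.names = FALSE), ]
  fit <- glm(y ~ treat + u, data = resampled, family = family)
  coef(fit)[["treat"]]
}

betas <- replicate(1000, cluster_boot_coef(df1, n_clusters = 180))
sd(betas)  # bootstrap standard error of the treat coefficient
```

Passing a different family (for example binomial with a 0/1 outcome) would change the model while leaving the resampling step as it is.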

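Setting n = 180 draws 180 groups. In my sample each group contains 5 people, so this yields 900 observations. If you want a particular number of observations, divide that number by the number of people in each group and pass the result to sample_n().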
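The arithmetic can be written out as a small sketch. The variable names are illustrative, and the use of ceiling() for targets that do not divide evenly is an assumption about how one might want to round.

```r
target_obs   <- 900  # observations wanted in each resample
cluster_size <- 5    # people per group, as in the data described above

# Number of groups to draw; ceiling() rounds up when the division is not exact.
n_clusters <- ceiling(target_obs / cluster_size)
n_clusters
```

The result is the value to use as the first argument of sample_n() in the function above.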