R: Count number of rows per model excluding certain variables if present
I have a table that looks like this:
modelsummary <- data.frame(term = c("(Intercept)", "month1", "month2", "RateDiff", "var1", "var2", "var3", "(Intercept)", "month1", "var1", "var2", "var3"), mod_id = c(1,1,1,1,1,1,1,2,2,2,2,2))
I want to count the number of variables in each model, excluding the intercept, month, and RateDiff terms. My desired output is:
modelsummary <- data.frame(term = c("(Intercept)", "month1", "month2", "RateDiff", "var1", "var2", "var3", "(Intercept)", "month1", "var1", "var2", "var3"), mod_id = c(1,1,1,1,1,1,1,2,2,2,2,2), variables = c(3,3,3,3,3,3,3,3,3,3,3,3))
I tried to get a flag using the following:
modelsummary$dim <- apply(modelsummary[, "term"], MARGIN = 1,
function(x) sum(!(x %in% c(grep("month", x), "RateDiff")), na.rm = T))
But the grep("month", x) part does not work.
modelsummary$dim <- apply(modelsummary[, "term"], MARGIN = 1,
function(x) sum(!(x %in% c("month", "RateDiff")), na.rm = T))
This runs, but it does not catch month terms that carry a suffix.
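I want an equivalent of SQL's ilike for the variables intercept, month, and RateDiff, because I don't want the match to be case-sensitive and I want to allow prefixes and suffixes on the variable names. How can I do this?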
Here is one way with dplyr -
library(dplyr)

modelsummary %>%
  mutate(
    # drop intercept/month/RateDiff terms (case-insensitive), then count distinct terms
    variables = term[!grepl(pattern = "intercept|month|ratediff", tolower(term))] %>%
      n_distinct()
  )
          term mod_id variables
1  (Intercept)      1         3
2       month1      1         3
3       month2      1         3
4     RateDiff      1         3
5         var1      1         3
6         var2      1         3
7         var3      1         3
8  (Intercept)      2         3
9       month1      2         3
10        var1      2         3
11        var2      2         3
12        var3      2         3
Or with dplyr and stringr:
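library(dplyr)
library(stringr)

modelsummary %>%
  mutate(
    # negate = TRUE keeps only the terms that do NOT match the pattern
    variables = str_subset(tolower(term), "intercept|month|ratediff", negate = TRUE) %>%
      n_distinct()
  )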
To count the variables for each mod_id, add group_by(mod_id) before the mutate.
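As a minimal sketch of that grouped version, assuming the modelsummary data frame from the question: the result name per_model is only illustrative, and ignore.case = TRUE is used here in place of tolower() as a closer analogue of ilike.

```r
library(dplyr)

# Example data copied from the question
modelsummary <- data.frame(
  term = c("(Intercept)", "month1", "month2", "RateDiff", "var1", "var2", "var3",
           "(Intercept)", "month1", "var1", "var2", "var3"),
  mod_id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
)

# per_model is a hypothetical name for the result
per_model <- modelsummary %>%
  group_by(mod_id) %>%
  mutate(
    # within each model, count distinct terms that do not match the pattern;
    # ignore.case = TRUE makes the match case-insensitive
    variables = n_distinct(
      term[!grepl("intercept|month|ratediff", term, ignore.case = TRUE)]
    )
  ) %>%
  ungroup()
```

With this data both models happen to have three remaining variables, so the grouped and ungrouped results look the same; they would differ as soon as the models contain different terms.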
In base R -
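# Base R only: %>% is not part of base R, so nest the calls instead;
# ignore.case = TRUE replaces tolower()
modelsummary$variables <- with(modelsummary,
  length(unique(term[!grepl("intercept|month|ratediff", term, ignore.case = TRUE)]))
)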
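If the count should be per mod_id without dplyr, one possible base R sketch uses tapply(). The helper names keep and counts are illustrative, and the approach assumes every model has at least one remaining term (a model with none would get NA).

```r
# Example data copied from the question
modelsummary <- data.frame(
  term = c("(Intercept)", "month1", "month2", "RateDiff", "var1", "var2", "var3",
           "(Intercept)", "month1", "var1", "var2", "var3"),
  mod_id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
)

# TRUE for rows whose term is not an intercept, month, or RateDiff term
keep <- !grepl("intercept|month|ratediff", modelsummary$term, ignore.case = TRUE)

# Number of distinct remaining terms per model, named by mod_id
counts <- tapply(modelsummary$term[keep], modelsummary$mod_id[keep],
                 function(x) length(unique(x)))

# Look up each row's count by its mod_id; NA if a model has no remaining terms
modelsummary$variables <- as.integer(counts[as.character(modelsummary$mod_id)])
```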