Is there an R function similar to foreach loops in Stata for creating new variables based on the name (or root) of existing variables?

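I have a list of 60 variables (essentially 30 pairs), and I need to combine the information in each pair to create a new variable from the data stored in that pair.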

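To give some context, I am conducting a systematic review of prediction modelling studies. I have extracted data on which variables were considered for inclusion in each study's prediction model (the first 30 variables) and which variables were actually included in the model (the second 30 variables).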

All variables are binary.

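The first 30 variables are named in the form "p_[varname]" and the second 30 in the form "p_[varname]_inc". I would like to create a new variable called [varname] that takes the values "Not considered", "Considered" and "Included".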

In Stata, I could do this easily like so:



foreach v in [varname1] ... [varname30] {
    gen `v' = "Not considered" if p_`v' == 0
    replace `v' = "Considered" if p_`v' == 1 & p_`v'_inc == 0
    replace `v' = "Included" if p_`v' == 1 & p_`v'_inc == 1
}

In R, the only way I can think of is to copy and paste the same ifelse statement for every variable, e.g.:

predictor_vars %>% 
  mutate(age = ifelse(p_age==1 & p_age_inc==1, "Included", 
                      ifelse(p_age==1 & p_age_inc==0, "Considered", "Not considered")),
         sex = ifelse(p_sex==1 & p_sex_inc==1, "Included", 
                      ifelse(p_sex==1 & p_sex_inc==0, "Considered", "Not considered")), 
....
         [varname] = ifelse(p_[varname]==1 & p_[varname]_inc==1, "Included", 
                      ifelse(p_[varname]==1 & p_[varname]_inc==0, "Considered", "Not considered"))
)

Is there an easier way to do this in R / dplyr?

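EDIT: Apologies for not providing enough detail earlier (I'm new here, but I really appreciate the quick responses!). Here is a sample of the data: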

structure(list(p_age = structure(c(1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0), label = "Age", class = c("labelled", 
"numeric")), p_age_inc = structure(c(1, 0, 0, 1, 1, 1, 1, 1, 
1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 
0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0
), label = "Age", class = c("labelled", "numeric")), p_sex = structure(c(1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 
1, 1, 1, 0, 1, 1, 0), label = "Sex", class = c("labelled", "numeric"
)), p_sex_inc = structure(c(1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 
0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0), label = "Sex", class = c("labelled", 
"numeric")), p_nation = structure(c(0, 0, 0, 0, 1, 1, 0, 1, 0, 
1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0), label = "Nationality / country", class = c("labelled", 
"numeric")), p_nation_inc = structure(c(0, 0, 0, 0, 0, 0, 0, 
1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 
0), label = "Nationality / country", class = c("labelled", "numeric"
)), p_prevtb = structure(c(0, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 1, 
0, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 
0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0), label = "Treatment regimen / treatment status (retreatment)", class = c("labelled", 
"numeric")), p_prevtb_inc = structure(c(0, 0, 0, 0, 0, 0, 0, 
0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 
0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0), label = "Previous TB / retreated TB", class = c("labelled", 
"numeric"))), row.names = c(NA, 50L), class = "data.frame")

The first 6 rows (with 4 selected pairs of predictors) look like this:

  p_age p_age_inc p_sex p_sex_inc p_nation p_nation_inc p_prevtb
1     1         1     1         1        0            0        0
2     1         0     1         0        0            0        0
3     1         0     1         1        0            0        0
4     1         1     1         1        0            0        0
5     1         1     1         0        1            0        1
6     1         1     1         0        1            0        1
  p_prevtb_inc
1            0
2            0
3            0
4            0
5            0
6            0

I would like to create new variables like so:

  p_age p_age_inc p_sex p_sex_inc p_nation p_nation_inc p_prevtb
1     1         1     1         1        0            0        0
2     1         0     1         0        0            0        0
3     1         0     1         1        0            0        0
4     1         1     1         1        0            0        0
5     1         1     1         0        1            0        1
6     1         1     1         0        1            0        1
  p_prevtb_inc        age        sex         nation         prevtb
1            0   Included   Included Not considered Not considered
2            0 Considered Considered Not considered Not considered
3            0 Considered   Included Not considered Not considered
4            0   Included   Included Not considered Not considered
5            0   Included Considered     Considered     Considered
6            0   Included Considered     Considered     Considered

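This solution could be improved, but it works. As the question asks, the function creates the new variables in a standard for loop over the p_* variables and then returns the result.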

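Setting the argument Bind = FALSE returns only the newly created variables; with the default, Bind = TRUE, they are appended to the original data.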

library(dplyr)  # provides case_when() and %>%

create_var <- function(X, Bind = TRUE){
  xnames <- names(X)
  # "considered" columns: p_<stem> with no underscore in the stem (skips p_<stem>_inc)
  p_only <- grep('^p_[^_]+$', xnames, value = TRUE)
  res <- vector('list', length = length(p_only))
  for(i in seq_along(p_only)){
    x <- as.logical(X[[ p_only[i] ]])                # considered?
    y <- as.logical(X[[paste0(p_only[i], '_inc')]])  # included?
    res[[i]] <- case_when(
      x & y ~ "Included",
      x & !y ~ "Considered",
      TRUE ~ "Not considered"  # x is FALSE, or no earlier condition was TRUE (e.g. NA)
    )
  }
  # name the new columns by the stem, i.e. the p_ name without its prefix
  names(res) <- sub('^p_', '', p_only)
  res <- do.call(cbind.data.frame, res)
  if(Bind) cbind(X, res) else res
}

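# df1 is the sample data frame posted in the question
create_var(df1)                   # original columns plus the new ones
df1 %>% create_var()              # the same call, written with the pipe
df1 %>% create_var(Bind = FALSE)  # only the newly created columns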
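
For a quick check without the full data, here is a minimal base R sketch of the same loop-over-stems idea on a toy data frame. The data frame `toy`, its values, and the helper `classify_pair` are made up for illustration and are not part of the question's data or of the function above. It assumes every "included" column is named `p_<stem>_inc` and has a matching `p_<stem>` column.

```r
# Toy data with two predictor pairs (hypothetical values)
toy <- data.frame(
  p_age     = c(1, 1, 0),
  p_age_inc = c(1, 0, 0),
  p_sex     = c(1, 0, 1),
  p_sex_inc = c(0, 0, 1)
)

# Recover the stems from the *_inc columns: "p_age_inc" -> "age"
inc_cols <- grep("^p_.*_inc$", names(toy), value = TRUE)
stems <- sub("^p_(.*)_inc$", "\\1", inc_cols)

# Hypothetical helper: turn one considered/included pair into a label
classify_pair <- function(considered, included) {
  ifelse(considered == 1 & included == 1, "Included",
         ifelse(considered == 1, "Considered", "Not considered"))
}

# Loop over the stems, much like the Stata foreach in the question
for (s in stems) {
  toy[[s]] <- classify_pair(toy[[paste0("p_", s)]],
                            toy[[paste0("p_", s, "_inc")]])
}

toy$age  # "Included" "Considered" "Not considered"
toy$sex  # "Considered" "Not considered" "Included"
```

Taking the stems from the `_inc` columns means a stem may itself contain an underscore. Unlike the `case_when` version with its `TRUE` fallback, this sketch does not force missing inputs into "Not considered": a missing `p_` value stays `NA`.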