Populate a dataframe column using a function that applies an equation to a reference table

Question

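Suppose you want to use a Z-score to calculate how far a patient's BMI is from the population median. The score is calculated from the patient's BMI plus three age- and sex-dependent variables, and those three variables are looked up in a table.

So I created a function that takes age, sex, and BMI as inputs. It uses sex to find the appropriate table (male or female), uses age to find the appropriate row in that table, and then uses BMI in a calculation that includes the age- and sex-specific variables it just looked up. My function works when I feed data into it manually, but what I can't figure out is how to iterate over every row of the dataframe and apply my function, using the other columns of that row as inputs.

To keep things simple, I only use two age- and sex-dependent variables below (median BMI and then a multiplier).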

### make master dataframe
library(tibble)
study_id <- c(1001, 1002, 1003, 1004, 1005)
age <- c(4, 3, 3, 1, 5)
sex <- c(1, 1, 2, 2, 1)
df <- tibble(study_id, age, sex)

### reference male table
age_m <- c(1, 2, 3, 4, 5)
median_bmi_m <- c(14.9, 16.3, 16.9, 17.2, 17.3)
multiplier_m <- c(22, 23, 43, 11, 33)
reference_male <- tibble(age_m, median_bmi_m, multiplier_m)

### reference female table
age_f <- c(1, 2, 3, 4, 5)
median_bmi_f <- c(15.9, 17.3, 17.9, 18.2, 18.3)
multiplier_f <- c(12, 13, 33, 21, 23)
reference_female <- tibble(age_f, median_bmi_f, multiplier_f)

### my function
toy_function <- function(age, sex) {
  if(sex == 1) {
    a <- reference_male[age, 2]
    b <- reference_male[age, 3]
    c <- a*b
  } else {
    a <- reference_female[age, 2]
    b <- reference_female[age, 3]
    c <- a*b
  }
  return(as.numeric(c))
}

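The function returns a numeric value "c", which I want to compute row by row for each patient. I built a FOR loop to do this, but I think there is a more elegant way using purrr or the apply() family. I tried simply putting the function inside mutate, but I get an error.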

df <- df %>%
   mutate(new_column = toy_function(age, sex))

Error in toy_function(age_df, sex) : 
  'list' object cannot be coerced to type 'double'
In addition: Warning message:
In if (sex == 1) { :
  the condition has length > 1 and only the first element will be used

Thanks for your help. I still don't have a good grasp of purrr and other row-wise iteration strategies.

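Update

Thanks for the answers, everyone. While the solutions provided for the original toy example work, I get an error message when I go back to my original, more complex function (which uses three inputs instead of two).

Suppose we update the function and the original dataframe to incorporate BMI: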

### updated dataframe with BMI variable
study_id <- c(1001, 1002, 1003, 1004, 1005)
age <- c(4, 3, 3, 1, 5)
sex <- c(1, 1, 2, 2, 1)
bmi <- c(15, 16, 17, 18, 19)
df <- tibble(study_id, age, sex, bmi)

### updated function with bmi variable incorporated into the equation

toy_function <- function(age, sex, bmi) {
  if(sex == 1) {
    a <- reference_male[age, 2]
    b <- reference_male[age, 3]
    c <- a*b*bmi
  } else {
    a <- reference_female[age, 2]
    b <- reference_female[age, 3]
    c <- a*b*bmi
  }
  return(as.numeric(c))
}

When I run the solution code like this, I get the following error:

df %>%
  mutate(new_column = map2_dbl(age, sex, bmi, ~ toy_function(..1, ..2, ..3)))

Result 1 must be a single double, not NULL of length 0

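It seems I'm doing something wrong when adding the third variable. Note: I read that the ..1, ..2, ..3 syntax may be preferred when a function has multiple variables, but I could be mistaken.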

Answer 1

We have to use the rowwise function before mutate:

library(dplyr)

df %>%
  rowwise() %>%
  mutate(new_column = toy_function(age, sex))

# A tibble: 5 x 4
# Rowwise: 
  study_id   age   sex new_column
     <dbl> <dbl> <dbl>      <dbl>
1     1001     4     1       189.
2     1002     3     1       727.
3     1003     3     2       591.
4     1004     1     2       191.
5     1005     5     1       571.
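
One thing worth noting about the output above is the `# Rowwise:` header: the result stays a rowwise tibble, so later verbs would keep operating one row at a time. The following is a minimal, self-contained sketch of the same approach with an `ungroup()` appended. The data is rebuilt with named columns and `toy_function` is restated in a compact form that is assumed to be equivalent to the one in the question; `result` is just an illustrative name.

```r
library(dplyr)
library(tibble)

# Toy data from the question (two-input version)
df <- tibble(
  study_id = c(1001, 1002, 1003, 1004, 1005),
  age      = c(4, 3, 3, 1, 5),
  sex      = c(1, 1, 2, 2, 1)
)
reference_male <- tibble(
  age_m        = c(1, 2, 3, 4, 5),
  median_bmi_m = c(14.9, 16.3, 16.9, 17.2, 17.3),
  multiplier_m = c(22, 23, 43, 11, 33)
)
reference_female <- tibble(
  age_f        = c(1, 2, 3, 4, 5),
  median_bmi_f = c(15.9, 17.3, 17.9, 18.2, 18.3),
  multiplier_f = c(12, 13, 33, 21, 23)
)

# Compact restatement of toy_function (assumed equivalent to the original):
# pick the table by sex, then use age as the row index.
toy_function <- function(age, sex) {
  ref <- if (sex == 1) reference_male else reference_female
  ref[[2]][age] * ref[[3]][age]
}

result <- df %>%
  rowwise() %>%                                   # evaluate mutate() one row at a time
  mutate(new_column = toy_function(age, sex)) %>%
  ungroup()                                       # drop the rowwise grouping afterwards
```

Whether the `ungroup()` matters depends on what comes next in the pipeline; for a final result that is only printed or saved it makes no difference.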

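Or, if you would like to do it with purrr, you can use the following code. Because the operation is row-wise here, .x refers to the value of the variable age in each row and .y refers to the value of the variable sex in each row: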

library(purrr)

df %>%
  mutate(new_column = map2_dbl(age, sex, ~ toy_function(.x, .y)))

# A tibble: 5 x 4
  study_id   age   sex new_column
     <dbl> <dbl> <dbl>      <dbl>
1     1001     4     1       189.
2     1002     3     1       727.
3     1003     3     2       591.
4     1004     1     2       191.
5     1005     5     1       571.

Or in base R:

cbind(mapply(\(x, y) toy_function(x, y), df$age, df$sex) |>
        as.data.frame() |>
        setNames("new_column"), df)

Updated solution

Note that since we are iterating over more than 2 variables here, we need to use pmap instead of map2.

df %>%
  mutate(new_column = pmap_dbl(., ~ toy_function(..2, ..3, ..4)))

# A tibble: 5 x 5
  study_id   age   sex   bmi new_column
     <dbl> <dbl> <dbl> <dbl>      <dbl>
1     1001     4     1    15      2838 
2     1002     3     1    16     11627.
3     1003     3     2    17     10042.
4     1004     1     2    18      3434.
5     1005     5     1    19     10847.

Or, if you would like to stick with your own solution, just exclude the first variable from pmap's .l argument:

df %>%
  mutate(new_column = pmap_dbl(df[-1], ~ toy_function(..1, ..2, ..3)))

And with pmap we don't need rowwise to force row-wise operation, as the documentation specifies:

Note that a data frame is a very important special case, in which case pmap() and pwalk() apply the function .f to each row.
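
Both pmap calls above depend on column positions (..2, ..3, ..4, or dropping the first column). As a possible alternative, here is a minimal sketch that passes a named list of columns so that arguments are matched to `toy_function` by name. It is self-contained; the compact `toy_function` is an assumed equivalent of the three-input version in the question, and `result` is an illustrative name.

```r
library(dplyr)
library(purrr)
library(tibble)

# Toy data from the update (three-input version)
df <- tibble(
  study_id = c(1001, 1002, 1003, 1004, 1005),
  age      = c(4, 3, 3, 1, 5),
  sex      = c(1, 1, 2, 2, 1),
  bmi      = c(15, 16, 17, 18, 19)
)
reference_male <- tibble(
  age_m        = c(1, 2, 3, 4, 5),
  median_bmi_m = c(14.9, 16.3, 16.9, 17.2, 17.3),
  multiplier_m = c(22, 23, 43, 11, 33)
)
reference_female <- tibble(
  age_f        = c(1, 2, 3, 4, 5),
  median_bmi_f = c(15.9, 17.3, 17.9, 18.2, 18.3),
  multiplier_f = c(12, 13, 33, 21, 23)
)

# Compact restatement of the three-input toy_function (assumed equivalent)
toy_function <- function(age, sex, bmi) {
  ref <- if (sex == 1) reference_male else reference_female
  ref[[2]][age] * ref[[3]][age] * bmi
}

# The list names match the function's argument names, so column order
# in df no longer matters.
result <- df %>%
  mutate(new_column = pmap_dbl(list(age = age, sex = sex, bmi = bmi),
                               toy_function))
```

This trades a little verbosity for not having to count columns, which may help if columns are later added to or reordered in df.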

Answer 2

Since the function is built with if/else, which is not vectorized, we can wrap the function in Vectorize and apply it:

library(dplyr)
df %>%
     mutate(new_column = Vectorize(toy_function)(age, sex))

-output

# A tibble: 5 x 4
  study_id   age   sex new_column
     <dbl> <dbl> <dbl>      <dbl>
1     1001     4     1       189.
2     1002     3     1       727.
3     1003     3     2       591.
4     1004     1     2       191.
5     1005     5     1       571.
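
Vectorize still calls the function once per row behind the scenes. A different technique, not taken from the answers above, is to treat the lookup as a join: stack the two reference tables into one keyed by sex and age, join it onto df, and compute the column with ordinary vectorized arithmetic. The sketch below assumes the toy data from the question; the names `reference`, `median_bmi`, `multiplier`, and `result` are made up for illustration.

```r
library(dplyr)
library(tibble)

# Toy data from the question (two-input version)
df <- tibble(
  study_id = c(1001, 1002, 1003, 1004, 1005),
  age      = c(4, 3, 3, 1, 5),
  sex      = c(1, 1, 2, 2, 1)
)

# Both reference tables stacked into one lookup, keyed by sex and age
reference <- bind_rows(
  tibble(sex = 1, age = c(1, 2, 3, 4, 5),
         median_bmi = c(14.9, 16.3, 16.9, 17.2, 17.3),
         multiplier = c(22, 23, 43, 11, 33)),
  tibble(sex = 2, age = c(1, 2, 3, 4, 5),
         median_bmi = c(15.9, 17.3, 17.9, 18.2, 18.3),
         multiplier = c(12, 13, 33, 21, 23))
)

result <- df %>%
  left_join(reference, by = c("sex", "age")) %>%   # attach the matching reference row
  mutate(new_column = median_bmi * multiplier) %>% # plain vectorized arithmetic
  select(-median_bmi, -multiplier)                 # drop the helper columns
```

Unlike indexing with `reference_male[age, 2]`, the join matches on the age value itself rather than on row position, so it would still work if the reference ages did not start at 1. For the three-input version, multiplying by bmi inside the same mutate should be enough.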
