Append one row with average values for selected columns and a percentage for another column based on conditions

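I have the df below. I need to calculate the Pass percentage after excluding the rows where status is "-", along with the average values of pred1 and pred2: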

df <- data.frame(
  name = c('A', 'B', 'C', 'D', 'E'), 
  status = c('Pass', 'Fail', '-', 'Pass', 'Pass'), 
  real = c(10, NA, 8, 9, 4), 
  pred1 = c(50, 20, NA, 14, 11),
  pred2 = c(12, 12, 8, NA, 6)
)

df:

  name status real pred1 pred2
1    A   Pass   10    50    12
2    B   Fail   NA    20    12
3    C      -    8    NA     8
4    D   Pass    9    14    NA
5    E   Pass    4    11     6

Expected result:

  name status real pred1 pred2
1    A   Pass   10    50    12
2    B   Fail   NA    20    12
3    C      -    8    NA     8
4    D   Pass    9    14    NA
5    E   Pass    4    11     6
6 total  0.75   NA 23.75   9.5

I could compute the values below and bind them to df, but that is not a concise or elegant solution:

pass_percent <- nrow(df %>% filter(status == 'Pass')) / nrow(df %>% filter(status != '-'))
avg_pred1 <- mean(df$pred1, na.rm = T)
avg_pred2 <- mean(df$pred2, na.rm = T)

How can I achieve this more concisely using R's pipes?

How about tibble::add_row:

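library(dplyr)   # %>%
library(tibble)  # add_row

df %>% 
  add_row(name = "total",
          # share of "Pass" among the rows whose status is not "-"
          status = as.character(mean(df$status[df$status != "-"] == "Pass")),
          # real contains an NA and na.rm is not set, so this is NA
          real = mean(df$real),
          pred1 = mean(df$pred1, na.rm = TRUE),
          pred2 = mean(df$pred2, na.rm = TRUE))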

   name status real pred1 pred2
1     A   Pass   10 50.00  12.0
2     B   Fail   NA 20.00  12.0
3     C      -    8    NA   8.0
4     D   Pass    9 14.00    NA
5     E   Pass    4 11.00   6.0
6 total   0.75   NA 23.75   9.5

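Explanation of as.character(mean(df$status[df$status != "-"] == "Pass")):

  • df$status[df$status != "-"] is the df$status vector without the elements equal to "-" (so only Pass and Fail remain).
  • df$status[df$status != "-"] == "Pass" is TRUE where the remaining status is "Pass", and FALSE otherwise.
  • mean(...) works because TRUE and FALSE are coerced to numeric (1 and 0) when the mean is computed, so the result is the proportion of TRUE values.
  • as.character(...) is needed because df$status is a character column.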
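
The steps above can be traced one at a time. This is only a small sketch: it uses a standalone status vector copied from the question's df, and the intermediate names (kept, is_pass, pass_rate, pass_label) are made up for illustration.

```r
# status column copied from the question's df
status <- c("Pass", "Fail", "-", "Pass", "Pass")

# 1. drop the "-" entries
kept <- status[status != "-"]          # "Pass" "Fail" "Pass" "Pass"

# 2. compare each remaining entry with "Pass"
is_pass <- kept == "Pass"              # TRUE FALSE TRUE TRUE

# 3. TRUE/FALSE are coerced to 1/0, so the mean is the proportion of TRUE
pass_rate <- mean(is_pass)             # 0.75

# 4. convert to character so the value fits into the character column
pass_label <- as.character(pass_rate)  # "0.75"
```

Because the whole status column is character, the 0.75 in the total row is stored as the string "0.75" rather than as a number.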

Not as concise as @Mael's answer, but it takes a different approach:

df %>% 
  bind_rows(
    data.frame(name = "total") %>% 
      bind_cols(df %>% 
         summarise(across(matches("pred"), list(~mean(.x, na.rm = TRUE)), .names = "{.col}"),
            status = as.character(mean(status[status != "-"] == "Pass")))))
#    name status real pred1 pred2
# 1     A   Pass   10 50.00  12.0
# 2     B   Fail   NA 20.00  12.0
# 3     C      -    8    NA   8.0
# 4     D   Pass    9 14.00    NA
# 5     E   Pass    4 11.00   6.0
# 6 total   0.75   NA 23.75   9.5