Append one row with average values for selected columns and a percentage for another column based on conditions
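I have the df below. I need to compute the percentage of Pass in status after excluding the rows where status is -, along with the averages of pred1 and pred2: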
df <- data.frame(
name = c('A', 'B', 'C', 'D', 'E'),
status = c('Pass', 'Fail', '-', 'Pass', 'Pass'),
real = c(10, NA, 8, 9, 4),
pred1 = c(50, 20, NA, 14, 11),
pred2 = c(12, 12, 8, NA, 6)
)
df:
  name status real pred1 pred2
1    A   Pass   10    50    12
2    B   Fail   NA    20    12
3    C      -    8    NA     8
4    D   Pass    9    14    NA
5    E   Pass    4    11     6
Expected result:
   name status real pred1 pred2
1     A   Pass   10    50    12
2     B   Fail   NA    20    12
3     C      -    8    NA     8
4     D   Pass    9    14    NA
5     E   Pass    4    11     6
6 total   0.75   NA 23.75   9.5
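I would like to bind the results below to df as a new row, but this does not feel like a concise or elegant solution:
pass_percent <- nrow(df %>% filter(status == 'Pass')) / nrow(df %>% filter(status != '-'))
avg_pred1 <- mean(df$pred1, na.rm = TRUE)
avg_pred2 <- mean(df$pred2, na.rm = TRUE)
How can I achieve this more concisely using pipes in R?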
What about tibble::add_row:
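library(dplyr)   # for %>%
library(tibble)  # for add_row

df %>%
  add_row(name = "total",
          # share of "Pass" among the rows whose status is not "-"
          status = as.character(mean(df$status[df$status != "-"] == "Pass")),
          # real contains an NA and na.rm is not set, so this evaluates to NA
          real = mean(df$real),
          pred1 = mean(df$pred1, na.rm = TRUE),
          pred2 = mean(df$pred2, na.rm = TRUE))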
   name status real pred1 pred2
1     A   Pass   10 50.00  12.0
2     B   Fail   NA 20.00  12.0
3     C      -    8    NA   8.0
4     D   Pass    9 14.00    NA
5     E   Pass    4 11.00   6.0
6 total   0.75   NA 23.75   9.5
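Explanation of as.character(mean(df$status[df$status != "-"] == "Pass")):
df$status[df$status != "-"] is the df$status vector without the elements equal to "-" (so only Pass and Fail remain).
df$status[df$status != "-"] == "Pass" is TRUE for each remaining element that equals "Pass", and FALSE otherwise.
mean(...) works on that logical vector because TRUE and FALSE are coerced to 1 and 0 when the mean is computed.
as.character(...) is needed because df$status is a character column.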
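To make the explanation above concrete, here is a small base R sketch that evaluates the expression one step at a time. The intermediate variable names (kept, is_pass, pass_share, pass_label) are only illustrative and do not appear in the answer.

```r
# Same status values as in the question's df.
status <- c("Pass", "Fail", "-", "Pass", "Pass")

kept <- status[status != "-"]            # drop the "-" entries
is_pass <- kept == "Pass"                # TRUE where the element is "Pass"
pass_share <- mean(is_pass)              # TRUE/FALSE are treated as 1/0
pass_label <- as.character(pass_share)   # character, to match the status column

print(kept)        # "Pass" "Fail" "Pass" "Pass"
print(is_pass)     # TRUE FALSE TRUE TRUE
print(pass_share)  # 0.75
print(pass_label)  # "0.75"
```

Three of the four remaining values are "Pass", which is where the 0.75 in the total row comes from.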
Not as concise as @Mael's answer, but it takes a different approach:
df %>%
  bind_rows(
    data.frame(name = "total") %>%
      bind_cols(df %>%
        summarise(
          # means of the columns whose names match "pred", ignoring NAs
          across(matches("pred"), ~ mean(.x, na.rm = TRUE)),
          # share of "Pass" among the rows whose status is not "-"
          status = as.character(mean(status[status != "-"] == "Pass")))))
#    name status real pred1 pred2
# 1     A   Pass   10 50.00  12.0
# 2     B   Fail   NA 20.00  12.0
# 3     C      -    8    NA   8.0
# 4     D   Pass    9 14.00    NA
# 5     E   Pass    4 11.00   6.0
# 6 total   0.75   NA 23.75   9.5
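For comparison, the same total row could probably be built without any packages. The following is a minimal base R sketch, not taken from either answer; the names total and result are assumptions introduced here for illustration.

```r
# Data from the question; stringsAsFactors = FALSE keeps status as character
# on older R versions too.
df <- data.frame(
  name = c("A", "B", "C", "D", "E"),
  status = c("Pass", "Fail", "-", "Pass", "Pass"),
  real = c(10, NA, 8, 9, 4),
  pred1 = c(50, 20, NA, 14, 11),
  pred2 = c(12, 12, 8, NA, 6),
  stringsAsFactors = FALSE
)

# One-row data frame with the same columns as df.
total <- data.frame(
  name = "total",
  # share of "Pass" among the rows whose status is not "-"
  status = as.character(mean(df$status[df$status != "-"] == "Pass")),
  real = NA_real_,                        # no total is wanted for real
  pred1 = mean(df$pred1, na.rm = TRUE),
  pred2 = mean(df$pred2, na.rm = TRUE),
  stringsAsFactors = FALSE
)

result <- rbind(df, total)  # append the total row to the original rows
print(result)
```

This trades the pipe for two explicit steps. Whether that reads better than add_row is a matter of taste.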