How to mutate NA on multiple rows (rowwise) in a tibble

Question

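I have spent some time trying to figure out how to mutate NA values across multiple rows, working row-wise, in a tibble. The tibble has 3 observations and 6 variables and is generated as follows: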

library(tidyverse)  # provides as_tibble(), %>% and the dplyr/tidyr verbs used below

df <- data.frame(ID = c(1, 2, 3),
                 Score1 = c(90, 80, 70),
                 Score2 = c(66, 78, 86),
                 Score3 = c(NA, 86, 96),
                 Score4 = c(84, 76, 72),
                 Score5 = c(92, NA, 74))
sample_tibble <- as_tibble(df)

The tibble looks like

# A tibble: 3 x 6
     ID Score1 Score2 Score3 Score4 Score5
  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1     1     90     66     NA     84     92
2     2     80     78     86     76     NA
3     3     70     86     96     72     74

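I have to use functions from the tidyverse (e.g. mutate, mutate_at, rowwise, etc.). The goal is to replace the NA in row 1 (column Score3) and the NA in row 2 (column Score5) with the mean of row 1 and row 2 respectively, where the mean is computed from the other, non-NA scores in that row. The ideal result after the mutate would be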

# A tibble: 3 x 6
     ID Score1 Score2 Score3 Score4 Score5
  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1     1     90     66     83     84     92
2     2     80     78     86     76     80
3     3     70     86     96     72     74

The first NA is replaced with mean(c(90, 66, NA, 84, 92), na.rm = TRUE), which is 83.
The second NA is replaced with mean(c(80, 78, 86, 76, NA), na.rm = TRUE), which is 80.

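I tried some code along the lines below, and also checked the earlier posts "Apply a function to every row of a matrix or a data frame" and "dplyr - using mutate() like rowmeans()", but the code never worked because I could not figure out the body of the mutate call: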

sample_tibble[, -1] %>% rowwise() %>% mutate(...)

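I am not restricted to rowwise or mutate (mutate_at is fine too). Is there any way to change row 1 and row 2 to reach the target format, ideally mutating both at once rather than mutating twice with a for loop? Any solution is appreciated!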

Answer 1

A slightly inefficient approach is to gather the data and group_by it:

sample_tibble %>%
  tidyr::gather(k, v, -ID) %>%
  group_by(ID) %>%
  mutate(v = if_else(is.na(v), mean(v, na.rm = TRUE), v)) %>%
  ungroup() %>%
  tidyr::spread(k, v)
# # A tibble: 3 x 6
#      ID Score1 Score2 Score3 Score4 Score5
#   <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
# 1     1     90     66     83     84     92
# 2     2     80     78     86     76     80
# 3     3     70     86     96     72     74

As RonakShah also reminded me, gather/spread can be replaced with their newer (and more featureful) cousins: pivot_longer/pivot_wider.
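
For reference, here is a rough sketch of what the pivot_longer/pivot_wider version might look like. It assumes tidyr 1.0.0 or later; the intermediate column names k and v are carried over from the gather example, and result_long is just an illustrative name.

```r
library(dplyr)
library(tidyr)

# Same data as in the question
sample_tibble <- tibble(
  ID     = c(1, 2, 3),
  Score1 = c(90, 80, 70),
  Score2 = c(66, 78, 86),
  Score3 = c(NA, 86, 96),
  Score4 = c(84, 76, 72),
  Score5 = c(92, NA, 74)
)

result_long <- sample_tibble %>%
  # long format: one row per (ID, score column) pair
  pivot_longer(-ID, names_to = "k", values_to = "v") %>%
  group_by(ID) %>%
  # within each ID, fill NA with the mean of the non-NA scores
  mutate(v = if_else(is.na(v), mean(v, na.rm = TRUE), v)) %>%
  ungroup() %>%
  # back to one column per score
  pivot_wider(names_from = k, values_from = v)

result_long
```

The grouping logic is unchanged from the gather/spread version; only the reshaping calls differ.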

Another technique uses apply:

sample_tibble %>%
  mutate(mu = apply(.[,-1], 1, mean, na.rm = TRUE)) %>%
  ### similarly, and faster, thanks RonakShah
  # mutate(mu = rowMeans(.[,-1], na.rm = TRUE)) %>%
  mutate_at(vars(starts_with("Score")), ~ if_else(is.na(.), mu, .)) %>%
  select(-mu)

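One caveat: .[,-1] explicitly uses every column except the first; if you have other columns not mentioned in the question, this will certainly use more data than you intend. Unfortunately, I don't know of a way to use ":"-ranging in this solution, which would be clearer.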
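
One possible way to name the score columns explicitly, offered as a sketch rather than as part of the original answer: select() accepts a Score1:Score5 range, and across() (assuming dplyr 1.0.0 or later) accepts the same range in place of mutate_at. The name result_range is illustrative only.

```r
library(dplyr)

# Same data as in the question
sample_tibble <- tibble(
  ID     = c(1, 2, 3),
  Score1 = c(90, 80, 70),
  Score2 = c(66, 78, 86),
  Score3 = c(NA, 86, 96),
  Score4 = c(84, 76, 72),
  Score5 = c(92, NA, 74)
)

result_range <- sample_tibble %>%
  # row means over an explicit column range instead of .[,-1]
  mutate(mu = rowMeans(select(., Score1:Score5), na.rm = TRUE)) %>%
  # fill NA in each score column with that row's mean
  mutate(across(Score1:Score5, ~ if_else(is.na(.x), mu, .x))) %>%
  select(-mu)

result_range
```

With the range spelled out, any extra non-score columns in the data are left out of the row mean.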

Answer 2

An approach that uses a little math could be:

df %>%
 mutate_at(vars(-1), 
           ~ pmax(is.na(.)*rowMeans(select(df, -1), na.rm = TRUE), 
                  (!is.na(.))*., 
                  na.rm = TRUE))


  ID Score1 Score2 Score3 Score4 Score5
1  1     90     66     83     84     92
2  2     80     78     86     76     80
3  3     70     86     96     72     74
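
To see why the pmax trick works, it may help to run it on a single column. The sketch below uses the Score3 values and the row means from the question (x, mu, neg_x and neg_mu are illustrative names). It also shows a limitation that follows from the arithmetic: because the non-NA branch is compared against 0, the trick only gives the right answer when the values are non-negative.

```r
x  <- c(NA, 86, 96)     # Score3
mu <- c(83, 80, 79.6)   # row means of the non-NA scores

# Where x is NA:     is.na(x) * mu = mu,  (!is.na(x)) * x = NA  -> pmax picks mu
# Where x is not NA: is.na(x) * mu = 0,   (!is.na(x)) * x = x   -> pmax picks x (if x >= 0)
filled <- pmax(is.na(x) * mu, (!is.na(x)) * x, na.rm = TRUE)
filled
# 83 86 96

# Caveat: a negative value loses against the 0 from the other branch
neg_x  <- c(-5, NA)
neg_mu <- c(-2, -3)
neg_filled <- pmax(is.na(neg_x) * neg_mu, (!is.na(neg_x)) * neg_x, na.rm = TRUE)
neg_filled
# 0 -3   (the -5 was overwritten with 0)
```

For scores that are always zero or above, as in the question, this is not a problem.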
