Interpolate NA values with the mean for each row but only for one or two NA values between numerical values

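I am trying to interpolate NA values within each row, but only where two or fewer NA values sit next to each other. For example, row 3 has three consecutive NAs, so I do not want to interpolate there; rows 1 and 2 have runs of two or fewer NAs, so my goal is to interpolate those linearly. Is there an efficient way to do this?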

I have a dataset that looks like this:

df1:
   ID string1 2018 2019 2020 2021 2022 string2
1: a1      x2    3    3   NA    4    4      si
2: a2      g3    5    5   NA   NA    1      q2
3: a3      n2   11   NA   NA   NA    3      oq
4: a4      m3    3   NA    9    8    8      mx
5: a5      2w    9    1   NA    5   NA      ix
6: a6     ps2    2   NA    7    4    4      p2
7: a7     kg2    6   NA   NA   NA    6      2q

For reproducibility:

library(data.table)

df1 = data.table(
  ID = c("a1", "a2", "a3", "a4", "a5", "a6", "a7"),
  "string1" = c("x2", "g3", "n2", "m3", "2w", "ps2", "kg2"),
  "2018" = c(3,5,11,3,9,2,6),
  "2019" = c(3,5,NA,NA,1,NA,NA),
  "2020" = c(NA,NA,NA,9,NA,7,NA),
  "2021" = c(4,NA,NA,8,5,4,NA),
  "2022" = c(4,1,3,8,NA,4,6),
  "string2" = c("si", "q2", "oq", "mx", "ix", "p2", "2q"))

I am trying to get a data.table that looks like this:

   ID string1 2018 2019 2020 2021 2022 string2
1: a1      x2    3 3.00  3.5    4    4      si
2: a2      g3    5 5.00  4.3    3    1      q2
3: a3      n2   11   NA   NA   NA    3      oq
4: a4      m3    3 8.25  9.0    8    8      mx
5: a5      2w    9 1.00 -0.3    5   17      ix
6: a6     ps2    2 8.00  7.0    4    4      p2
7: a7     kg2    6   NA   NA   NA    6      2q

Thanks for any suggestions!

Please find below a solution using the data.table and imputeTS libraries (see the reprex below).

Reprex

  • Code
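library(data.table)
library(imputeTS)
library(magrittr) # for the %>% pipe

# Transpose so that each original row becomes a column, keep only the year
# rows (columns 3 to ncol(df1) - 1 of df1), interpolate with a spline while
# skipping runs of more than 2 consecutive NAs, then transpose back.
results <- df1 %>% 
  transpose(., keep.names = 'rn') %>% 
  {.[3:(ncol(df1) - 1), lapply(.SD, as.numeric),
  ][, lapply(.SD, na_interpolation, option = "spline", maxgap = 2)]} %>% 
  round(., 2) %>%  
  transpose(., make.names = 'rn') %>% 
  cbind(.,df1[,c("ID", "string1", "string2")]) %>% 
  setcolorder(., names(df1))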
  • Output
results
#>        ID string1  2018  2019  2020  2021  2022 string2
#>    <char>  <char> <num> <num> <num> <num> <num>  <char>
#> 1:     a1      x2     3  3.00  3.50     4     4      si
#> 2:     a2      g3     5  5.00  4.33     3     1      q2
#> 3:     a3      n2    11    NA    NA    NA     3      oq
#> 4:     a4      m3     3  8.25  9.00     8     8      mx
#> 5:     a5      2w     9  1.00 -0.50     5     5      ix
#> 6:     a6     ps2     2  8.00  7.00     4     4      p2
#> 7:     a7     kg2     6    NA    NA    NA     6      2q

Created on 2021-12-02 by the reprex package (v2.0.1)

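A solution using the data.table and zoo libraries may be better (see the reprex below). This solution gives the result you are looking for (i.e. forget my comment under your question!).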

Reprex

  • Code
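library(data.table)
library(zoo)
library(magrittr) # for the pipes! 

# Same approach as before: transpose, keep only the year rows (columns 3 to
# ncol(df1) - 1 of df1), fill with a spline, transpose back. maxgap = 2
# leaves runs of more than 2 consecutive NAs untouched.
results <- df1 %>% 
  transpose(., keep.names = 'rn') %>% 
  {.[3:(ncol(df1) - 1), lapply(.SD, as.numeric),
  ][, lapply(.SD, na.spline, maxgap = 2)]} %>% 
  round(., 2) %>%  
  transpose(., make.names = 'rn') %>% 
  cbind(.,df1[,c("ID", "string1", "string2")]) %>% 
  setcolorder(., names(df1))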
  • Output
results
#>        ID string1  2018  2019  2020  2021  2022 string2
#>    <char>  <char> <num> <num> <num> <num> <num>  <char>
#> 1:     a1      x2     3  3.00  3.50     4     4      si
#> 2:     a2      g3     5  5.00  4.33     3     1      q2
#> 3:     a3      n2    11    NA    NA    NA     3      oq
#> 4:     a4      m3     3  8.25  9.00     8     8      mx
#> 5:     a5      2w     9  1.00 -0.33     5    17      ix
#> 6:     a6     ps2     2  8.00  7.00     4     4      p2
#> 7:     a7     kg2     6    NA    NA    NA     6      2q

Created on 2021-12-03 by the reprex package (v2.0.1)
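
The question asks for linear interpolation, while the two answers above fit a spline. In case a strictly linear fill is wanted, here is a minimal base-R sketch of the gap rule. The helper fill_short_gaps is hypothetical (it is not part of data.table, zoo or imputeTS), and it assumes that leading or trailing NAs should stay NA. Its values will differ from the spline-based outputs shown above wherever a gap is wider than one NA.

```r
# Hypothetical helper (not from any package): linearly interpolate runs of
# NA that are at most `maxgap` long and have a numeric value on both sides.
fill_short_gaps <- function(x, maxgap = 2) {
  idx <- seq_along(x)
  ok  <- !is.na(x)
  if (sum(ok) < 2) return(x)  # approx() needs at least two known points

  # Linear interpolation at every position; approx() returns NA outside
  # the range of known points, so leading/trailing NAs are not filled.
  filled <- approx(idx[ok], x[ok], xout = idx)$y

  # For every element, the length of the NA / non-NA run it belongs to.
  runs    <- rle(is.na(x))
  run_len <- rep(runs$lengths, runs$lengths)

  # Undo the fill for NA runs that are longer than maxgap.
  filled[is.na(x) & run_len > maxgap] <- NA
  filled
}

fill_short_gaps(c(3, 3, NA, 4, 4))     # single NA is filled with 3.5
fill_short_gaps(c(5, 5, NA, NA, 1))    # two adjacent NAs are filled
fill_short_gaps(c(11, NA, NA, NA, 3))  # three adjacent NAs are left as NA

# Row-wise use on a small matrix of year values (apply() returns one
# column per row, hence the t()).
years <- rbind(a1 = c(3, 3, NA, 4, 4), a3 = c(11, NA, NA, NA, 3))
t(apply(years, 1, fill_short_gaps))
```

The same function could be applied to the year columns of df1 after converting them to a matrix; zoo::na.approx with maxgap = 2 is a packaged alternative for the linear case.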