用 data.table 插值 2D 数据 - 填充 NA

interpolating 2D data with data.table - filling NAs

我合并了两个数据集,时间步长为 t,高度为 h。

dataset_a <- data.table(t=rep(c(1,2,3,4,5,6,7,8,9), each=5),
                        h=rep(c(1:5)),
                        v=c(1:(5*9)))

一个有测量差距,以及我们实际测量但没有测量到的值。

dataset_b <- data.table(t=rep(c(1,2,4,5,6,8,9), each=5),
                        h=rep(c(1:5)),
                        w=c(1:(5*7)))

dataset_b$w[12:20] <-0

合并:

dataset_merged <- merge(dataset_a, dataset_b, all=TRUE, by = c('t', 'h'))

现在我想填补空白。我如何告诉 data.table 使用相邻值来填充像素?

dataset_merged[is.na(w), 
               w:= mean(c(the value at this h one timestep earlier, the value at this h one timestep later))]

非常感谢!

编辑 在 Bens 非常有帮助的评论之后,我不得不调整可重现的示例: 他的解决方案有效,但如果缺少 'framing' 数据则无效: 如果

dataset_b <- data.table(t=rep(c(2,4,5,6,8,9), each=5),
                        h=rep(c(1:5)),
                        w=c(1:(5*6)))
#removed the first timestep in this case
dataset_merged <- merge(dataset_a, dataset_b, all=TRUE, by = c('t', 'h'))


library(zoo)
dataset_merged[order(h,t)][, w := na.approx(w)] 

产量

Error in `[.data.table`(dataset_merged[order(h, t)], , `:=`(w, na.approx(w))) : 
  Supplied 44 items to be assigned to 45 items of column 'w'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.

将它们保留为 NA 是可以的,但是我如何向函数明确这一点? 不幸的是,原始数据不在规则的网格上。

或许试试这个方法。在插值前将数据table排序为h,并使w数值化为十进制。使用 approx(基础 R)和组 by = h.

dataset_merged[order(h,t)][, w:= as.numeric(w)][, w := approx(.I, w, .I)$y, by = h]

输出

    t h  v    w
 1: 1 1  1   NA
 2: 2 1  6  1.0
 3: 3 1 11  3.5
 4: 4 1 16  6.0
 5: 5 1 21 11.0
 6: 6 1 26 16.0
 7: 7 1 31 18.5
 8: 8 1 36 21.0
 9: 9 1 41 26.0
10: 1 2  2   NA
11: 2 2  7  2.0
12: 3 2 12  4.5
13: 4 2 17  7.0
14: 5 2 22 12.0
15: 6 2 27 17.0
16: 7 2 32 19.5
17: 8 2 37 22.0
18: 9 2 42 27.0
19: 1 3  3   NA
20: 2 3  8  3.0
21: 3 3 13  5.5
22: 4 3 18  8.0
23: 5 3 23 13.0
24: 6 3 28 18.0
25: 7 3 33 20.5
26: 8 3 38 23.0
27: 9 3 43 28.0
28: 1 4  4   NA
29: 2 4  9  4.0
30: 3 4 14  6.5
31: 4 4 19  9.0
32: 5 4 24 14.0
33: 6 4 29 19.0
34: 7 4 34 21.5
35: 8 4 39 24.0
36: 9 4 44 29.0
37: 1 5  5   NA
38: 2 5 10  5.0
39: 3 5 15  7.5
40: 4 5 20 10.0
41: 5 5 25 15.0
42: 6 5 30 20.0
43: 7 5 35 22.5
44: 8 5 40 25.0
45: 9 5 45 30.0
    t h  v    w

附加(根据 OP):如果有一个组只有 NA 个值 w,则必须将其排除。

编辑 (5/28/20):要防止在可用于插值的值少于 2 个时使用 approx,您还可以尝试:

dataset_merged[order(h,t)
  ][, w:= as.numeric(w)
    ][, w := if(length(na.omit(w)) < 2) w else approx(.I, w, .I)$y, by = h]

测试用例:

dataset_b <- data.table(t=rep(c(2,4,5,6,8,9), each=5),
                        h=1:5,
                        w=1:30)

dataset_b$w[c(F,F,T,F,F)] <- NA

dataset_merged <- merge(dataset_a, dataset_b, all=TRUE, by = c('t', 'h'))

输出

    t h  v    w
 1: 1 1  1   NA
 2: 2 1  6  1.0
 3: 3 1 11  3.5
 4: 4 1 16  6.0
 5: 5 1 21 11.0
 6: 6 1 26 16.0
 7: 7 1 31 18.5
 8: 8 1 36 21.0
 9: 9 1 41 26.0
10: 1 2  2   NA
11: 2 2  7  2.0
12: 3 2 12  4.5
13: 4 2 17  7.0
14: 5 2 22 12.0
15: 6 2 27 17.0
16: 7 2 32 19.5
17: 8 2 37 22.0
18: 9 2 42 27.0
19: 1 3  3   NA
20: 2 3  8   NA
21: 3 3 13   NA
22: 4 3 18   NA
23: 5 3 23   NA
24: 6 3 28   NA
25: 7 3 33   NA
26: 8 3 38   NA
27: 9 3 43   NA
28: 1 4  4   NA
29: 2 4  9  4.0
30: 3 4 14  6.5
31: 4 4 19  9.0
32: 5 4 24 14.0
33: 6 4 29 19.0
34: 7 4 34 21.5
35: 8 4 39 24.0
36: 9 4 44 29.0
37: 1 5  5   NA
38: 2 5 10  5.0
39: 3 5 15  7.5
40: 4 5 20 10.0
41: 5 5 25 15.0
42: 6 5 30 20.0
43: 7 5 35 22.5
44: 8 5 40 25.0
45: 9 5 45 30.0
    t h  v    w