r - insert row for missing monthly data and interpolate

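I have a data frame with over 5,000 rows, as shown below. I am trying to insert a row wherever a month is missing (e.g. month 6 below) and then use linear interpolation to compute the 'TWS' value. Ideally the decimal date would also be filled in appropriately, but if not I can sort that out afterwards! The data frame covers months 1:12 for 10 years (2003-2012), and this sequence repeats for multiple grid squares.

I have found plenty of other similar questions, but none that deal with a repeating 1:12 monthly sequence.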

 > head(ts.data,20)
    GridNo GridIndex  Lon  Lat DecimDate Year Month        TWS
 1    GR72        72 35.5 -4.5  2003.000 2003    01 14.2566781
 2    GR72        72 35.5 -4.5  2003.083 2003    02  5.0413706
 3    GR72        72 35.5 -4.5  2003.167 2003    03  3.8192721
 4    GR72        72 35.5 -4.5  2003.250 2003    04  5.8706026
 5    GR72        72 35.5 -4.5  2003.333 2003    05  7.8461188
 6    GR72        72 35.5 -4.5  2003.500 2003    07  2.3821844
 7    GR72        72 35.5 -4.5  2003.583 2003    08  0.1995629
 8    GR72        72 35.5 -4.5  2003.667 2003    09 -1.8353604
 9    GR72        72 35.5 -4.5  2003.750 2003    10 -2.0410653
 10   GR72        72 35.5 -4.5  2003.833 2003    11 -1.4029813
 11   GR72        72 35.5 -4.5  2003.917 2003    12 -0.2206872
 12   GR72        72 35.5 -4.5  2004.000 2004    01 -0.5090872
 13   GR72        72 35.5 -4.5  2004.083 2004    02 -0.4887118
 14   GR72        72 35.5 -4.5  2004.167 2004    03 -0.7725966
 15   GR72        72 35.5 -4.5  2004.250 2004    04  4.1831581
 16   GR72        72 35.5 -4.5  2004.333 2004    05  2.5651040
 17   GR72        72 35.5 -4.5  2004.417 2004    06 -2.2511409
 18   GR72        72 35.5 -4.5  2004.500 2004    07 -1.6484375
 19   GR72        72 35.5 -4.5  2004.583 2004    08 -4.6508982
 20   GR72        72 35.5 -4.5  2004.667 2004    09 -5.0053745

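Any help appreciated!

Using the data.table and zoo packages, you can easily expand the data set and interpolate, as long as you don't have NAs at both ends of a year.

Expand the data set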

library(data.table)
library(zoo)
res <- setDT(df)[, .SD[match(1:12, Month)], by = Year]

Interpolate on whichever columns you want

cols <- c("Month", "DecimDate", "TWS")
res[, (cols) := lapply(.SD, na.approx, na.rm = FALSE), .SDcols = cols]

res
#     Year GridNo GridIndex  Lon  Lat DecimDate Month        TWS
#  1: 2003   GR72        72 35.5 -4.5  2003.000     1 14.2566781
#  2: 2003   GR72        72 35.5 -4.5  2003.083     2  5.0413706
#  3: 2003   GR72        72 35.5 -4.5  2003.167     3  3.8192721
#  4: 2003   GR72        72 35.5 -4.5  2003.250     4  5.8706026
#  5: 2003   GR72        72 35.5 -4.5  2003.333     5  7.8461188
#  6: 2003     NA        NA   NA   NA  2003.417     6  5.1141516
#  7: 2003   GR72        72 35.5 -4.5  2003.500     7  2.3821844
#  8: 2003   GR72        72 35.5 -4.5  2003.583     8  0.1995629
#  9: 2003   GR72        72 35.5 -4.5  2003.667     9 -1.8353604
# 10: 2003   GR72        72 35.5 -4.5  2003.750    10 -2.0410653
# 11: 2003   GR72        72 35.5 -4.5  2003.833    11 -1.4029813
# 12: 2003   GR72        72 35.5 -4.5  2003.917    12 -0.2206872
# 13: 2004   GR72        72 35.5 -4.5  2004.000     1 -0.5090872
# 14: 2004   GR72        72 35.5 -4.5  2004.083     2 -0.4887118
# 15: 2004   GR72        72 35.5 -4.5  2004.167     3 -0.7725966
# 16: 2004   GR72        72 35.5 -4.5  2004.250     4  4.1831581
# 17: 2004   GR72        72 35.5 -4.5  2004.333     5  2.5651040
# 18: 2004   GR72        72 35.5 -4.5  2004.417     6 -2.2511409
# 19: 2004   GR72        72 35.5 -4.5  2004.500     7 -1.6484375
# 20: 2004   GR72        72 35.5 -4.5  2004.583     8 -4.6508982
# 21: 2004   GR72        72 35.5 -4.5  2004.667     9 -5.0053745
# 22: 2004     NA        NA   NA   NA        NA    NA         NA
# 23: 2004     NA        NA   NA   NA        NA    NA         NA
# 24: 2004     NA        NA   NA   NA        NA    NA         NA
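
The question mentions that the 1:12 sequence repeats for several grid squares, so grouping by Year alone would probably pick up only the first grid square for each month. Below is a minimal sketch of one way to handle that, assuming a hypothetical toy table with two made-up grid squares (the TWS values are invented for illustration, not taken from the question). It builds the full GridNo/Year/Month grid, merges the data onto it, and interpolates within each grid square.

```r
library(data.table)
library(zoo)

# Toy data (made up): two grid squares, one year each.
# Month 6 is missing from GR72 and month 3 is missing from GR73.
df <- data.table(
  GridNo = rep(c("GR72", "GR73"), each = 11),
  Year   = 2003L,
  Month  = c((1:12)[-6], (1:12)[-3]),
  TWS    = as.numeric(c(1:11, 20:30))
)

# Full 1:12 sequence for every grid square and year that occurs in the data.
full <- df[, .(Month = 1:12), by = .(GridNo, Year)]

# Left join: months with no observation get NA in TWS.
res <- merge(full, df, by = c("GridNo", "Year", "Month"), all.x = TRUE)

# Interpolate linearly within each grid square; leading/trailing NAs are kept.
res[, TWS := na.approx(TWS, na.rm = FALSE), by = GridNo]
```

Because the merge is keyed on GridNo, Year and Month, the inserted rows already carry those three values; other per-square columns such as Lon and Lat would still need filling, for example with na.locf from zoo.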

I would first convert your dates into actual dates (here taking the first day of each month):

dates <- as.Date(paste(ts.data$Year, ts.data$Month, 1, sep="-"))

Do the same for the target, i.e. the missing month(s) (only one here, but this can handle several):

target <- as.Date("2003-06-01")

And approximate:

approx(dates, ts.data$TWS, target)
$x
[1] "2003-06-01"

$y
[1] 5.069365
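
This value differs slightly from the 5.1141516 that index-based interpolation gives for the same month, presumably because approx on real dates weights by the number of days (May has 31 days, June has 30) rather than treating the months as equally spaced. A small sketch of the arithmetic, using the May and July values from the data above (the variable names are only for illustration):

```r
y_may <- 7.8461188
y_jul <- 2.3821844

# Fraction of the May 1 -> July 1 interval elapsed by June 1 (31 of 61 days).
w <- as.numeric(as.Date("2003-06-01") - as.Date("2003-05-01")) /
     as.numeric(as.Date("2003-07-01") - as.Date("2003-05-01"))

by_date  <- y_may + w * (y_jul - y_may)  # day-weighted, as approx does on dates
by_index <- (y_may + y_jul) / 2          # plain midpoint, months equally spaced
```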

So, in the context of your data frame (simplified here):

ts.data <- data.frame(Year=c(rep(2003,11),rep(2004,9)),Month=c((1:12)[-6],1:9),TWS=c(14.2566781,5.0413706,3.8192721,5.8706026,7.8461188, 2.3821844, 0.1995629,-1.8353604,-2.0410653,-1.4029813,-0.2206872,-0.5090872,-0.4887118,-0.7725966, 4.1831581, 2.5651040,-2.2511409,-1.6484375,-4.6508982, -5.0053745))
dates <- as.Date(paste(ts.data$Year, ts.data$Month, 1, sep="-"))
target <- as.Date("2003-06-01")
ts.data <- rbind(ts.data, 
                 data.frame(Year=2003, 
                            Month=6, 
                            TWS=approx(dates, ts.data$TWS, target)$y))
ts.data <- ts.data[order(ts.data$Year, ts.data$Month),]
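
With 5,000+ rows the missing months would not be known in advance, so the target dates could be derived rather than typed in. A minimal base R sketch of that idea, assuming a single grid square and using only the 2003 values from the question (the names all_dates and new_rows are introduced here for illustration):

```r
# 2003 values from the question; month 6 is missing.
ts.data <- data.frame(
  Year  = rep(2003, 11),
  Month = (1:12)[-6],
  TWS   = c(14.2566781, 5.0413706, 3.8192721, 5.8706026, 7.8461188,
            2.3821844, 0.1995629, -1.8353604, -2.0410653, -1.4029813,
            -0.2206872)
)

# Observed dates, and every month between the first and last observation.
dates     <- as.Date(paste(ts.data$Year, ts.data$Month, 1, sep = "-"))
all_dates <- seq(min(dates), max(dates), by = "month")

# Months with no row yet become the interpolation targets.
target <- all_dates[!(all_dates %in% dates)]

# One new row per missing month, with TWS interpolated linearly on the dates.
new_rows <- data.frame(
  Year  = as.integer(format(target, "%Y")),
  Month = as.integer(format(target, "%m")),
  TWS   = approx(dates, ts.data$TWS, xout = target)$y
)

ts.data <- rbind(ts.data, new_rows)
ts.data <- ts.data[order(ts.data$Year, ts.data$Month), ]
```

For several grid squares, the same steps could be applied to each piece of split(ts.data, ts.data$GridNo) and the results combined with do.call(rbind, ...).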