r - insert row for missing monthly data and interpolate
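I have a data frame of 5000+ rows that looks like the one below. I am trying to insert a row wherever a month is missing (e.g. month 6 below) and then use linear interpolation to calculate the 'TWS' value. Ideally the decimal date would be filled in appropriately too, but if not I can sort that out afterwards. The data frame covers months 1:12 for 10 years (2003-2012), but this is repeated for multiple grid squares.
I have found plenty of similar questions, but none that deal with a repeating 1:12 monthly sequence.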
> head(ts.data,20)
GridNo GridIndex Lon Lat DecimDate Year Month TWS
1 GR72 72 35.5 -4.5 2003.000 2003 01 14.2566781
2 GR72 72 35.5 -4.5 2003.083 2003 02 5.0413706
3 GR72 72 35.5 -4.5 2003.167 2003 03 3.8192721
4 GR72 72 35.5 -4.5 2003.250 2003 04 5.8706026
5 GR72 72 35.5 -4.5 2003.333 2003 05 7.8461188
6 GR72 72 35.5 -4.5 2003.500 2003 07 2.3821844
7 GR72 72 35.5 -4.5 2003.583 2003 08 0.1995629
8 GR72 72 35.5 -4.5 2003.667 2003 09 -1.8353604
9 GR72 72 35.5 -4.5 2003.750 2003 10 -2.0410653
10 GR72 72 35.5 -4.5 2003.833 2003 11 -1.4029813
11 GR72 72 35.5 -4.5 2003.917 2003 12 -0.2206872
12 GR72 72 35.5 -4.5 2004.000 2004 01 -0.5090872
13 GR72 72 35.5 -4.5 2004.083 2004 02 -0.4887118
14 GR72 72 35.5 -4.5 2004.167 2004 03 -0.7725966
15 GR72 72 35.5 -4.5 2004.250 2004 04 4.1831581
16 GR72 72 35.5 -4.5 2004.333 2004 05 2.5651040
17 GR72 72 35.5 -4.5 2004.417 2004 06 -2.2511409
18 GR72 72 35.5 -4.5 2004.500 2004 07 -1.6484375
19 GR72 72 35.5 -4.5 2004.583 2004 08 -4.6508982
20 GR72 72 35.5 -4.5 2004.667 2004 09 -5.0053745
Thanks for any help!
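Using the data.table and zoo packages, you can easily expand your data set and interpolate, as long as each gap has observed values on both sides of it. Leading or trailing NAs are not interpolated, which is why the last three rows of the output below stay NA.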
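Expand the data set
library(data.table)
library(zoo)
# df stands for the question's data frame (called ts.data there).
# For each Year, match(1:12, Month) gives one row position per calendar month,
# with NA where a month is absent, so .SD[...] inserts an all-NA row there.
res <- setDT(df)[, .SD[match(1:12, Month)], by = Year]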
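To see why indexing with `match(1:12, Month)` inserts the placeholder rows, here is a minimal base R sketch. The `observed` and `toy` objects are made up for illustration and are not part of the original data.

```r
# Months observed in 2003 in the sample: month 6 is absent
observed <- c(1:5, 7:12)

# match() returns, for each of 1:12, its position in `observed` (NA if absent)
idx <- match(1:12, observed)
idx
# 1 2 3 4 5 NA 6 7 8 9 10 11

# Indexing a data frame with an NA position produces a row of NAs
toy <- data.frame(Month = observed, TWS = seq_along(observed))
expanded <- toy[idx, ]
nrow(expanded)          # 12 rows, one per calendar month
is.na(expanded$TWS[6])  # TRUE: the placeholder row for June
```

With several grid squares in the same data frame, the grouping would presumably need to include the grid identifier as well as the year, otherwise `match()` only picks up the first occurrence of each month within a year.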
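Interpolate on whichever columns you want
cols <- c("Month", "DecimDate", "TWS")
# na.approx() fills interior NAs linearly; na.rm = FALSE keeps leading/trailing NAs in place
res[, (cols) := lapply(.SD, na.approx, na.rm = FALSE), .SDcols = cols]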
res
# Year GridNo GridIndex Lon Lat DecimDate Month TWS
# 1: 2003 GR72 72 35.5 -4.5 2003.000 1 14.2566781
# 2: 2003 GR72 72 35.5 -4.5 2003.083 2 5.0413706
# 3: 2003 GR72 72 35.5 -4.5 2003.167 3 3.8192721
# 4: 2003 GR72 72 35.5 -4.5 2003.250 4 5.8706026
# 5: 2003 GR72 72 35.5 -4.5 2003.333 5 7.8461188
# 6: 2003 NA NA NA NA 2003.417 6 5.1141516
# 7: 2003 GR72 72 35.5 -4.5 2003.500 7 2.3821844
# 8: 2003 GR72 72 35.5 -4.5 2003.583 8 0.1995629
# 9: 2003 GR72 72 35.5 -4.5 2003.667 9 -1.8353604
# 10: 2003 GR72 72 35.5 -4.5 2003.750 10 -2.0410653
# 11: 2003 GR72 72 35.5 -4.5 2003.833 11 -1.4029813
# 12: 2003 GR72 72 35.5 -4.5 2003.917 12 -0.2206872
# 13: 2004 GR72 72 35.5 -4.5 2004.000 1 -0.5090872
# 14: 2004 GR72 72 35.5 -4.5 2004.083 2 -0.4887118
# 15: 2004 GR72 72 35.5 -4.5 2004.167 3 -0.7725966
# 16: 2004 GR72 72 35.5 -4.5 2004.250 4 4.1831581
# 17: 2004 GR72 72 35.5 -4.5 2004.333 5 2.5651040
# 18: 2004 GR72 72 35.5 -4.5 2004.417 6 -2.2511409
# 19: 2004 GR72 72 35.5 -4.5 2004.500 7 -1.6484375
# 20: 2004 GR72 72 35.5 -4.5 2004.583 8 -4.6508982
# 21: 2004 GR72 72 35.5 -4.5 2004.667 9 -5.0053745
# 22: 2004 NA NA NA NA NA NA NA
# 23: 2004 NA NA NA NA NA NA NA
# 24: 2004 NA NA NA NA NA NA NA
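In the output above the inserted row still has NA in the grid columns, and the decimal date is filled by interpolation. As an alternative, here is a rough base R sketch that computes the decimal date directly and reuses the grid identifier. It assumes DecimDate = Year + (Month - 1) / 12, which is only inferred from the rounded sample values (e.g. 2003.083 for February 2003), and `decim_date` and `toy` are hypothetical names.

```r
# Assumption: DecimDate = Year + (Month - 1) / 12, which reproduces the
# rounded values shown in the sample (2003.083, 2003.417, ...)
decim_date <- function(year, month) year + (month - 1) / 12

# Toy single-grid data with an inserted placeholder row for June
toy <- data.frame(
  GridNo = c("GR72", NA, "GR72"),
  Year   = c(2003, 2003, 2003),
  Month  = c(5, 6, 7)
)

# Recompute the decimal date for every row, including the inserted one
toy$DecimDate <- round(decim_date(toy$Year, toy$Month), 3)

# Within one grid square the identifier is constant, so reuse the first non-NA value
toy$GridNo[is.na(toy$GridNo)] <- toy$GridNo[!is.na(toy$GridNo)][1]
toy
```

The same fill would apply to GridIndex, Lon and Lat, since they are constant within a grid square in the sample shown.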
I would first convert your dates to actual dates (here taking the first day of each month):
dates <- as.Date(paste(ts.data$Year, ts.data$Month, 1, sep="-"))
Do the same for the target, i.e. the missing months (only one here, but this works for several):
target <- as.Date("2003-06-01")
Then approximate:
approx(dates, ts.data$TWS, target)
$x
[1] "2003-06-01"
$y
[1] 5.069365
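The target was typed in by hand above. With 5000+ rows it may be easier to derive the missing months from the data. A minimal sketch, assuming the observations run from the first to the last date with only interior gaps (`ym` and `all_months` are illustrative names):

```r
# Simplified Year/Month columns from the sample: June 2003 is absent
ym <- data.frame(Year  = c(rep(2003, 11), rep(2004, 9)),
                 Month = c((1:12)[-6], 1:9))
dates <- as.Date(paste(ym$Year, ym$Month, 1, sep = "-"))

# Every first-of-month between the first and last observation
all_months <- seq(min(dates), max(dates), by = "month")

# Targets are the months that have no observation
target <- all_months[!all_months %in% dates]
target
# [1] "2003-06-01"
```

Passing this `target` vector to `approx()` as `xout` interpolates all missing months in one call.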
So, in the context of your data frame (simplified here):
ts.data <- data.frame(Year=c(rep(2003,11),rep(2004,9)),Month=c((1:12)[-6],1:9),TWS=c(14.2566781,5.0413706,3.8192721,5.8706026,7.8461188, 2.3821844, 0.1995629,-1.8353604,-2.0410653,-1.4029813,-0.2206872,-0.5090872,-0.4887118,-0.7725966, 4.1831581, 2.5651040,-2.2511409,-1.6484375,-4.6508982, -5.0053745))
dates <- as.Date(paste(ts.data$Year, ts.data$Month, 1, sep="-"))
target <- as.Date("2003-06-01")
# Append the interpolated row for June 2003, then restore chronological order
ts.data <- rbind(ts.data,
                 data.frame(Year=2003,
                            Month=6,
                            TWS=approx(dates, ts.data$TWS, target)$y))
ts.data <- ts.data[order(ts.data$Year, ts.data$Month),]
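Since the question mentions several grid squares, the same idea can be applied per grid. This is a rough base R sketch, not a tested solution for the full data set. `fill_grid` is a hypothetical helper and the toy values are made up. Note that interpolating on real dates weights by the number of days in each month, so results differ slightly from interpolating by row position (5.069365 here versus 5.1141516 in the data.table/zoo answer).

```r
# Hypothetical helper: complete the monthly sequence for one grid square
# and linearly interpolate TWS on the date axis.
fill_grid <- function(d) {
  dates <- as.Date(paste(d$Year, d$Month, 1, sep = "-"))
  all_months <- seq(min(dates), max(dates), by = "month")
  data.frame(
    GridNo = d$GridNo[1],
    Year   = as.integer(format(all_months, "%Y")),
    Month  = as.integer(format(all_months, "%m")),
    # approx() returns the observed value at observed dates and a
    # linear interpolation at the missing ones
    TWS    = approx(dates, d$TWS, xout = all_months)$y
  )
}

# Toy data: two grid squares, each missing one month (values are made up)
toy <- data.frame(
  GridNo = rep(c("GR72", "GR73"), each = 3),
  Year   = 2003,
  Month  = c(1, 2, 4, 1, 3, 4),
  TWS    = c(1, 2, 4, 10, 30, 40)
)

# Split by grid square, fill each piece, and stack the results
filled <- do.call(rbind, lapply(split(toy, toy$GridNo), fill_grid))
rownames(filled) <- NULL
filled
```

Only the columns needed for the interpolation are kept in this sketch. Constant columns such as Lon and Lat could be carried over the same way as GridNo.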