Linear interpolation among columns in R
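I am working with some temperature data where I have temperatures at specific depths, e.g. 0.9 m, 2.5 m and 5 m. I would like to interpolate these values to obtain the temperature at every metre, e.g. 1 m, 2 m and 3 m. The original data looks like this: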
df
# A tibble: 5 x 3
date d_0.9 d_2.5
<dttm> <dbl> <dbl>
1 2004-01-05 03:00:00 7 8
2 2004-01-05 04:00:00 7.5 9
3 2004-01-05 05:00:00 7 8
4 2004-01-05 06:00:00 6.92 NA
What I would like to get is something like this:
df_int
# A tibble: 5 x 5
date d_0.9 d_1 d_2 d_2.5
<dttm> <dbl> <dbl> <dbl> <dbl>
1 2004-01-05 03:00:00 7 7.0625 7.6875 8
2 2004-01-05 04:00:00 7.5 7.59375 8.53125 9
3 2004-01-05 05:00:00 7 7.0625 7.6875 8
4 2004-01-05 06:00:00 6.92 NA NA NA
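I have to do this for very large data frames. Is there an efficient way to do it?
Thanks a lot
One option is to convert the data to long format, use a join to add rows for the depths we want to interpolate at, and then interpolate with approx: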
library(tidyverse)

# Data
df = tibble(date = seq(as.POSIXct("2004-01-05 03:00:00"),
                       as.POSIXct("2004-01-05 06:00:00"),
                       by = "1 hour"),
            d_0.9 = c(7, 7.5, 7, 6.92),
            d_2.5 = c(8, NA, 8, NA),
            d_5.0 = c(10, 10.5, 9.4, NA))

# Create a data frame with all of the times and depths we want to interpolate at
depths = sort(unique(c(c(0.9, 2.5, 5), seq(ceiling(0.9), floor(5), 1))))
depths = crossing(date = unique(df$date), depth = depths)

# Convert data to long format, join to add interpolation depths, then interpolate
df.interp = df %>%
  gather(depth, value, -date) %>%
  # Strip the "d_" prefix so the depth becomes a number
  mutate(depth = as.numeric(gsub("d_", "", depth))) %>%
  # Name the join keys explicitly to avoid the "Joining, by = ..." message
  full_join(depths, by = c("date", "depth")) %>%
  arrange(date, depth) %>%
  group_by(date) %>%
  # Interpolate only when the date has at least two measured values
  mutate(value.interp = if (length(na.omit(value)) > 1) {
    approx(depth, value, xout = depth)$y
  } else {
    value
  })
In the code above, the if statement is included to prevent approx from throwing an error when a given date has fewer than two non-missing values.
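To see what the guard protects against, here is a minimal base-R sketch. The depth and temperature vectors are made up for illustration, loosely following the 03:00 and 06:00 rows of the example, and tryCatch is used only so the error message can be captured instead of stopping the script.

```r
depth <- c(0.9, 1, 2, 2.5)

# Two non-missing values: approx() drops the NA pairs and interpolates.
# For example, at 1 m: 7 + (1 - 0.9) / (2.5 - 0.9) * (8 - 7) = 7.0625
two_values <- c(7, NA, NA, 8)
ok <- approx(depth, two_values, xout = depth)$y

# One non-missing value: approx() signals an error, captured here as text
one_value <- c(6.92, NA, NA, NA)
msg <- tryCatch(
  approx(depth, one_value, xout = depth)$y,
  error = function(e) conditionMessage(e)
)
```

Here ok holds the interpolated profile, while msg holds the error text that the if/else branch avoids by returning value unchanged.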
df.interp
date depth value value.interp
1 2004-01-05 03:00:00 0.9 7.00 7.000000
2 2004-01-05 03:00:00 1.0 NA 7.062500
3 2004-01-05 03:00:00 2.0 NA 7.687500
4 2004-01-05 03:00:00 2.5 8.00 8.000000
5 2004-01-05 03:00:00 3.0 NA 8.400000
6 2004-01-05 03:00:00 4.0 NA 9.200000
7 2004-01-05 03:00:00 5.0 10.00 10.000000
8 2004-01-05 04:00:00 0.9 7.50 7.500000
9 2004-01-05 04:00:00 1.0 NA 7.573171
10 2004-01-05 04:00:00 2.0 NA 8.304878
11 2004-01-05 04:00:00 2.5 NA 8.670732
12 2004-01-05 04:00:00 3.0 NA 9.036585
13 2004-01-05 04:00:00 4.0 NA 9.768293
14 2004-01-05 04:00:00 5.0 10.50 10.500000
15 2004-01-05 05:00:00 0.9 7.00 7.000000
16 2004-01-05 05:00:00 1.0 NA 7.062500
17 2004-01-05 05:00:00 2.0 NA 7.687500
18 2004-01-05 05:00:00 2.5 8.00 8.000000
19 2004-01-05 05:00:00 3.0 NA 8.280000
20 2004-01-05 05:00:00 4.0 NA 8.840000
21 2004-01-05 05:00:00 5.0 9.40 9.400000
22 2004-01-05 06:00:00 0.9 6.92 6.920000
23 2004-01-05 06:00:00 1.0 NA NA
24 2004-01-05 06:00:00 2.0 NA NA
25 2004-01-05 06:00:00 2.5 NA NA
26 2004-01-05 06:00:00 3.0 NA NA
27 2004-01-05 06:00:00 4.0 NA NA
28 2004-01-05 06:00:00 5.0 NA NA
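If the wide layout from the question is what is needed, one possible alternative is to skip the reshape and interpolate each row of a numeric matrix directly. This is only a sketch under assumptions, not the method shown above: the names temps, depths_obs, depths_out and interp_row are hypothetical, the matrix hard-codes the example data, and whether it is actually faster on a very large data frame would need benchmarking.

```r
# Measured depths (from the column names d_0.9, d_2.5, d_5.0) and target depths
depths_obs <- c(0.9, 2.5, 5)
depths_out <- sort(unique(c(depths_obs, 1:5)))

# One row per date, one column per measured depth (example data, hard-coded)
temps <- rbind(
  c(7,    8,  10),
  c(7.5,  NA, 10.5),
  c(7,    8,  9.4),
  c(6.92, NA, NA)
)

# Interpolate a single profile; hypothetical helper for this sketch
interp_row <- function(y) {
  if (sum(!is.na(y)) < 2) {
    # Too few points to interpolate: keep measured values, NA elsewhere
    out <- rep(NA_real_, length(depths_out))
    out[match(depths_obs, depths_out)] <- y
    return(out)
  }
  approx(depths_obs, y, xout = depths_out)$y
}

# apply() returns one column per input row, so transpose back to dates x depths
wide <- t(apply(temps, 1, interp_row))
colnames(wide) <- paste0("d_", depths_out)
```

The result is a matrix with columns d_0.9, d_1, d_2, d_2.5, d_3, d_4 and d_5, which can be bound back to the date column, for example with cbind or bind_cols.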