Interpolating Mid-Year Averages

Question

I have annual observations of income for a number of regions, as shown below:

library(dplyr)
library(lubridate)
date <- c("2004-01-01", "2005-01-01", "2006-01-01", 
          "2004-01-01", "2005-01-01", "2006-01-01")
geo <- c(1, 1, 1, 2, 2, 2)
inc <- c(10, 12, 14, 32, 34, 50)
data <- tibble(date = ymd(date), geo, inc)

  date        geo   inc
  <date>     <dbl> <dbl>
1 2004-01-01     1    10
2 2005-01-01     1    12
3 2006-01-01     1    14
4 2004-01-01     2    32
5 2005-01-01     2    34
6 2006-01-01     2    50

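I need to insert mid-year values, computed as the average of the observations at the start and end of each year, so that there is a data point every 6 months. The result should look like this: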

2004-01-01     1    10
2004-06-01     1    11
2005-01-01     1    12
2005-06-01     1    13
2006-01-01     1    14
2004-01-01     2    32
2004-06-01     2    33
2005-01-01     2    34
2005-06-01     2    42
2006-01-01     2    50
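Written out, the rule is a simple mean of two consecutive January observations within the same geo (the notation below is ours, not the asker's):

$$
\text{inc}_{\text{mid}} = \frac{\text{inc}_{\text{Jan},\,t} + \text{inc}_{\text{Jan},\,t+1}}{2},
\qquad \text{e.g. geo 2, 2005 to 2006: } \frac{34 + 50}{2} = 42
$$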

Any ideas would be appreciated.

Answer 1

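Group by 'geo', add (+) 'inc' to the next value (lead) and divide by 2 (/2) to get the average, add 5 months to 'date', then filter out the NA elements in 'inc' and bind the rows to the original data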

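library(dplyr)      # reframe() requires dplyr >= 1.1.0
library(lubridate)
data %>% 
    group_by(geo) %>% 
    # one candidate mid-year row per observation: shift the date by 5 months
    # and average each income with the next one in the same geo
    reframe(date = date %m+% months(5),
            inc = (inc + lead(inc))/2) %>%
    # the last year of each geo has no successor, so its average is NA
    filter(!is.na(inc)) %>%
    bind_rows(data, .) %>% 
    arrange(geo, date)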

-output

# A tibble: 10 x 3
#   date         geo   inc
#   <date>     <dbl> <dbl>
# 1 2004-01-01     1    10
# 2 2004-06-01     1    11
# 3 2005-01-01     1    12
# 4 2005-06-01     1    13
# 5 2006-01-01     1    14
# 6 2004-01-01     2    32
# 7 2004-06-01     2    33
# 8 2005-01-01     2    34
# 9 2005-06-01     2    42
#10 2006-01-01     2    50
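Why the filter() step is needed: lead() shifts a vector up by one position and pads the end with NA, so the last year of each geo has no following observation to average with. A minimal sketch using geo 1's incomes from the question:

```r
library(dplyr)

inc <- c(10, 12, 14)          # geo 1's January incomes, 2004 to 2006
lead(inc)                     # next year's value; the last element becomes NA
#> [1] 12 14 NA
mid <- (inc + lead(inc)) / 2  # mid-year averages; NA for the final year
mid
#> [1] 11 13 NA
```

The trailing NA is exactly the row that filter(!is.na(inc)) drops before the mid-year rows are bound back onto the original data.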

Answer 2

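You can use complete to create a sequence of dates at 6-month intervals, shift the newly added dates back by one month (to 1 June, as in the expected output), and then use na.approx to fill the NA values with interpolated values.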

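library(dplyr)
library(lubridate)

data %>%
  group_by(geo) %>%
  # add the missing 6-month dates (e.g. 2004-07-01) within each geo; their inc is NA
  tidyr::complete(date = seq(min(date), max(date), by = '6 months')) %>%
  # move the inserted dates back one month to 1 June, and fill the NA
  # incomes by linear interpolation between the neighbouring observations
  mutate(date = if_else(is.na(inc), date %m-% months(1), date), 
         inc = zoo::na.approx(inc))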

#    geo date         inc
#   <dbl> <date>     <dbl>
# 1     1 2004-01-01    10
# 2     1 2004-06-01    11
# 3     1 2005-01-01    12
# 4     1 2005-06-01    13
# 5     1 2006-01-01    14
# 6     2 2004-01-01    32
# 7     2 2004-06-01    33
# 8     2 2005-01-01    34
# 9     2 2005-06-01    42
#10     2 2006-01-01    50
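To see what zoo::na.approx() contributes on its own: with no x argument it interpolates linearly by position in the vector, not by date, so each NA sitting between two observations gets their mean. A minimal sketch, assuming the zoo package is installed, on a hypothetical vector that mirrors geo 2's incomes after the complete() step:

```r
# Hypothetical vector: geo 2's incomes with NA at the two inserted dates
inc <- c(32, NA, 34, NA, 50)
filled <- zoo::na.approx(inc)  # linear interpolation by position (index 1..5)
filled
#> [1] 32 33 34 42 50
```

Because the interpolation uses positions, the one-month date shift in the same mutate() call (which makes the gaps 5 and 7 months long) does not change the interpolated values; each one is still the plain average of its neighbours.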
