Issue in calculating mean between two dates using a column in R

Question

I need to calculate the mean of a column based on two dates. The data table looks like this:

pol      id   acres    date           mean       st_date        end_date
12345    5    123.8    05_26_2019     0.2225     2019-07-24     2019-09-07
12345    5    123.8    06_11_2019     0.6523     2019-07-24     2019-09-07     
12345    5    123.8    06_27_2019     0.8563     2019-07-24     2019-09-07
12345    5    123.8    07_13_2019     0.1542     2019-07-24     2019-09-07
12345    5    123.8    07_29_2019     0.4253     2019-07-24     2019-09-07
12345    5    123.8    09_15_2019     0.1521     2019-07-24     2019-09-07
67890    4    60.0     05_05_2019     0.3652     2019-07-15     2019-08-31
67890    4    60.0     06_02_2019     0.4585     2019-07-15     2019-08-31
67890    4    60.0     07_10_2019     0.5856     2019-07-15     2019-08-31
67890    4    60.0     07_18_2019     0.6585     2019-07-15     2019-08-31
67890    4    60.0     09_02_2019     0.8585     2019-07-15     2019-08-31

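I need the average of the mean column over the rows whose date falls between st_date and end_date. The desired output is shown below. The avg column holds the average of the mean values for the dates in the date column that lie between st_date and end_date: (0.4253 + 0.1521)/2 = 0.2887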

Output:

    pol      id   acres    date           mean       st_date        end_date       avg.
    12345    5    123.8    05_26_2019     0.2225     2019-07-24     2019-09-16     0.2887
    12345    5    123.8    06_11_2019     0.6523     2019-07-24     2019-09-16     0.2887
    12345    5    123.8    06_27_2019     0.8563     2019-07-24     2019-09-16     0.2887
    12345    5    123.8    07_13_2019     0.1542     2019-07-24     2019-09-16     0.2887
    12345    5    123.8    07_29_2019     0.4253     2019-07-24     2019-09-16     0.2887
    12345    5    123.8    09_15_2019     0.1521     2019-07-24     2019-09-16     0.2887
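
The avg value of 0.2887 can be reproduced by hand for pol 12345. This is a minimal base R sketch, assuming the date column is text in month_day_year form and using the end_date of 2019-09-16 from the output table (the input table shows 2019-09-07). The variable names value and in_window are only for illustration:

```r
# Rows for pol 12345, copied from the table above
date <- as.Date(
  c("05_26_2019", "06_11_2019", "06_27_2019",
    "07_13_2019", "07_29_2019", "09_15_2019"),
  format = "%m_%d_%Y"
)
value <- c(0.2225, 0.6523, 0.8563, 0.1542, 0.4253, 0.1521)

st_date  <- as.Date("2019-07-24")
end_date <- as.Date("2019-09-16")  # assumption: taken from the output table

# TRUE for rows whose date lies strictly between st_date and end_date
in_window <- date > st_date & date < end_date

# Average of the mean values inside the window: (0.4253 + 0.1521) / 2
avg <- mean(value[in_window])
print(avg)
```

Only 07_29_2019 and 09_15_2019 fall inside the window, which is where the two terms in the calculation come from.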

Can someone help me with this? I would prefer a data.table solution.

Thanks,

Answer 1

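I am not sure whether you have multiple groups and need to calculate the average for each group. If that is the case, see whether the following code works for you: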

> library(dplyr)
> library(tidyr)
> df %>%
+   left_join(
+     df %>%
+       group_by(id) %>%
+       filter(date > st_date & date < end_date) %>%
+       mutate(avg = mean(mean)) %>%
+       select(id, date, avg),
+     by = c("id", "date"), keep = FALSE
+   ) %>%
+   # fill rows outside the date window with the mean of the matched avg values
+   mutate(avg = replace_na(avg, mean(avg, na.rm = TRUE)))
# A tibble: 6 x 8
    pol    id acres date        mean st_date    end_date     avg
  <dbl> <dbl> <dbl> <date>     <dbl> <date>     <date>     <dbl>
1 12345     5  124. 2019-05-26 0.222 2019-07-24 2019-09-16 0.289
2 12345     5  124. 2019-06-11 0.652 2019-07-24 2019-09-16 0.289
3 12345     5  124. 2019-06-27 0.856 2019-07-24 2019-09-16 0.289
4 12345     5  124. 2019-07-13 0.154 2019-07-24 2019-09-16 0.289
5 12345     5  124. 2019-07-29 0.425 2019-07-24 2019-09-16 0.289
6 12345     5  124. 2019-09-15 0.152 2019-07-24 2019-09-16 0.289
>
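
Since the question asks for data.table, the same idea can be sketched there without a join. This is only a sketch under assumptions: the grouping columns pol, id and acres are taken from the asker's own code, the date column is assumed to be stored as text in month_day_year form as in the question's table, and end_date for pol 12345 uses 2019-09-16 from the desired output rather than the 2019-09-07 shown in the input table.

```r
library(data.table)

# Sample data copied from the question; end_date for pol 12345 is an assumption
dt <- data.table(
  pol      = rep(c(12345, 67890), times = c(6, 5)),
  id       = rep(c(5, 4), times = c(6, 5)),
  acres    = rep(c(123.8, 60.0), times = c(6, 5)),
  date     = c("05_26_2019", "06_11_2019", "06_27_2019", "07_13_2019",
               "07_29_2019", "09_15_2019", "05_05_2019", "06_02_2019",
               "07_10_2019", "07_18_2019", "09_02_2019"),
  mean     = c(0.2225, 0.6523, 0.8563, 0.1542, 0.4253, 0.1521,
               0.3652, 0.4585, 0.5856, 0.6585, 0.8585),
  st_date  = as.Date(rep(c("2019-07-24", "2019-07-15"), times = c(6, 5))),
  end_date = as.Date(rep(c("2019-09-16", "2019-08-31"), times = c(6, 5)))
)

# Parse the text dates so they can be compared with st_date and end_date
dt[, date := as.Date(date, format = "%m_%d_%Y")]

# Per group, average only the rows strictly inside the window,
# then write that single value to every row of the group
dt[, avg := mean(mean[date > st_date & date < end_date]),
   by = .(pol, id, acres)]

print(dt)
```

A group with no dates inside its window would get NaN here, since the mean of an empty vector is NaN. Whether that is acceptable depends on the data.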

My code:

final_pl_date_sel %>%
  left_join(
    df %>%
      group_by(pol, id, acres) %>%
      filter(date > st_date & date < end_date) %>%
      mutate(avg = mean(mean)) %>%
      select(pol, id, acres, date, avg),
    by = c("pol", "id", "acres", "date"), keep = FALSE
  ) %>%
  mutate(avg = replace_na(avg, mean(avg, na.rm = TRUE)))

Using your code:

> df %>%
+   left_join(
+     df %>%
+       group_by(pol, id, acres) %>%
+       filter(date > st_date & date < end_date) %>%
+       mutate(avg = mean(mean)) %>%
+       select(pol, id, acres, date, avg),
+     by = c("pol", "id", "acres", "date"), keep = FALSE
+   ) %>%
+   mutate(avg = replace_na(avg, mean(avg, na.rm = TRUE)))
# A tibble: 6 x 8
    pol    id acres date        mean st_date    end_date     avg
  <dbl> <dbl> <dbl> <date>     <dbl> <date>     <date>     <dbl>
1 12345     5  124. 2019-05-26 0.222 2019-07-24 2019-09-16 0.289
2 12345     5  124. 2019-06-11 0.652 2019-07-24 2019-09-16 0.289
3 12345     5  124. 2019-06-27 0.856 2019-07-24 2019-09-16 0.289
4 12345     5  124. 2019-07-13 0.154 2019-07-24 2019-09-16 0.289
5 12345     5  124. 2019-07-29 0.425 2019-07-24 2019-09-16 0.289
6 12345     5  124. 2019-09-15 0.152 2019-07-24 2019-09-16 0.289

I used the "df" table on the left because I do not have the "final_pl_date_sel" table.

My df:

> df
# A tibble: 6 x 7
    pol    id acres date        mean st_date    end_date  
  <dbl> <dbl> <dbl> <date>     <dbl> <date>     <date>    
1 12345     5  124. 2019-05-26 0.222 2019-07-24 2019-09-16
2 12345     5  124. 2019-06-11 0.652 2019-07-24 2019-09-16
3 12345     5  124. 2019-06-27 0.856 2019-07-24 2019-09-16
4 12345     5  124. 2019-07-13 0.154 2019-07-24 2019-09-16
5 12345     5  124. 2019-07-29 0.425 2019-07-24 2019-09-16
6 12345     5  124. 2019-09-15 0.152 2019-07-24 2019-09-16
>
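
One caveat for data with more than one group, such as the two pol values in the question: as far as I can tell, the replace_na() step in the code above runs on ungrouped data, so rows outside the window are filled with the overall mean of the matched avg values rather than their own group's value. A minimal dplyr sketch that avoids the join and keeps everything per group, assuming the grouping columns pol, id and acres from the asker's code and the end_date of 2019-09-16 for pol 12345 as in the df above:

```r
library(dplyr)

# Sample data from the question, with dates already parsed
df <- tibble(
  pol      = rep(c(12345, 67890), times = c(6, 5)),
  id       = rep(c(5, 4), times = c(6, 5)),
  acres    = rep(c(123.8, 60.0), times = c(6, 5)),
  date     = as.Date(
    c("05_26_2019", "06_11_2019", "06_27_2019", "07_13_2019",
      "07_29_2019", "09_15_2019", "05_05_2019", "06_02_2019",
      "07_10_2019", "07_18_2019", "09_02_2019"),
    format = "%m_%d_%Y"
  ),
  mean     = c(0.2225, 0.6523, 0.8563, 0.1542, 0.4253, 0.1521,
               0.3652, 0.4585, 0.5856, 0.6585, 0.8585),
  st_date  = as.Date(rep(c("2019-07-24", "2019-07-15"), times = c(6, 5))),
  end_date = as.Date(rep(c("2019-09-16", "2019-08-31"), times = c(6, 5)))
)

# Within each group, average the mean values whose date is strictly
# inside the window; mutate() recycles the result to every row of the group
result <- df %>%
  group_by(pol, id, acres) %>%
  mutate(avg = mean(mean[date > st_date & date < end_date])) %>%
  ungroup()

print(result)
```

With this sample, pol 67890 has a single date (07_18_2019) inside its window, so its avg is just that row's mean value.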
