Issue in calculating mean between two dates using a column in R
I need to calculate the mean of a column based on two dates. The data table is shown below:
pol id acres date mean st_date end_date
12345 5 123.8 05_26_2019 0.2225 2019-07-24 2019-09-07
12345 5 123.8 06_11_2019 0.6523 2019-07-24 2019-09-07
12345 5 123.8 06_27_2019 0.8563 2019-07-24 2019-09-07
12345 5 123.8 07_13_2019 0.1542 2019-07-24 2019-09-07
12345 5 123.8 07_29_2019 0.4253 2019-07-24 2019-09-07
12345 5 123.8 09_15_2019 0.1521 2019-07-24 2019-09-07
67890 4 60.0 05_05_2019 0.3652 2019-07-15 2019-08-31
67890 4 60.0 06_02_2019 0.4585 2019-07-15 2019-08-31
67890 4 60.0 07_10_2019 0.5856 2019-07-15 2019-08-31
67890 4 60.0 07_18_2019 0.6585 2019-07-15 2019-08-31
67890 4 60.0 09_02_2019 0.8585 2019-07-15 2019-08-31
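I need the mean of the values in the mean column for the rows whose date falls between st_date and end_date. The desired output is shown below: the avg. column holds the mean of the mean values whose date lies between st_date and end_date, for example (0.4253 + 0.1521)/2 = 0.2887.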
Output:
pol id acres date mean st_date end_date avg.
12345 5 123.8 05_26_2019 0.2225 2019-07-24 2019-09-16 0.2887
12345 5 123.8 06_11_2019 0.6523 2019-07-24 2019-09-16 0.2887
12345 5 123.8 06_27_2019 0.8563 2019-07-24 2019-09-16 0.2887
12345 5 123.8 07_13_2019 0.1542 2019-07-24 2019-09-16 0.2887
12345 5 123.8 07_29_2019 0.4253 2019-07-24 2019-09-16 0.2887
12345 5 123.8 09_15_2019 0.1521 2019-07-24 2019-09-16 0.2887
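Can someone help me with this? I would prefer a data.table solution.
Thanks,
I'm not sure whether you have multiple groups and need the mean for each group. If that is the case, see whether the following code works for you: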
> library(dplyr)
> library(tidyr)
> df %>%
+ left_join(df %>% group_by(id) %>% filter(date> st_date & date < end_date) %>% mutate(avg = mean(mean)) %>% select(id, date, avg), by = c('id' = 'id', 'date' = 'date'), keep = F) %>% mutate(avg = replace_na(avg, mean(avg, na.rm = T)))
# A tibble: 6 x 8
pol id acres date mean st_date end_date avg
<dbl> <dbl> <dbl> <date> <dbl> <date> <date> <dbl>
1 12345 5 124. 2019-05-26 0.222 2019-07-24 2019-09-16 0.289
2 12345 5 124. 2019-06-11 0.652 2019-07-24 2019-09-16 0.289
3 12345 5 124. 2019-06-27 0.856 2019-07-24 2019-09-16 0.289
4 12345 5 124. 2019-07-13 0.154 2019-07-24 2019-09-16 0.289
5 12345 5 124. 2019-07-29 0.425 2019-07-24 2019-09-16 0.289
6 12345 5 124. 2019-09-15 0.152 2019-07-24 2019-09-16 0.289
>
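With more than one group, the final replace_na() step above appears to run on ungrouped data, so rows outside the window would be filled with a mean taken across all groups. A minimal sketch that avoids the self-join and keeps everything per group is given below. The data frame is rebuilt from the question's table, and the end_date of 2019-09-16 for pol 12345 is an assumption taken from the expected output (the input table shows 2019-09-07).

```r
library(dplyr)

# Sample data rebuilt from the question. end_date for pol 12345 is assumed
# to be 2019-09-16, as in the expected output.
df <- data.frame(
  pol   = c(rep(12345, 6), rep(67890, 5)),
  id    = c(rep(5, 6), rep(4, 5)),
  acres = c(rep(123.8, 6), rep(60.0, 5)),
  date  = as.Date(c("2019-05-26", "2019-06-11", "2019-06-27", "2019-07-13",
                    "2019-07-29", "2019-09-15", "2019-05-05", "2019-06-02",
                    "2019-07-10", "2019-07-18", "2019-09-02")),
  mean  = c(0.2225, 0.6523, 0.8563, 0.1542, 0.4253, 0.1521,
            0.3652, 0.4585, 0.5856, 0.6585, 0.8585),
  st_date  = as.Date(c(rep("2019-07-24", 6), rep("2019-07-15", 5))),
  end_date = as.Date(c(rep("2019-09-16", 6), rep("2019-08-31", 5)))
)

result <- df %>%
  group_by(pol, id, acres) %>%
  # Average only the rows inside the window, then recycle that single
  # value to every row of the group.
  mutate(avg = mean(mean[date > st_date & date < end_date])) %>%
  ungroup()

print(result)
```

A group with no date inside its window gets NaN here, which may or may not be what you want.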
My code:
final_pl_date_sel %>%
left_join(df %>% group_by(pol,id,acres) %>% filter(date> st_date & date < end_date) %>% mutate(avg = mean(mean)) %>% select(pol, id, acres, date, avg), by = c('pol' = 'pol','id' = 'id','acres' = 'acres', 'date' = 'date'), keep = F) %>% mutate(avg = replace_na(avg, mean(avg, na.rm = T)))
Using your code:
> df %>%
+ left_join(df %>% group_by(pol, id, acres) %>% filter(date> st_date & date < end_date) %>%
+ mutate(avg = mean(mean)) %>% select(pol, id, acres, date, avg), by = c('pol' = 'pol','id' = 'id','acres' = 'acres', 'date' = 'date'), keep = F) %>%
+ mutate(avg = replace_na(avg, mean(avg, na.rm = T)))
# A tibble: 6 x 8
pol id acres date mean st_date end_date avg
<dbl> <dbl> <dbl> <date> <dbl> <date> <date> <dbl>
1 12345 5 124. 2019-05-26 0.222 2019-07-24 2019-09-16 0.289
2 12345 5 124. 2019-06-11 0.652 2019-07-24 2019-09-16 0.289
3 12345 5 124. 2019-06-27 0.856 2019-07-24 2019-09-16 0.289
4 12345 5 124. 2019-07-13 0.154 2019-07-24 2019-09-16 0.289
5 12345 5 124. 2019-07-29 0.425 2019-07-24 2019-09-16 0.289
6 12345 5 124. 2019-09-15 0.152 2019-07-24 2019-09-16 0.289
I used the "df" table on the left side because I don't have the "final_pl_date_sel" table.
My df:
> df
# A tibble: 6 x 7
pol id acres date mean st_date end_date
<dbl> <dbl> <dbl> <date> <dbl> <date> <date>
1 12345 5 124. 2019-05-26 0.222 2019-07-24 2019-09-16
2 12345 5 124. 2019-06-11 0.652 2019-07-24 2019-09-16
3 12345 5 124. 2019-06-27 0.856 2019-07-24 2019-09-16
4 12345 5 124. 2019-07-13 0.154 2019-07-24 2019-09-16
5 12345 5 124. 2019-07-29 0.425 2019-07-24 2019-09-16
6 12345 5 124. 2019-09-15 0.152 2019-07-24 2019-09-16
>
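Since the question asks for a data.table solution, here is a rough sketch of one possible approach. It assumes the date column arrives as text in the month_day_year form shown in the question, and that end_date for pol 12345 is 2019-09-16 as in the expected output; the object name dt is made up for illustration.

```r
library(data.table)

# Sample data rebuilt from the question. end_date for pol 12345 is assumed
# to be 2019-09-16, as in the expected output.
dt <- data.table(
  pol   = c(rep(12345, 6), rep(67890, 5)),
  id    = c(rep(5, 6), rep(4, 5)),
  acres = c(rep(123.8, 6), rep(60.0, 5)),
  date  = c("05_26_2019", "06_11_2019", "06_27_2019", "07_13_2019",
            "07_29_2019", "09_15_2019", "05_05_2019", "06_02_2019",
            "07_10_2019", "07_18_2019", "09_02_2019"),
  mean  = c(0.2225, 0.6523, 0.8563, 0.1542, 0.4253, 0.1521,
            0.3652, 0.4585, 0.5856, 0.6585, 0.8585),
  st_date  = as.Date(c(rep("2019-07-24", 6), rep("2019-07-15", 5))),
  end_date = as.Date(c(rep("2019-09-16", 6), rep("2019-08-31", 5)))
)

# Parse the text dates so they can be compared with st_date and end_date.
dt[, date := as.Date(date, format = "%m_%d_%Y")]

# Per group, average the rows strictly inside the window and assign that
# value to every row of the group by reference.
dt[, avg := mean(mean[date > st_date & date < end_date]),
   by = .(pol, id, acres)]

print(dt)
```

The comparison is strict on both ends, matching the filter used in the dplyr answer; switch to >= and <= if the boundary dates should count. Groups with no date inside the window end up with NaN.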