Selecting row with date nearest to certain date in another data frame in R
I have two data frames:
- 'df1' with columns "ID" and "EVENT_DATE"
ID EVENT_DATE
<chr> <date>
1 ID001 2016-09-28
2 ID002 2011-03-15
3 ID003 2015-07-20
- 'df2' with columns "ID", "X" and "X_DATE"
ID X X_DATE
<chr> <dbl> <date>
1 ID001 34.5 2015-04-25
2 ID001 30 2015-08-25
3 ID001 50.5 2016-01-20
4 ID001 33 2016-09-25
5 ID001 22 2016-09-29
6 ID002 22 2010-02-20
7 ID002 45 2011-02-24
8 ID002 44 2012-02-13
9 ID003 22 2015-05-15
10 ID003 34 2015-05-30
11 ID003 34 2015-07-12
12 ID003 43 2015-07-24
For each ID, I want to add "NEAREST_X_DATE" and "NEAREST_X" to 'df1', taking the values from 'df2' as follows:
a) NEAREST_X_DATE = 'X_DATE which is nearest to EVENT_DATE'; NEAREST_X = 'X corresponding to NEAREST_X_DATE'
b) NEAREST_X_DATE = 'X_DATE which is nearest to EVENT_DATE but not later than EVENT_DATE'; NEAREST_X = 'X corresponding to NEAREST_X_DATE'
How should I go about this? Thanks for your help.
Here is one possibility:
library(dplyr)

# Make sure both date columns are of class Date
df1$EVENT_DATE <- as.Date(df1$EVENT_DATE)
df2$X_DATE <- as.Date(df2$X_DATE)

# a) nearest X_DATE, whether before or after EVENT_DATE
df1 %>%
  left_join(df2, by = "ID") %>%
  mutate(diff = difftime(EVENT_DATE, X_DATE, units = "days")) %>%
  group_by(ID) %>%
  slice(which.min(abs(diff))) %>%
  ungroup() %>%
  rename(NEAREST_X = X, NEAREST_X_DATE = X_DATE) %>%
  select(-diff)

# b) nearest X_DATE that is not later than EVENT_DATE
df1 %>%
  left_join(df2, by = "ID") %>%
  mutate(diff = difftime(EVENT_DATE, X_DATE, units = "days")) %>%
  group_by(ID) %>%
  filter(diff >= 0) %>%
  slice(which.min(diff)) %>%
  ungroup() %>%
  rename(NEAREST_X = X, NEAREST_X_DATE = X_DATE) %>%
  select(-diff)
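If you would rather not depend on dplyr, the same idea can be sketched in base R. This is only a rough sketch under a few assumptions: the sample data is typed in from the question, and `nearest_x` with its `not_later` argument is a hypothetical helper I made up for illustration, not a function from any package. It looks at each row of df1, keeps the df2 rows with the same ID (optionally only those dated on or before EVENT_DATE), and picks the one with the smallest gap in days.

```r
# Sample data typed in from the question
df1 <- data.frame(
  ID = c("ID001", "ID002", "ID003"),
  EVENT_DATE = as.Date(c("2016-09-28", "2011-03-15", "2015-07-20")),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  ID = rep(c("ID001", "ID002", "ID003"), times = c(5, 3, 4)),
  X = c(34.5, 30, 50.5, 33, 22, 22, 45, 44, 22, 34, 34, 43),
  X_DATE = as.Date(c(
    "2015-04-25", "2015-08-25", "2016-01-20", "2016-09-25", "2016-09-29",
    "2010-02-20", "2011-02-24", "2012-02-13",
    "2015-05-15", "2015-05-30", "2015-07-12", "2015-07-24"
  )),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for every row of df1, find the df2 row with the same ID
# whose X_DATE is closest to EVENT_DATE. With not_later = TRUE, only X_DATE
# values on or before EVENT_DATE are considered (case b).
nearest_x <- function(df1, df2, not_later = FALSE) {
  rows <- lapply(seq_len(nrow(df1)), function(i) {
    cand <- df2[df2$ID == df1$ID[i], , drop = FALSE]
    if (not_later) {
      cand <- cand[cand$X_DATE <= df1$EVENT_DATE[i], , drop = FALSE]
    }
    if (nrow(cand) == 0) {
      # No usable candidate: keep the df1 row and fill the new columns with NA
      return(data.frame(df1[i, , drop = FALSE],
                        NEAREST_X = NA_real_,
                        NEAREST_X_DATE = as.Date(NA)))
    }
    gap <- abs(as.numeric(df1$EVENT_DATE[i] - cand$X_DATE))  # gap in days
    best <- cand[which.min(gap), , drop = FALSE]
    data.frame(df1[i, , drop = FALSE],
               NEAREST_X = best$X,
               NEAREST_X_DATE = best$X_DATE)
  })
  out <- do.call(rbind, rows)
  rownames(out) <- NULL
  out
}

res_a <- nearest_x(df1, df2)                    # case a
res_b <- nearest_x(df1, df2, not_later = TRUE)  # case b
print(res_a)
print(res_b)
```

With the sample data, case a picks 2016-09-29, 2011-02-24 and 2015-07-24 (X = 22, 45, 43), and case b picks 2016-09-25, 2011-02-24 and 2015-07-12 (X = 33, 45, 34). One difference from the dplyr version: when an ID has no X_DATE on or before its EVENT_DATE, this sketch keeps the df1 row with NA values, whereas `filter(diff >= 0)` drops that ID from the result.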