If a date is between two dates in a second table, return its values

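I'm trying to determine whether a datetime falls between a begin datetime and an end datetime and, if it does, return the values for that interval. I have this working with data.table, but would like to get it working in dplyr.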

So if you have these datetimes:

2017-07-01 02:15:00 
2017-07-01 02:30:00

and look them up in this table:

begin,                end,                  value1,  value2
2017-07-01 00:01:00,  2017-07-01 01:00:00,  1,       2
2017-07-01 01:01:00,  2017-07-01 02:00:00,  3,       4
2017-07-01 02:01:00,  2017-07-01 03:00:00,  5,       6

it should return:

date                value1   value2
2017-07-01 02:15:00    5        6     
2017-07-01 02:30:00    5        6  

There are many lookup rows, and a few hundred dates need to be looked up.

I have this working with data.table, but would like to use dplyr to reduce the number of packages I depend on. Here is what I have so far:

library(tidyverse)
library(lubridate)
library(data.table)

dates <- read_csv("date1.csv") %>% 
  mutate(date = as_datetime(date))

lookup <- read_csv("lookup.csv") %>% 
  mutate(begin = as_datetime(begin),
         end = as_datetime(end))

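# foverlaps() needs an interval on both sides: key the lookup table on
# [begin, end] and give each date a zero-length interval [date, date]
dates <- data.table(dates)
lookup <- data.table(lookup)
setkey(lookup, begin, end)
dates[, c("begin", "end") := date]  
test.df <- foverlaps(dates, lookup)[, c("date", "value1", "value2"), 
                                        with = FALSE] 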
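For the dplyr-only version of this foverlaps() call: dplyr 1.1.0 and later support overlap joins through join_by(), so the same lookup can be written without data.table. This is a sketch assuming dplyr >= 1.1.0 is installed; the small tibbles are just the sample rows from the question. Like foverlaps(), it returns every matching interval if a date falls in more than one.

```r
library(dplyr)      # join_by() with between() needs dplyr >= 1.1.0
library(lubridate)

# Sample rows from the question
dates <- tibble(date = ymd_hms(c("2017-07-01 02:15:00", "2017-07-01 02:30:00")))

lookup <- tibble(
  begin  = ymd_hms(c("2017-07-01 00:01:00", "2017-07-01 01:01:00", "2017-07-01 02:01:00")),
  end    = ymd_hms(c("2017-07-01 01:00:00", "2017-07-01 02:00:00", "2017-07-01 03:00:00")),
  value1 = c(1L, 3L, 5L),
  value2 = c(2L, 4L, 6L)
)

# Overlap join: match each date to the lookup row(s) with begin <= date <= end
# (between() inside join_by() uses closed bounds "[]" by default)
result <- dates %>%
  left_join(lookup, by = join_by(between(date, begin, end))) %>%
  select(date, value1, value2)

result
```

Because it is a left_join(), dates that fall outside every interval are kept with NA values.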

I was thinking of using something like:

test <- dates %>% rowwise() %>%
  # keep the value from the lookup row whose [begin, end] interval contains date
  mutate(value1 = lookup$value1[lookup$begin <= date & date <= lookup$end][1]) %>%
  ungroup()
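The rowwise() idea can also be vectorised by pairing every date with every lookup row and filtering on the interval condition. This is a sketch that assumes dplyr >= 1.1.0 for cross_join(); it materialises nrow(dates) × nrow(lookup) rows, which is fine for a few hundred dates but grows quickly with larger tables.

```r
library(dplyr)      # cross_join() needs dplyr >= 1.1.0
library(lubridate)

# Hypothetical sample: one date before every interval, one inside the second
dates <- tibble(date = ymd_hms(c("2017-07-01 00:00:00", "2017-07-01 02:15:00")))
lookup <- tibble(
  begin  = ymd_hms(c("2017-07-01 00:01:00", "2017-07-01 02:01:00")),
  end    = ymd_hms(c("2017-07-01 01:00:00", "2017-07-01 03:00:00")),
  value1 = c(1L, 5L)
)

# Every date/interval pair, keeping only pairs with begin <= date <= end
matches <- cross_join(dates, lookup) %>%
  filter(begin <= date, date <= end) %>%
  select(date, value1)

# Join back so dates without a matching interval keep an NA value
result <- left_join(dates, matches, by = "date")
```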

Here are the dates to look up:

    dates <- structure(list(date = structure(c(1498867200, 1498868100, 1498869000, 
1498869900, 1498870800, 1498871700, 1498872600, 1498873500, 1498874400, 
1498875300, 1498876200, 1498877100, 1498878000, 1498878900, 1498879800, 
1498880700, 1498881600, 1498882500), tzone = "UTC", class = c("POSIXct", 
"POSIXt"))), .Names = "date", class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -18L))

The lookup table:

    lookup <- structure(list(begin = structure(c(1498867260, 1498870860, 1498874460, 
1498878060, 1498881660, 1498885260, 1498888860, 1498892460, 1498896060
), class = c("POSIXct", "POSIXt"), tzone = "UTC"), end = structure(c(1498870800, 
1498874400, 1498878000, 1498881600, 1498885200, 1498888800, 1498892400, 
1498896000, 1498899600), class = c("POSIXct", "POSIXt"), tzone = "UTC"), 
    value1 = c(1L, 3L, 5L, 7L, 9L, 11L, 13L, 15L, 17L), value2 = c(2L, 
    4L, 6L, 8L, 10L, 12L, 14L, 16L, 18L)), .Names = c("begin", 
"end", "value1", "value2"), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -9L))

You could try the following:

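library(tidyverse)
library(lubridate)

# Shift each date back one minute before taking its day and hour, so that a
# time exactly on the hour (e.g. 01:00:00) gets the key of the bucket it ends
dates <- dates %>% 
  mutate(match_date = format(date - minutes(1), "%Y-%m-%d"), 
         match_hour = hour(date - minutes(1)))

# Each lookup interval starts at hh:01, so its key is the day and hour of begin
lookup <- lookup %>% 
  mutate(match_date = format(begin, "%Y-%m-%d"), 
         match_hour = hour(begin))

# Join on the (day, hour) key, then keep only rows that fall in [begin, end]
left_join(dates, lookup, by = c("match_date", "match_hour")) %>% 
  filter(date >= begin & date <= end) %>% 
  select(- match_date, - match_hour) %>% 
  head()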

# A tibble: 6 x 5
#                  date               begin                 end value1 value2
#                <dttm>              <dttm>              <dttm>  <int>  <int>
# 1 2017-07-01 00:15:00 2017-07-01 00:01:00 2017-07-01 01:00:00      1      2
# 2 2017-07-01 00:30:00 2017-07-01 00:01:00 2017-07-01 01:00:00      1      2
# 3 2017-07-01 00:45:00 2017-07-01 00:01:00 2017-07-01 01:00:00      1      2
# 4 2017-07-01 01:00:00 2017-07-01 00:01:00 2017-07-01 01:00:00      1      2
# 5 2017-07-01 01:15:00 2017-07-01 01:01:00 2017-07-01 02:00:00      3      4
# 6 2017-07-01 01:30:00 2017-07-01 01:01:00 2017-07-01 02:00:00      3      4

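First, I extract the date and hour to match on. I subtract one minute from the dates in the dates table because the end times in the lookup table fall exactly on the hour (e.g. 01:00:00). Since I join on the hour of the begin time, subtracting the minute puts such a time in the correct hour (0 in this example). This relies on every lookup interval being a one-hour bucket from hh:01:00 to (hh+1):00:00, as in your data.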
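To make the one-minute shift concrete, here is a small sketch with three sample times, assuming (as in the question's data) that each lookup interval runs from hh:01:00 to (hh+1):00:00:

```r
library(lubridate)

# 01:00:00 belongs to [00:01:00, 01:00:00], whose begin hour is 0
x <- ymd_hms(c("2017-07-01 00:15:00", "2017-07-01 01:00:00", "2017-07-01 01:15:00"))

hour(x)                # naive key: 0 1 1 (01:00:00 lands in the wrong bucket)
hour(x - minutes(1))   # shifted key: 0 0 1 (matches hour(begin) of the right row)
```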

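Then I left_join dates and lookup on these keys and filter according to your criteria (begin <= date <= end). Because filter() drops rows where begin and end are NA, dates that fall outside every interval are removed from the result.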
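If the lookup intervals are sorted and do not overlap but are not neat one-hour buckets, the hour key no longer works. One alternative, a sketch that swaps in base R's findInterval() rather than the join used above, finds for each date the last interval starting at or before it and then checks that interval's end time. It assumes non-overlapping intervals; the sample tibbles are hypothetical.

```r
library(dplyr)
library(lubridate)

dates <- tibble(date = ymd_hms(c("2017-07-01 00:00:00", "2017-07-01 01:00:00",
                                 "2017-07-01 02:15:00")))
lookup <- tibble(
  begin  = ymd_hms(c("2017-07-01 00:01:00", "2017-07-01 01:01:00", "2017-07-01 02:01:00")),
  end    = ymd_hms(c("2017-07-01 01:00:00", "2017-07-01 02:00:00", "2017-07-01 03:00:00")),
  value1 = c(1L, 3L, 5L)
) %>%
  arrange(begin)   # findInterval() requires the begin times in increasing order

result <- dates %>%
  mutate(
    # index of the last interval with begin <= date; 0 (turned into NA) if none
    idx = na_if(findInterval(as.numeric(date), as.numeric(lookup$begin)), 0L),
    # keep the value only if the date is also <= that interval's end
    value1 = if_else(!is.na(idx) & date <= lookup$end[idx],
                     lookup$value1[idx], NA_integer_)
  ) %>%
  select(-idx)
```

Each date needs only a binary search over the begin times, so this avoids building all date/interval pairs.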