Counting the number of instances between dates
Suppose I have the following dataset:
library(data.table)
library(lubridate)
store_DT <- data.table(date = seq.Date(from = as.Date("2019-10-01"),
                                       to = as.Date("2019-10-05"),
                                       by = "day"),
                       store = c(rep("A",5),rep("B",5)))
          date store
 1: 2019-10-01     A
 2: 2019-10-02     A
 3: 2019-10-03     A
 4: 2019-10-04     A
 5: 2019-10-05     A
 6: 2019-10-01     B
 7: 2019-10-02     B
 8: 2019-10-03     B
 9: 2019-10-04     B
10: 2019-10-05     B
This is just a data.table of store x date observations.
Suppose I also have another data.table with employee start and end dates (inclusive):
roster_DT <- data.table(
  store = c("A", "A", "A", "A", "B", "B","B", "B"),
  employee_ID = 1:8,
  start_date = c("2019-09-30", "2019-10-02", "2019-10-03", "2019-10-04",
                 "2019-09-30", "2019-10-02", "2019-10-03", "2019-10-04"),
  end_date = c("2019-10-04", "2019-10-04", "2019-10-05", "2019-10-06",
               "2019-10-04", "2019-10-04", "2019-10-05", "2019-10-06")
)
   store employee_ID start_date   end_date
1:     A           1 2019-09-30 2019-10-04
2:     A           2 2019-10-02 2019-10-04
3:     A           3 2019-10-03 2019-10-05
4:     A           4 2019-10-04 2019-10-06
5:     B           5 2019-09-30 2019-10-04
6:     B           6 2019-10-02 2019-10-04
7:     B           7 2019-10-03 2019-10-05
8:     B           8 2019-10-04 2019-10-06
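What I want to do is simply count the number of employees at each store on any given date, and then bring that count back into store_DT. The complication here is that roster_DT specifies date ranges. One solution would be to simply expand roster_DT using the suggestion here. However, the actual dataset is fairly large, so expanding it is not efficient or feasible. So I am wondering whether there is another way.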
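For reference, the expansion approach mentioned above might look roughly like the sketch below. The linked suggestion is not reproduced here, so this is only an assumption about what it does: one row per employee per day, then a count and a join. The names expanded, counts and merged are made up for illustration, and the dates are built with as.Date() so the block runs on its own.

```r
library(data.table)

# Rebuild the example data so this block is self-contained
dates <- seq(as.Date("2019-10-01"), as.Date("2019-10-05"), by = "day")
store_DT <- data.table(date = rep(dates, times = 2),
                       store = rep(c("A", "B"), each = 5))
roster_DT <- data.table(
  store = rep(c("A", "B"), each = 4),
  employee_ID = 1:8,
  start_date = as.Date(rep(c("2019-09-30", "2019-10-02", "2019-10-03", "2019-10-04"), times = 2)),
  end_date = as.Date(rep(c("2019-10-04", "2019-10-04", "2019-10-05", "2019-10-06"), times = 2))
)

# Expand every roster row to one row per employee per day (hypothetical helper table)
expanded <- roster_DT[, .(date = seq(start_date, end_date, by = "day")),
                      by = .(store, employee_ID)]

# Count employees per store and date
counts <- expanded[, .(employees = .N), by = .(store, date)]

# Bring the counts back to the store x date rows; dates with no staff become 0
merged <- counts[store_DT, on = .(store, date)]
merged[is.na(employees), employees := 0L]
```

The expanded table grows with the total number of employee-days, which is why this gets expensive on a large roster with long date ranges.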
The final dataset I am looking for is:
          date store employees
 1: 2019-10-01     A         1
 2: 2019-10-02     A         2
 3: 2019-10-03     A         3
 4: 2019-10-04     A         4
 5: 2019-10-05     A         2
 6: 2019-10-01     B         1
 7: 2019-10-02     B         2
 8: 2019-10-03     B         3
 9: 2019-10-04     B         4
10: 2019-10-05     B         2
My dataset has many stores and many employees, so I am hoping for a data.table solution.
Thank you very much!
Please find below a solution (reprex) using the lubridate library and the foverlaps() function from the data.table library.
Reprex
- Code
library(data.table)
library(lubridate)
# Convert the 'start_date' and 'end_date' columns from character to class 'Date'
sel_cols <- c("start_date", "end_date")
roster_DT[, (sel_cols) := lapply(.SD, ymd), .SDcols = sel_cols]
# Create a dummy end-point column in 'store_DT' so that each date is a zero-length interval
store_DT[, dummy := date]
# Key 'roster_DT' by store first, then by the interval columns, so overlaps are only matched within the same store
setkey(roster_DT, store, start_date, end_date)
# Overlap join per store, then count the matched employees for each date and store
# (unmatched dates come back with NA in 'employee_ID' and are counted as 0)
Results <- foverlaps(store_DT, roster_DT, by.x = c("store", "date", "dummy"), type = "within")[, .(employees = sum(!is.na(employee_ID))), by = .(date, store)]
# Reorder the data.table 'Results' by 'store', then 'date'
setorder(Results, store, date)
- Output
Results
#>           date store employees
#>  1: 2019-10-01     A         1
#>  2: 2019-10-02     A         2
#>  3: 2019-10-03     A         3
#>  4: 2019-10-04     A         4
#>  5: 2019-10-05     A         2
#>  6: 2019-10-01     B         1
#>  7: 2019-10-02     B         2
#>  8: 2019-10-03     B         3
#>  9: 2019-10-04     B         4
#> 10: 2019-10-05     B         2
Created on 2021-11-17 by the reprex package (v2.0.1)
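As a possible alternative, a non-equi join in data.table can do the same count without a dummy column and without expanding the roster. This is only a sketch and not part of the reprex above: it assumes the date columns are already of class Date (built here with as.Date()), and the name counts is made up for illustration.

```r
library(data.table)

# Rebuild the example data so this block is self-contained
dates <- seq(as.Date("2019-10-01"), as.Date("2019-10-05"), by = "day")
store_DT <- data.table(date = rep(dates, times = 2),
                       store = rep(c("A", "B"), each = 5))
roster_DT <- data.table(
  store = rep(c("A", "B"), each = 4),
  employee_ID = 1:8,
  start_date = as.Date(rep(c("2019-09-30", "2019-10-02", "2019-10-03", "2019-10-04"), times = 2)),
  end_date = as.Date(rep(c("2019-10-04", "2019-10-04", "2019-10-05", "2019-10-06"), times = 2))
)

# For each row of store_DT, find roster rows of the same store whose
# [start_date, end_date] range contains the date, and count them.
# by = .EACHI evaluates j once per row of store_DT, in store_DT's row order.
counts <- roster_DT[store_DT,
                    on = .(store, start_date <= date, end_date >= date),
                    .(employees = sum(!is.na(employee_ID))),
                    by = .EACHI]

# The result is aligned with store_DT, so the counts can be added by reference
store_DT[, employees := counts$employees]
```

Counting the non-missing employee_ID values rather than rows means a date with nobody on the roster ends up with 0 employees.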