Return percent remaining within a group, using the 1st date of group's count
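Not the best title... but I'm trying to track outstanding tasks using weekly data exports. I want to use the total task count as of the earliest date in each year (e.g., January 1) as the denominator when returning the percent (proportion) remaining for each week of that year. See the example: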
library(data.table)

dt <- data.table(id.date=as.Date(c(rep('2019-01-01',80),rep('2019-01-08',60),rep('2019-01-15',40),
                                   rep('2019-01-22',40),rep('2020-01-01',70),rep('2020-01-08',50),
                                   rep('2020-01-15',40),rep('2020-01-29',20))),
                 task.type=rep('taskA'))[order(id.date)][,year:=year(id.date)]
# 2 sample records per weekly report
dt[,.SD[sample(.N,min(2,.N))],by=id.date]
id.date task.type year
<date> <chr> <int>
2019-01-01 taskA 2019
2019-01-01 taskA 2019
2019-01-08 taskA 2019
2019-01-08 taskA 2019
2019-01-15 taskA 2019
2019-01-15 taskA 2019
2019-01-22 taskA 2019
2019-01-22 taskA 2019
2020-01-01 taskA 2020
2020-01-01 taskA 2020
2020-01-08 taskA 2020
2020-01-08 taskA 2020
2020-01-15 taskA 2020
2020-01-15 taskA 2020
2020-01-29 taskA 2020
2020-01-29 taskA 2020
Desired output:
# for weekly reports in 2019, 80 tasks would be used for all reports in 2019
# and for 2020, 70 would be used.
id.date N pct.rem
<date> <int> <dbl>
2019-01-01 80 1.0000000
2019-01-08 60 0.7500000
2019-01-15 40 0.5000000
2019-01-22 40 0.5000000
2020-01-01 70 1.0000000
2020-01-08 50 0.7142857
2020-01-15 40 0.5714286
2020-01-29 20 0.2857143
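I've tried adapting some of the answers from this SO discussion here, but without success. My guess is that I need to use .SD or .EACHI in some way, but I'm just getting started with data.table.
Any guidance here is greatly appreciated. Let me know if more clarity would help. Thanks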
Something like this:
dt[, N := .N, by = id.date]
dt[, pct.rem := N/first(N), by = year]
# dt[, pct.rem := N/.SD[1,N], by = year]
# .SD[1,N] by year gives each year's task count on its first date (dt is sorted by id.date)
dt[,.SD[1,.(N,pct.rem)], by = id.date]
If you don't want to modify dt, you can use copy(dt) instead.
Output:
id.date N pct.rem
<Date> <int> <num>
1: 2019-01-01 80 1.0000000
2: 2019-01-08 60 0.7500000
3: 2019-01-15 40 0.5000000
4: 2019-01-22 40 0.5000000
5: 2020-01-01 70 1.0000000
6: 2020-01-08 50 0.7142857
7: 2020-01-15 40 0.5714286
8: 2020-01-29 20 0.2857143
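The copy(dt) suggestion above can be sketched roughly as follows. This assumes dt is built as in the question; dt2 and res are names made up here for illustration, not part of the original answer.

```r
library(data.table)

# Rebuild the question's data so the sketch runs on its own
dt <- data.table(
  id.date = as.Date(c(rep('2019-01-01', 80), rep('2019-01-08', 60),
                      rep('2019-01-15', 40), rep('2019-01-22', 40),
                      rep('2020-01-01', 70), rep('2020-01-08', 50),
                      rep('2020-01-15', 40), rep('2020-01-29', 20))),
  task.type = 'taskA'
)[order(id.date)][, year := year(id.date)]

# dt2 is a hypothetical name; := now modifies the copy, not dt
dt2 <- copy(dt)
dt2[, N := .N, by = id.date]
dt2[, pct.rem := N / first(N), by = year]
res <- dt2[, .SD[1, .(N, pct.rem)], by = id.date]

names(dt)  # dt keeps only its original columns
```

Because := updates by reference, working on a copy is what keeps N and pct.rem out of dt.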
You could also do this:
out <- dt[, .(.N), by = id.date][, pct.rem := N / N[[1L]], by = year(id.date)]
Output
> out[]
id.date N pct.rem
1: 2019-01-01 80 1.0000000
2: 2019-01-08 60 0.7500000
3: 2019-01-15 40 0.5000000
4: 2019-01-22 40 0.5000000
5: 2020-01-01 70 1.0000000
6: 2020-01-08 50 0.7142857
7: 2020-01-15 40 0.5714286
8: 2020-01-29 20 0.2857143
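Note that N[[1L]] picks the first row of each year, so it matches the earliest date only when the rows are sorted by id.date (as they are in the question). If the ordering is not guaranteed, one possible variant is to select the baseline with which.min(id.date). This is a sketch on made-up, deliberately unsorted counts, not part of the original answer.

```r
library(data.table)

# Made-up counts for illustration: the later report appears first
dt <- data.table(
  id.date = as.Date(c(rep('2019-01-08', 3), rep('2019-01-01', 4)))
)

# Divide by the count on the earliest date in each year,
# regardless of the row order
out <- dt[, .(.N), by = id.date][
  , pct.rem := N / N[which.min(id.date)], by = year(id.date)]
out[]
```

Sorting first with setorder(out, id.date) and keeping N[[1L]] would be an equally reasonable choice.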
If each id.date has multiple task types, try this:
out <- dt[, .(.N), by = .(id.date, task.type)][, pct.rem := N / N[[1L]], by = .(year(id.date), task.type)]
Output
> out[]
id.date task.type N pct.rem
1: 2019-01-01 taskA 80 1.0000000
2: 2019-01-08 taskA 60 0.7500000
3: 2019-01-15 taskA 40 0.5000000
4: 2019-01-22 taskA 40 0.5000000
5: 2020-01-01 taskA 70 1.0000000
6: 2020-01-08 taskA 50 0.7142857
7: 2020-01-15 taskA 40 0.5714286
8: 2020-01-29 taskA 20 0.2857143
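Since the sample data contains only taskA, the effect of grouping by task.type is not visible in the output above. A small sketch with a second, hypothetical type (taskB, with made-up counts) shows each type being divided by its own first-date count:

```r
library(data.table)

# Made-up counts for illustration: two task types over two weekly reports
dt <- data.table(
  id.date   = as.Date(c(rep('2019-01-01', 8), rep('2019-01-08', 6))),
  task.type = c(rep('taskA', 5), rep('taskB', 3),  # first report: 5 A, 3 B
                rep('taskA', 4), rep('taskB', 2))  # second report: 4 A, 2 B
)

# Count per date and type, then divide by the first count
# within each (year, task.type) group
out <- dt[, .(.N), by = .(id.date, task.type)][
  , pct.rem := N / N[[1L]], by = .(year(id.date), task.type)]
out[]
```

Here taskA goes from 5 to 4 (0.8 remaining) while taskB goes from 3 to 2 (about 0.667 remaining).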