Count rows in the past year according to a condition using data.table
Following this previous post, I can add a column containing the number of occurrences in the past year like this:
df[, boundary := date - 365]
df[, counts := df[df, .N, on = .(id, date < date, date > boundary), by = .EACHI]$N]
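This works well for me. However, I would like to count only the occurrences where another column has a specific value. For example, given a dataset like this: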
id type date
ny 0 2021-09-27
ny 0 2021-09-09
ny 1 2021-08-01
ny 1 2021-07-07
ch 0 2020-04-01
ch 1 2020-03-01
ch 0 2020-02-01
I only want to count the rows where type = 1. How can I modify the code above to do this? I tried something like the following, but it doesn't work:
df[, counts := df[df, .N(type = 1), on = .(id, date < date, date > boundary), by = .EACHI]$N]
Edit: the expected output for the dataset above is:
id type date counts
ny 0 2021-09-27 2
ny 0 2021-09-09 2
ny 1 2021-08-01 1
ny 1 2021-07-07 0
ch 0 2020-04-01 1
ch 1 2020-03-01 0
ch 0 2020-02-01 0
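You can compute sum(type == 1) instead of .N. .N is a row count rather than a function, so it cannot take a condition, whereas summing the logical vector type == 1 counts only the matching rows where type is 1.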
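setDT(df)  # convert df to a data.table by reference
df[, boundary := date - 365]  # lower bound of the one-year window
# Non-equi self-join: for each row, sum the type == 1 flags of rows
# with the same id whose date lies strictly between boundary and date
df[, counts := df[df, sum(type == 1),
                  on = .(id, date < date, date > boundary), by = .EACHI]$V1]
df[is.na(counts), counts := 0L]  # rows with no match yield NA; treat them as 0
df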
# id type date boundary counts
#1: ny 0 2021-09-27 2020-09-27 2
#2: ny 0 2021-09-09 2020-09-09 2
#3: ny 1 2021-08-01 2020-08-01 1
#4: ny 1 2021-07-07 2020-07-07 0
#5: ch 0 2020-04-01 2019-04-02 1
#6: ch 1 2020-03-01 2019-03-02 0
#7: ch 0 2020-02-01 2019-02-01 0
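To check the approach end to end, here is a minimal self-contained sketch. The data.table construction is my own reconstruction of the sample table from the question (column types are an assumption: integer type, Date date), and passing na.rm = TRUE to sum() is a variation I swapped in for the separate is.na() step, since rows with no match contribute a single NA to the sum.

```r
library(data.table)

# Sample data reconstructed from the question (types are assumed)
df <- data.table(
  id   = c("ny", "ny", "ny", "ny", "ch", "ch", "ch"),
  type = c(0L, 0L, 1L, 1L, 0L, 1L, 0L),
  date = as.Date(c("2021-09-27", "2021-09-09", "2021-08-01", "2021-07-07",
                   "2020-04-01", "2020-03-01", "2020-02-01"))
)

# Lower bound of the look-back window (365 days before each row's date)
df[, boundary := date - 365]

# Non-equi self-join: for each row, sum the type == 1 flags of rows with
# the same id and a date strictly inside (boundary, date).
# na.rm = TRUE turns the NA produced by unmatched rows into 0.
df[, counts := df[df, sum(type == 1, na.rm = TRUE),
                  on = .(id, date < date, date > boundary),
                  by = .EACHI]$V1]

print(df$counts)
# 2 2 1 0 1 0 0
```

Note that the window here is a fixed 365 days, not a calendar year, which is why the boundary for 2020-04-01 lands on 2019-04-02 across the leap day.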