Calculate the number of individuals with more than i days of life with data.table in R
Here is a simplified version of my data.table:
Individual | time_alive (day) |
---|---|
ID1 | 1 |
ID2 | 5 |
ID3 | 7 |
ID4 | 5 |
I need to count the number of individuals alive on each day. I did this with a loop:
for (i in c(-1:600)) {
y<-summarise(DT , time_alive > i )
Alive[i+2,]<-length(y[y==TRUE])
}
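However, this takes a really long time, because the data.frame has more than 2B observations.
I wanted to try an alternative with data.table, but I am stuck at computing the number alive for a single day only: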
DT[,.N,time_alive> i][time_alive==TRUE,2]
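Here, i cannot be replaced by a vector, only by a single number. I want to count the number of individuals with more than i days of life without writing a loop.
My expected result for the simplified data is: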
Day | Number of individuals alive |
---|---|
1 | 4 |
2 | 3 |
3 | 3 |
4 | 3 |
5 | 3 |
6 | 1 |
7 | 1 |
8 | 0 |
The best solution, as a data.table one-liner that is much faster than the loop:
DT[, .(Day = seq_len(1 + max(time_alive)))][DT[,.(time_alive)], .(.N), on = .(Day <= time_alive), by = Day]
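A related formulation, sketched here rather than taken from the answer above: put the days in the i table and count the matching rows of DT for each day with by = .EACHI. The days table and the renaming step are my own additions for illustration; the sample DT is the one from the question.

```r
library(data.table)

DT <- data.table(Individual = c("ID1", "ID2", "ID3", "ID4"),
                 time_alive = c(1L, 5L, 7L, 5L))

# One row per day, plus one extra day past the longest lifetime
days <- data.table(Day = seq_len(max(DT$time_alive) + 1L))

# For each row of `days`, count the rows of DT with time_alive >= Day.
# by = .EACHI evaluates .N once per row of `days`.
res <- DT[days, .N, on = .(time_alive >= Day), by = .EACHI]

# The join column is named after DT's column but holds the Day values
setnames(res, "time_alive", "Day")
res
```

Because every row of days produces one output row, days with no survivors stay in the result instead of being dropped.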
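I would approach the problem differently.
If you run data.frame(Alive = cumsum(rev(table(c(1,5,7,5))))) (or, in your general case, data.frame(Alive = cumsum(rev(table(DT$time_alive))))), you will get the information you need. The only caveat is that if there is any day on which nobody died, you will end up with gaps in the data.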
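One way around the gaps, sketched here under the assumption that time_alive holds positive whole numbers, is to count lifetimes with tabulate(), which keeps a zero for every day on which nobody died, and then take the reverse cumulative sum. The names max_day, deaths and res are only illustrative.

```r
# Sample lifetimes from the question; with real data this would be DT$time_alive
time_alive <- c(1L, 5L, 7L, 5L)

# One extra day so the final row shows zero survivors
max_day <- max(time_alive) + 1L

# Number of individuals whose lifetime is exactly d days, for d = 1..max_day
deaths <- tabulate(time_alive, nbins = max_day)

# Reverse cumulative sum: individuals with time_alive >= Day
res <- data.frame(Day = seq_len(max_day),
                  Alive = rev(cumsum(rev(deaths))))
print(res)
```

The intermediate vectors have one element per day, so the extra memory grows with the number of days rather than with days times rows.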
# @r2evans suggestion about making it a one-liner
# replaced res = data.table('day' = 1:max(DT$time_alive))
DT[, .(day = seq_len(1 + max(time_alive)))][
# my original solution
DT, .(.N) ,on = .(day <= time_alive),by = day, allow.cartesian = T]
# or
DT[,time_alive > TARGET_NUMBER, by = individual]
Based on what you provided, I have two solutions. One or both of them should be what you are looking for. See below for details/explanation.
# load in data
DT = data.table('individual' = 1:4, 'time_alive' = c(1,5,7,5))
# set your target number
TARGET_NUMBER = 5
# group by individual,
# then check if the number of days they were alive is greater than your target
# this answers "i want to calculate the number of
# individual with more than "i" days of life
DT[,time_alive > TARGET_NUMBER, by = individual]
individual V1
1: 1 FALSE
2: 2 FALSE
3: 3 TRUE
4: 4 FALSE
# if the result you want is that table you created. that is a little different:
# create a table with days ranging from 1 to the maximum survivor
res = data.table('day' = 1:max(DT$time_alive))
day
1: 1
2: 2
3: 3
4: 4
5: 5
6: 6
7: 7
# use joins
# join by time alive being greater than or equal to the day
# group by the specific day, and count how many observations we have
# allow.cartesian because the mapping isn't one-to-one
res[DT, .(.N) ,on = .(day <= time_alive),by = day, allow.cartesian = T]
day N
1: 1 4
2: 2 3
3: 3 3
4: 4 3
5: 5 3
6: 6 1
7: 7 1
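Note that the first snippet returns one TRUE/FALSE per individual rather than a count. A minimal sketch of turning it into counts, assuming the same DT as above: sum() over the logical gives the count for one threshold, and findInterval() on the sorted lifetimes handles a whole vector of thresholds without a loop. The names thresholds, n_single and n_above are illustrative only.

```r
library(data.table)

DT <- data.table(individual = 1:4, time_alive = c(1, 5, 7, 5))
TARGET_NUMBER <- 5

# Count for a single threshold: TRUE counts as 1 in sum()
n_single <- DT[, sum(time_alive > TARGET_NUMBER)]

# Counts for many thresholds at once.
# findInterval() returns, for each threshold, how many sorted lifetimes are <= it,
# so subtracting from the row count leaves those strictly greater.
thresholds <- 0:7
n_above <- nrow(DT) - findInterval(thresholds, sort(DT$time_alive))

data.table(threshold = thresholds, n_above = n_above)
```

Thresholds 0 to 7 correspond to days 1 to 8 in the expected table, since "more than i days" with i = Day - 1 is the same as time_alive >= Day for whole-number lifetimes.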
data.table
library(data.table)
DT[, .(Day = seq_len(max(time_alive) + 1))
][, Number := rowSums(outer(Day, DT$time_alive, `<=`))]
# Day Number
# <int> <num>
# 1: 1 4
# 2: 2 3
# 3: 3 3
# 4: 4 3
# 5: 5 3
# 6: 6 1
# 7: 7 1
# 8: 8 0
(I am assuming DT has no more than one row per Individual.)
Data
DT <- setDT(structure(list(Individual = c("ID1", "ID2", "ID3", "ID4"), time_alive = c(1L, 5L, 7L, 5L)), class = c("data.table", "data.frame"), row.names = c(NA, -4L)))