Survival analysis on left truncated data with ipredbagg or pecCforest
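I am looking for a way to perform survival analysis on left-truncated, right-censored data with tree-based algorithms. I tried the packages ipred and pec, but the functions ipredbagg and pecCforest seem to work only when there is no left truncation.
Data description
My data look very similar to the heart dataset from the Stanford heart transplant data. The subjects are actually at risk from t=0, but some of them (in my case the vast majority) only enter the study at a later time t1, so if they die at t < t1 they never enter the dataset. It has been shown that ignoring this left truncation can lead to huge errors, see e.g. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3121224/
In addition to being left-truncated, my dataset is also right-censored, just like the heart dataset, so some subjects leave the dataset without having died/shown the relevant event.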
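To make the effect of ignoring delayed entry concrete, here is a small simulation sketch that uses only the survival package. The exponential event times, the entry-time distribution and the sample size are made-up assumptions for illustration, not my real data. It compares a naive Kaplan-Meier fit that ignores the entry times with one that uses the (start, stop] form.

```r
library(survival)

set.seed(1)
n <- 5000

# Hypothetical true event times: exponential with rate 1, so the true median is log(2).
t_event <- rexp(n, rate = 1)

# Hypothetical entry times: about 20% of subjects are followed from t = 0,
# the rest enter the study at a uniformly distributed later time.
t_entry <- ifelse(runif(n) < 0.2, 0, runif(n, min = 0, max = 2))

# Left truncation: subjects who die before their entry time are never observed.
observed <- t_event > t_entry
d <- data.frame(start = t_entry[observed],
                stop  = t_event[observed],
                event = 1)

# The naive fit ignores the entry times; the adjusted fit uses (start, stop] intervals.
km_naive    <- survfit(Surv(stop, event) ~ 1, data = d)
km_adjusted <- survfit(Surv(start, stop, event) ~ 1, data = d)

med_naive    <- unname(quantile(km_naive, probs = 0.5, conf.int = FALSE))
med_adjusted <- unname(quantile(km_adjusted, probs = 0.5, conf.int = FALSE))

cat("true median:    ", log(2), "\n")
cat("naive median:   ", med_naive, "\n")
cat("adjusted median:", med_adjusted, "\n")
```

Under these assumptions the naive median should come out clearly above log(2), because short-lived subjects are under-represented, while the adjusted median should be close to it.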
The heart dataset looks like this:
Surv(heart$start, heart$stop, heart$event)
[1] ( 0.0, 50.0] ( 0.0, 6.0] ( 0.0, 1.0+] ( 1.0, 16.0] ( 0.0, 36.0+] ( 36.0, 39.0]
[7] ( 0.0, 18.0] ( 0.0, 3.0] ( 0.0, 51.0+] ( 51.0, 675.0] ( 0.0, 40.0] ( 0.0, 85.0]
[13] ( 0.0, 12.0+] ( 12.0, 58.0] ( 0.0, 26.0+] ( 26.0, 153.0] ( 0.0, 8.0] ( 0.0, 17.0+]
[19] ( 17.0, 81.0] ( 0.0, 37.0+] ( 37.0,1387.0] ( 0.0, 1.0] ( 0.0, 28.0+] ( 28.0, 308.0]
[25] ( 0.0, 36.0] ( 0.0, 20.0+] ( 20.0, 43.0] ( 0.0, 37.0] ( 0.0, 18.0+] ( 18.0, 28.0]
[31] ( 0.0, 8.0+] ( 8.0,1032.0] ( 0.0, 12.0+] ( 12.0, 51.0] ( 0.0, 3.0+] ( 3.0, 733.0]
[37] ( 0.0, 83.0+] ( 83.0, 219.0] ( 0.0, 25.0+] ( 25.0,1800.0+] ( 0.0,1401.0+] ( 0.0, 263.0]
[43] ( 0.0, 71.0+] ( 71.0, 72.0] ( 0.0, 35.0] ( 0.0, 16.0+] ( 16.0, 852.0] ( 0.0, 16.0]
[49] ( 0.0, 17.0+] ( 17.0, 77.0] ( 0.0, 51.0+] ( 51.0,1587.0+] ( 0.0, 23.0+] ( 23.0,1572.0+]
[55] ( 0.0, 12.0] ( 0.0, 46.0+] ( 46.0, 100.0] ( 0.0, 19.0+] ( 19.0, 66.0] ( 0.0, 4.5+]
[61] ( 4.5, 5.0] ( 0.0, 2.0+] ( 2.0, 53.0] ( 0.0, 41.0+] ( 41.0,1408.0+] ( 0.0, 58.0+]
[67] ( 58.0,1322.0+] ( 0.0, 3.0] ( 0.0, 2.0] ( 0.0, 40.0] ( 0.0, 1.0+] ( 1.0, 45.0]
[73] ( 0.0, 2.0+] ( 2.0, 996.0] ( 0.0, 21.0+] ( 21.0, 72.0] ( 0.0, 9.0] ( 0.0, 36.0+]
[79] ( 36.0,1142.0+] ( 0.0, 83.0+] ( 83.0, 980.0] ( 0.0, 32.0+] ( 32.0, 285.0] ( 0.0, 102.0]
[85] ( 0.0, 41.0+] ( 41.0, 188.0] ( 0.0, 3.0] ( 0.0, 10.0+] ( 10.0, 61.0] ( 0.0, 67.0+]
[91] ( 67.0, 942.0+] ( 0.0, 149.0] ( 0.0, 21.0+] ( 21.0, 343.0] ( 0.0, 78.0+] ( 78.0, 916.0+]
[97] ( 0.0, 3.0+] ( 3.0, 68.0] ( 0.0, 2.0] ( 0.0, 69.0] ( 0.0, 27.0+] ( 27.0, 842.0+]
[103] ( 0.0, 33.0+] ( 33.0, 584.0] ( 0.0, 12.0+] ( 12.0, 78.0] ( 0.0, 32.0] ( 0.0, 57.0+]
[109] ( 57.0, 285.0] ( 0.0, 3.0+] ( 3.0, 68.0] ( 0.0, 10.0+] ( 10.0, 670.0+] ( 0.0, 5.0+]
[115] ( 5.0, 30.0] ( 0.0, 31.0+] ( 31.0, 620.0+] ( 0.0, 4.0+] ( 4.0, 596.0+] ( 0.0, 27.0+]
[121] ( 27.0, 90.0] ( 0.0, 5.0+] ( 5.0, 17.0] ( 0.0, 2.0] ( 0.0, 46.0+] ( 46.0, 545.0+]
[127] ( 0.0, 21.0] ( 0.0, 210.0+] (210.0, 515.0+] ( 0.0, 67.0+] ( 67.0, 96.0] ( 0.0, 26.0+]
[133] ( 26.0, 482.0+] ( 0.0, 6.0+] ( 6.0, 445.0+] ( 0.0, 428.0+] ( 0.0, 32.0+] ( 32.0, 80.0]
[139] ( 0.0, 37.0+] ( 37.0, 334.0] ( 0.0, 5.0] ( 0.0, 8.0+] ( 8.0, 397.0+] ( 0.0, 60.0+]
[145] ( 60.0, 110.0] ( 0.0, 31.0+] ( 31.0, 370.0+] ( 0.0, 139.0+] (139.0, 207.0] ( 0.0, 160.0+]
[151] (160.0, 186.0] ( 0.0, 340.0] ( 0.0, 310.0+] (310.0, 340.0+] ( 0.0, 28.0+] ( 28.0, 265.0+]
[157] ( 0.0, 4.0+] ( 4.0, 165.0] ( 0.0, 2.0+] ( 2.0, 16.0] ( 0.0, 13.0+] ( 13.0, 180.0+]
[163] ( 0.0, 21.0+] ( 21.0, 131.0+] ( 0.0, 96.0+] ( 96.0, 109.0+] ( 0.0, 21.0] ( 0.0, 38.0+]
[169] ( 38.0, 39.0+] ( 0.0, 31.0+] ( 0.0, 11.0+] ( 0.0, 6.0]
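So at the first time of each interval the subject enters my dataset and starts being "at risk". At the second time the subject leaves the dataset, either because the event of interest occurred (no "+") or because it was censored (with "+").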
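As a minimal sketch of how such intervals are built, the three toy records below reproduce rows 3 to 5 of the output above. The object name toy is only for illustration.

```r
library(survival)

# Three toy records: (start, stop, event), with event = 1 meaning the event occurred.
toy <- Surv(time  = c(0, 1, 0),
            time2 = c(1, 16, 36),
            event = c(0, 1, 0))

print(toy)          # censored intervals are printed with a trailing "+"
attr(toy, "type")   # "counting" for (start, stop] data
colnames(toy)       # "start" "stop" "status"
```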
Cox regression
For Cox regression everything works fine. The Surv object created above can be used to fit a Cox regression.
library(survival)
coxtime=coxph(Surv(heart$start, heart$stop, heart$event)~1,data=heart)
summary(coxtime)
Call: coxph(formula = Surv(heart$start, heart$stop, heart$event) ~
1, data = heart)
Null model
log likelihood= -298.1214
n= 172
I can also plot the survival function
plot(survfit(coxtime),xscale=365.25, xlab = "Years", ylab="Survival")
Survival function of heart dataset
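For comparison with a tree-based method, the kind of covariate-specific prediction I am after can be sketched with a Cox model as follows. The choice of covariates and the covariate profile are assumptions made only for this example.

```r
library(survival)

# Counting-process Cox model with two covariates from the heart data.
cox_cov <- coxph(Surv(start, stop, event) ~ age + surgery, data = heart)

# Predicted survival curve for one hypothetical covariate profile
# (in this dataset, age is stored as age minus 48 years).
profile <- data.frame(age = 0, surgery = 0)
sf <- survfit(cox_cov, newdata = profile)

# Predicted survival probability at 1 year (times are in days).
p_1yr <- summary(sf, times = 365)$surv
print(p_1yr)
```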
Now I would like to do the same analysis with a tree-based algorithm.
ipredbagg
When I try the ipredbagg function, it works fine without left truncation:
library(survival)
library(ipred)
#without left truncation
ipredbagg(Surv(heart$stop, heart$event) ,X=heart$surgery)
I get the result
Bagging survival trees with 25 bootstrap replications
Because some rows in the heart dataset have a start value of 0, passing start and stop directly to the ipredbagg function raises an error.
#with left truncation
ipredbagg(Surv(heart$start, heart$stop, heart$event) ,X=heart$surgery)
Error in get(paste("rpart", method, sep = "."), envir = environment())(Y, :
Observation time must be > 0
So I added 1 to both the start and stop columns, but now I get a different error.
#with left truncation and start > 0
ipredbagg(Surv(heart$start+1, heart$stop+1, heart$event) ,X=heart$surgery)
Error:
Error in table(index2, levels = 1:ngrp) :
all arguments must have the same length
pecCforest
My second attempt was the pecCforest function from the pec package. This function also works on data without left truncation.
library(pec)
library(party)
library(survival)
#without left truncation
fitcforest <- pecCforest(Surv(stop, event) ~ .,data=heart[,-which(names(heart)=="start")],
controls = cforest_classical(ntree=100),mtry=2);
predictSurvProb(fitcforest,heart[1,],times=1)
I get the result
[,1]
[1,] 0.9890595
This time I can train the model on the start and stop columns without an error, but I cannot get predictions from it.
#with left truncation
fitcforest <- pecCforest(Surv((start), (stop), event) ~ .,data=heart,
controls = cforest_classical(ntree=100));
predictSurvProb(fitcforest,heart[1,],times=1)
This results in
Error in predict.survfit(object, newdata = newdata, times = times, bytimes = TRUE, :
Predictions only available
for class 'survfit', possibly stratified Kaplan-Meier fits.
For class 'cph' Cox models see survest.cph.
Adding 1 to both the start and stop columns leads to the same error.
#with left truncation and start > 0
fitcforest <- pecCforest(Surv((start+1), (stop+1), event) ~ .,data=heart,
controls = cforest_classical(ntree=100,mtry=2));
predictSurvProb(fitcforest,heart[1,],times=1)
Error in predict.survfit(object, newdata = newdata, times = times, bytimes = TRUE, :
Predictions only available
for class 'survfit', possibly stratified Kaplan-Meier fits.
For class 'cph' Cox models see survest.cph.
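Is there a way to make these functions work on left-truncated data? Neither function seems to implement left truncation, but I could not find any information about this. Is there another way to do survival analysis on left-truncated data in R with a tree-based algorithm (I did manage to run a standard Cox regression)?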
You can try the recent package LTRCtrees on CRAN, which is designed for left-truncated survival data.
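A minimal sketch of what that could look like on the heart data, assuming the package is installed and that, as far as I recall from its documentation, it exports LTRCART (an rpart-based relative-risk tree) and LTRCIT (a conditional inference tree), both taking a Surv(start, stop, event) response. The covariates are chosen only for illustration, so please check the package manual before relying on this.

```r
library(survival)

# Assumption: the LTRCtrees package is installed; the block is skipped otherwise.
if (requireNamespace("LTRCtrees", quietly = TRUE)) {
  library(LTRCtrees)

  # Relative-risk tree on (start, stop] data.
  fit_cart <- LTRCART(Surv(start, stop, event) ~ age + year + surgery, data = heart)

  # Conditional inference tree on the same data.
  fit_cit <- LTRCIT(Surv(start, stop, event) ~ age + year + surgery, data = heart)

  # Predicted relative risk (LTRCART) and median survival time (LTRCIT)
  # for the first three rows of the data.
  print(predict(fit_cart, newdata = heart[1:3, ]))
  print(predict(fit_cit, newdata = heart[1:3, ], type = "response"))
}
```

Note that start values of 0 are passed as they are here, without shifting the time scale by 1.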