R data.table,使用 .SDcols 时如何获得完整的 dt
R data.table, how to get complete dt, when using .SDcols
考虑以下 dt:
dt <- data.table(id=c(rep(1,11),rep(2,10)),col1=c(100:80),col2=c(95:75),col3=c(2,100:81))
changeCols <- c("col1","col2","col3")
newCols <- paste0(changeCols, "_diff")
dt[, (newCols) := .SD - shift(.SD), by=id, .SDcols=changeCols][]
现在,我需要检测几列中第二个位置的“跳跃”(在这种情况下删除第一个位置),就像 col3_diff[= 中的情况一样24=],我尝试通过以下方式做到这一点:
dt[, if (.SD[2]>0) .SD[-1] else .SD, by=id, .SDcols=newCols]
我只得到 data.table 的子集,即我丢失了 col1、col2、col3,但我需要完整的 dt。
知道怎么做吗?
提前致谢!
您可以使用以下-
library(data.table)
if(any(dt[2, ..newCols] > 0)) dt <- dt[-2]
library(data.table)
dt[, .SD[rowSums(!sapply(.SD[, newCols, with=FALSE],
function(z) z[2] < 0 | seq_along(z) != 1L)) == 0,],
by = id]
# id col1 col2 col3 col1_diff col2_diff col3_diff
# <num> <int> <int> <num> <int> <int> <num>
# 1: 1 99 94 100 -1 -1 98
# 2: 1 98 93 99 -1 -1 -1
# 3: 1 97 92 98 -1 -1 -1
# 4: 1 96 91 97 -1 -1 -1
# 5: 1 95 90 96 -1 -1 -1
# 6: 1 94 89 95 -1 -1 -1
# 7: 1 93 88 94 -1 -1 -1
# 8: 1 92 87 93 -1 -1 -1
# 9: 1 91 86 92 -1 -1 -1
# 10: 1 90 85 91 -1 -1 -1
# 11: 2 89 84 90 NA NA NA
# 12: 2 88 83 89 -1 -1 -1
# 13: 2 87 82 88 -1 -1 -1
# 14: 2 86 81 87 -1 -1 -1
# 15: 2 85 80 86 -1 -1 -1
# 16: 2 84 79 85 -1 -1 -1
# 17: 2 83 78 84 -1 -1 -1
# 18: 2 82 77 83 -1 -1 -1
# 19: 2 81 76 82 -1 -1 -1
# 20: 2 80 75 81 -1 -1 -1
# id col1 col2 col3 col1_diff col2_diff col3_diff
考虑以下 dt:
dt <- data.table(id=c(rep(1,11),rep(2,10)),col1=c(100:80),col2=c(95:75),col3=c(2,100:81))
changeCols <- c("col1","col2","col3")
newCols <- paste0(changeCols, "_diff")
dt[, (newCols) := .SD - shift(.SD), by=id, .SDcols=changeCols][]
现在,我需要检测几列中第二个位置的“跳跃”(在这种情况下删除第一个位置),就像 col3_diff[= 中的情况一样24=],我尝试通过以下方式做到这一点:
dt[, if (.SD[2]>0) .SD[-1] else .SD, by=id, .SDcols=newCols]
我只得到 data.table 的子集,即我丢失了 col1、col2、col3,但我需要完整的 dt。
知道怎么做吗?
提前致谢!
您可以使用以下-
library(data.table)
if(any(dt[2, ..newCols] > 0)) dt <- dt[-2]
library(data.table)
dt[, .SD[rowSums(!sapply(.SD[, newCols, with=FALSE],
function(z) z[2] < 0 | seq_along(z) != 1L)) == 0,],
by = id]
# id col1 col2 col3 col1_diff col2_diff col3_diff
# <num> <int> <int> <num> <int> <int> <num>
# 1: 1 99 94 100 -1 -1 98
# 2: 1 98 93 99 -1 -1 -1
# 3: 1 97 92 98 -1 -1 -1
# 4: 1 96 91 97 -1 -1 -1
# 5: 1 95 90 96 -1 -1 -1
# 6: 1 94 89 95 -1 -1 -1
# 7: 1 93 88 94 -1 -1 -1
# 8: 1 92 87 93 -1 -1 -1
# 9: 1 91 86 92 -1 -1 -1
# 10: 1 90 85 91 -1 -1 -1
# 11: 2 89 84 90 NA NA NA
# 12: 2 88 83 89 -1 -1 -1
# 13: 2 87 82 88 -1 -1 -1
# 14: 2 86 81 87 -1 -1 -1
# 15: 2 85 80 86 -1 -1 -1
# 16: 2 84 79 85 -1 -1 -1
# 17: 2 83 78 84 -1 -1 -1
# 18: 2 82 77 83 -1 -1 -1
# 19: 2 81 76 82 -1 -1 -1
# 20: 2 80 75 81 -1 -1 -1
# id col1 col2 col3 col1_diff col2_diff col3_diff