R: insert NAs in one column when variable name in another column changes

I have a data frame that looks like this:

library(data.table)
set.seed(1234)
DT <- data.table(x = c("a","a","a","b","b","c","c","c","d","d","d","d"),
                 v = sample(1:4, 12, replace = TRUE))

  x v
  a 1
  a 3
  a 3
  b 3
  b 4
  c 3
  c 1
  c 1
  d 3
  d 3
  d 3
  d 3

What I need to do is replace the value of "v" with NA each time the variable "x" changes, like this:

      x v
      a 1
      a 3
      a 3
      b NA
      b 4
      c NA
      c 1
      c 1
      d NA
      d 3
      d 3
      d 3

Do I have to write a loop, or is there a one-liner that does the same thing? Thanks!

Yes, there is a one-liner:

DT[x != shift(x), v := NA]


    x  v
 1: a  1
 2: a  3
 3: a  3
 4: b NA
 5: b  4
 6: c NA
 7: c  1
 8: c  1
 9: d NA
10: d  3
11: d  3
12: d  3

For details on this syntax, see ?shift and the data.table vignettes.
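
To see why the shift() one-liner leaves the first row alone, it can help to look at the intermediate vectors. The sketch below uses a small made-up table (DT2, prev and changed are illustrative names, not part of the question). shift(x) lags x by one position and fills the first slot with NA, so x != shift(x) is NA in row 1 and TRUE wherever a new value starts; data.table treats an NA in a logical i as FALSE, so row 1 is not touched.

```r
library(data.table)

# Small illustrative table (hypothetical data, not the DT from the question)
DT2 <- data.table(x = c("a", "a", "b", "b", "c"), v = 1:5)

prev    <- shift(DT2$x)      # lag by one: NA "a" "a" "b" "b"
changed <- DT2$x != prev     # NA FALSE TRUE FALSE TRUE

# NA in a logical i counts as FALSE, so only rows 3 and 5 are updated
DT2[changed, v := NA]
DT2$v                        # 1 2 NA 4 NA
```

The same comparison written inline, DT2[x != shift(x), v := NA], should give the same result; splitting it out only makes the NA in the first position visible.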


Or, to avoid computing shift and the full != comparison...

DT[DT[, if (.GRP > 1L) .I[1L], by=rleid(x)]$V1, v := NA ]

This follows @eddi's approach to subsetting by group. For details, see ?.GRP, ?.I and ?rleid.
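
A rough breakdown of the second approach, again on a made-up table (DT3 and idx are illustrative names): rleid(x) numbers each run of consecutive equal values, .GRP is the counter of the current group, and .I holds the row numbers of that group in the full table. Taking .I[1L] for every run except the first collects the rows where x changes.

```r
library(data.table)

# Small illustrative table (hypothetical data, not the DT from the question)
DT3 <- data.table(x = c("a", "a", "b", "b", "c"), v = 1:5)

rleid(DT3$x)                 # run ids: 1 1 2 2 3

# First row number of every run except the first one.
# For the first run the `if` returns NULL, so that group contributes no row.
idx <- DT3[, if (.GRP > 1L) .I[1L], by = rleid(x)]$V1
idx                          # 3 5

DT3[idx, v := NA]            # update only those rows, by reference
DT3$v                        # 1 2 NA 4 NA
```

Because the grouping is on runs rather than on x itself, a value that reappears later in a separate block would start a new run and get its own NA.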