Add a counter that meets specific conditions

Problem statement

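Given the data set below with two columns, Column1 and Column2, add two more columns called Counter and Counter_Time. The conditions for initializing Counter and Counter_Time are as follows: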

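  1. The counter should be incremented only for rows where Column1 > 1 and Column2 = 0
  2. The counter must start incrementing only after 2 rows that satisfy the condition
  3. Counter_Time must hold the number of times a sequence has occurred (a sequence being a run of consecutive data points that satisfy the condition)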

Data frame with expected output

Column1 Column2 Counter Counter_Time  
1.1254  2.784    0        0
4.678   7.985    0        0  
8.89      0      0        1
7.65      0      0        1  
3.54      0      1        1  
4.32      0      2        1  
9.83      0      3        1
3.86     4.3     0        1
5.63     9.8     0        1
4.53      0      0        2
6.83      0      0        2   
3.431     0      4        2
8.976     0      5        2
9.864     0      6        2
7.3      9.2     0        2
2.3      3.2     0        2
4.3       0      0        3
2.1       0      0        3
4.32      0      7        3  

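I came across a similar question and got an answer on how to increment a counter, but I could not adapt it to the conditions above. Note that the counter should start only after two rows have satisfied the condition.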

Observations from the data set

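  1. Row 3 satisfies the condition; Counter is not initialized, but Counter_Time is incremented
  2. Counter starts at row 5 (per the condition, the first 2 rows that satisfy it should not trigger the counter)
  3. In row 8, Counter returns to 0 and Counter_Time stays the same
  4. Similarly, rows 10 and 11 are not counted, and Counter resumes incrementing at row 12. Counter_Time, however, is incremented at row 10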

I have stated the problem in detail so that experts can grasp it at a glance and provide an accurate solution.

# Load packages
library(tidyverse)
library(data.table)

# Create example data frame
dt <- fread("Column1 Column2
1.1254  2.784
4.678   7.985 
8.89      0
7.65      0  
3.54      0
4.32      0  
9.83      0
3.86     4.3
5.63     9.8
4.53      0
6.83      0  
3.431     0
8.976     0
9.864     0
7.3      9.2
2.3      3.2
4.3       0
2.1       0
4.32      0  ")

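### Create Counter_Time
### Note: the formula below assumes the data starts with a row that does NOT meet the condition
dt2 <- dt %>%
  mutate(Merge_ID = 1:n()) %>%                                         # row key for the later join
  mutate(Condition = ifelse(Column1 > 1 & Column2 == 0, 1, 0)) %>%     # 1 if the row meets the condition
  mutate(ID = rleid(Condition)) %>%                                    # run id of consecutive equal Condition values
  mutate(Counter_Time = ifelse(Condition == 0, (ID - 1)/2, ID/2))      # number of matching runs seen so far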

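### Create Counter
dt3 <- dt2 %>%
  group_by(Counter_Time) %>%
  # Skip the first two rows of each Counter_Time group.
  # Unlike slice(3:n()), this does not pick up rows when a group has fewer than 3 rows.
  filter(row_number() >= 3) %>%
  filter(Condition == 1) %>%
  ungroup() %>%
  mutate(Counter = 1:n()) %>%
  select(Merge_ID, Counter)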

### Merge dt2 and dt3 together, dt4 is the final output
dt4 <- dt2 %>%
  left_join(dt3, by = "Merge_ID") %>%
  mutate(Counter = ifelse(is.na(Counter), 0, Counter)) %>%
  select(Column1, Column2, Counter, Counter_Time)

Update

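The following code is an update to run after dt2 has been created. The idea is to make sure that when no rows meet the condition, the code still produces output with Counter equal to 0 everywhere.
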
### Set the index
begin_index <- 3

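### Filter the right condition
dt3 <- dt2 %>%
  group_by(Counter_Time) %>%
  # Keep rows from begin_index onward in each group (safe for groups shorter than begin_index)
  filter(row_number() >= begin_index) %>%
  filter(Condition == 1) %>%
  ungroup()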


### Check if dt3 has any rows
if (nrow(dt3) > 0){

  dt3 <- dt3 %>%
    mutate(Counter = 1:n()) %>%
    select(Merge_ID, Counter)

  ### Merge dt2 and dt3 together, dt4 is the final output
  dt4 <- dt2 %>%
    left_join(dt3, by = "Merge_ID") %>%
    mutate(Counter = ifelse(is.na(Counter), 0, Counter)) %>%
    select(Column1, Column2, Counter, Counter_Time)

### If nrow(dt3) is 0, no rows meet the condition
} else {

  ### Create Counter column from dt2
  dt4 <- dt2 %>%
    mutate(Counter = 0) %>%
    select(Column1, Column2, Counter, Counter_Time)

}
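
The branch above is needed only because `1:n()` misbehaves on an empty table. For comparison, here is a minimal base R sketch that needs no special case, assuming the same Column1/Column2 layout. `add_counters` and `skip` are hypothetical names introduced for illustration; they are not part of the answer above.

```r
# Hypothetical helper: adds Counter and Counter_Time using base R only.
add_counters <- function(df, skip = 2) {
  cond <- df$Column1 > 1 & df$Column2 == 0
  # A run starts where cond is TRUE and the previous row's cond is FALSE
  starts <- cond & !c(FALSE, head(cond, -1))
  counter_time <- cumsum(starts)
  # Position of each row within its run of identical cond values
  pos <- sequence(rle(cond)$lengths)
  # Rows that meet the condition and come after the first `skip` rows of their run
  counted <- cond & pos > skip
  df$Counter <- ifelse(counted, cumsum(counted), 0)
  df$Counter_Time <- counter_time
  df
}

# Toy data (made up for this sketch), plus a table where no row meets the condition
ex <- data.frame(Column1 = c(1.1, 8.9, 7.6, 3.5, 4.3, 3.9, 4.5),
                 Column2 = c(2.8, 0,   0,   0,   0,   4.3, 0))
res  <- add_counters(ex)
none <- add_counters(data.frame(Column1 = c(2, 3), Column2 = c(1, 1)))
```

Because `cumsum()` over an all-FALSE vector is simply all zeros, the no-match case falls out of the same code path, so no `if`/`else` is required.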

A compact solution with data.table (using the same data as @ycw):

library(data.table)
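# counter_time numbers the runs of rows that satisfy the condition
dt[, counter := 0
   ][, counter_time := cumsum(c(0,diff(Column1 > 1 & Column2 == 0))==1)
     # flag matching rows after the first two of each run; unlike rep(1, .N-2), this also works for a run of length 1
     ][Column1 > 1 & Column2 == 0, counter := as.numeric(seq_len(.N) > 2), by = counter_time
       # turn the flags into a running count across all runs
       ][counter == 1, counter := cumsum(counter)]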

This gives:

> dt
    Column1 Column2 counter counter_time
 1:  1.1254   2.784       0            0
 2:  4.6780   7.985       0            0
 3:  8.8900   0.000       0            1
 4:  7.6500   0.000       0            1
 5:  3.5400   0.000       1            1
 6:  4.3200   0.000       2            1
 7:  9.8300   0.000       3            1
 8:  3.8600   4.300       0            1
 9:  5.6300   9.800       0            1
10:  4.5300   0.000       0            2
11:  6.8300   0.000       0            2
12:  3.4310   0.000       4            2
13:  8.9760   0.000       5            2
14:  9.8640   0.000       6            2
15:  7.3000   9.200       0            2
16:  2.3000   3.200       0            2
17:  4.3000   0.000       0            3
18:  2.1000   0.000       0            3
19:  4.3200   0.000       7            3
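
The `counter_time` step is probably the least obvious part of the chain above. A small base R illustration of the same idiom on a toy logical vector may help; the vector `cond` and the names `steps` and `run_id` are made up for this example.

```r
# TRUE where a row satisfies the condition (toy values, not taken from the data above)
cond <- c(FALSE, TRUE, TRUE, FALSE, TRUE)

# diff() on a logical vector yields 1 at FALSE -> TRUE and -1 at TRUE -> FALSE;
# the leading 0 pads the result back to the original length
steps <- c(0, diff(cond))

# Counting the 1s cumulatively numbers each run of TRUEs: 0 1 1 1 2
run_id <- cumsum(steps == 1)
```

One consequence of the padding: if the vector begins with TRUE, that first run has no preceding FALSE -> TRUE step and therefore gets run id 0.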

Data used:

library(data.table)
dt <- fread("Column1 Column2
            1.1254  2.784
            4.678   7.985
            8.89      0
            7.65      0
            3.54      0
            4.32      0
            9.83      0
            3.86     4.3
            5.63     9.8
            4.53      0
            6.83      0
            3.431     0
            8.976     0
            9.864     0
            7.3      9.2
            2.3      3.2
            4.3       0
            2.1       0
            4.32      0")