Bucketing rank values in R

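I'm trying to bucket the rank values into cycles: going from rank 1 to rank 2 is cycle1, from rank 2 to rank 3 is cycle2, and so on. For each cycle I want to create a binary column (see below).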

Data frame before:

id               event              date                   rank       
1241a21ef        one             2016-08-13 20:03:37         1
1241a21ef        two             2016-08-15 05:41:09         2
12426203b        two             2016-08-04 05:35:10         1
12426203b       three            2016-08-06 02:07:41         2
12426203b        two             2016-08-10 05:42:33         3
12426203b       three            2016-08-14 02:43:16         4

Data frame after:
id           cycle1     cycle2   cycle3
1241a21ef      1          0         0
12426203b      1          1         1

Note: within each group (i.e. each id) the rank values are unique and based on the timestamp; the rank resets to 1 for the next id.
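For reference, a per-id rank like the one described in the note could be derived from the date column with a grouped row_number(). This is a minimal sketch, assuming ranks ascend with the timestamp, that timestamps are interpreted as UTC, and that ties (none occur in this data) are broken by row order; the events and ranked names are illustrative only.

```r
library(dplyr)

# Only the columns needed to derive the rank
events <- data.frame(
  id   = c("1241a21ef", "1241a21ef", "12426203b", "12426203b", "12426203b", "12426203b"),
  date = c("2016-08-13 20:03:37", "2016-08-15 05:41:09", "2016-08-04 05:35:10",
           "2016-08-06 02:07:41", "2016-08-10 05:42:33", "2016-08-14 02:43:16"),
  stringsAsFactors = FALSE
)

ranked <- events %>%
  group_by(id) %>%                                          # rank restarts at 1 for each id
  mutate(rank = row_number(as.POSIXct(date, tz = "UTC"))) %>%  # earliest timestamp gets rank 1
  ungroup()

print(ranked)
```

Grouping before mutate() is what makes the rank reset for every new id.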

You can use dplyr::count together with tidyr::spread to get the data tabulated in the desired format:

library(dplyr)
library(tidyr)

df %>%
  group_by(id) %>%
  arrange(id, rank) %>%
  filter(rank != last(rank)) %>%                # drop the last rank of each id (it starts no cycle)
  mutate(cycle = paste0("cycle", rank)) %>%     # desired column names after spread
  group_by(id, cycle) %>%
  count() %>%                                   # n = number of rows per (id, cycle)
  spread(key = cycle, value = n, fill = 0) %>%  # one column per cycle; missing cycles become 0
  as.data.frame()

#          id cycle1 cycle2 cycle3
# 1 1241a21ef      1      0      0
# 2 12426203b      1      1      1
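Since tidyr 1.0, spread() has been superseded by pivot_wider(). Below is a minimal sketch of the same pipeline using pivot_wider(), assuming tidyr >= 1.1 (for a scalar values_fill) and keeping only the id and rank columns. It uses filter(rank != max(rank)) instead of last(rank); the two are equivalent here because ranks are unique within each id.

```r
library(dplyr)
library(tidyr)

# Only the columns the reshaping needs
df <- data.frame(
  id   = c("1241a21ef", "1241a21ef", "12426203b", "12426203b", "12426203b", "12426203b"),
  rank = c(1, 2, 1, 2, 3, 4),
  stringsAsFactors = FALSE
)

wide <- df %>%
  group_by(id) %>%
  filter(rank != max(rank)) %>%                    # the highest rank of each id closes no cycle
  ungroup() %>%
  count(id, cycle = paste0("cycle", rank)) %>%     # one row per (id, cycle) with its count n
  pivot_wider(names_from = cycle, values_from = n, values_fill = 0) %>%  # absent cycles -> 0
  as.data.frame()

print(wide)
```

Because each rank occurs at most once per id, every count is 0 or 1, which gives the binary table from the question.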

Data:

df <- read.table(text =
"id               event              date                   rank       
1241a21ef        one             '2016-08-13 20:03:37'         1
1241a21ef        two             '2016-08-15 05:41:09'         2
12426203b        two             '2016-08-04 05:35:10'         1
12426203b       three            '2016-08-06 02:07:41'         2
12426203b        two             '2016-08-10 05:42:33'         3
12426203b       three            '2016-08-14 02:43:16'         4",
header = TRUE, stringsAsFactors = FALSE)
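Without dplyr or tidyr, base R's table() can produce the same cross-tabulation. This is a minimal sketch, again keeping only the id and rank columns; ave() computes each id's highest rank so that row can be dropped before counting. The names keep, sub, tab and out are illustrative.

```r
# Only the columns the cross-tabulation needs
df <- data.frame(
  id   = c("1241a21ef", "1241a21ef", "12426203b", "12426203b", "12426203b", "12426203b"),
  rank = c(1, 2, 1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Drop the highest rank within each id (it starts no cycle)
keep <- df$rank != ave(df$rank, df$id, FUN = max)
sub  <- df[keep, ]

# Count rows per (id, cycle); combinations that never occur get 0
tab <- table(sub$id, paste0("cycle", sub$rank))

# Turn the 2-D table into a data frame with one column per cycle
out <- as.data.frame.matrix(tab)
out$id <- rownames(out)
print(out)
```

table() sorts the cycle names as strings, so with ten or more cycles "cycle10" would come before "cycle2"; zero-padding the names (for example sprintf("cycle%02d", rank)) avoids that.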