Bucketing rank values in R
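I am trying to bucket rank values into cycles. Going from rank 1 to rank 2 is cycle1, going from rank 2 to rank 3 is cycle2, and so on, with a binary value created for each cycle (as shown below).
Data frame before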
id         event  date                 rank
1241a21ef  one    2016-08-13 20:03:37  1
1241a21ef  two    2016-08-15 05:41:09  2
12426203b  two    2016-08-04 05:35:10  1
12426203b  three  2016-08-06 02:07:41  2
12426203b  two    2016-08-10 05:42:33  3
12426203b  three  2016-08-14 02:43:16  4
Data frame after
id         cycle1  cycle2  cycle3
1241a21ef  1       0       0
12426203b  1       1       1
Note: each group (i.e. each id) has unique rank values based on the timestamp, and the rank resets to 1 for the next new id.
You can use dplyr::count together with tidyr::spread to tabulate the data in the desired format, for example:
library(dplyr)
library(tidyr)
df %>%
  group_by(id) %>%
  arrange(id, rank) %>%
  filter(rank != last(rank)) %>%               # drop the last rank for each id
  mutate(cycle = paste0("cycle", rank)) %>%    # desired column names after spread
  group_by(id, cycle) %>%
  count() %>%
  spread(key = cycle, value = n, fill = 0) %>%
  as.data.frame()
# id cycle1 cycle2 cycle3
# 1 1241a21ef 1 0 0
# 2 12426203b 1 1 1
Data:
df <- read.table(text =
"id event date rank
1241a21ef one '2016-08-13 20:03:37' 1
1241a21ef two '2016-08-15 05:41:09' 2
12426203b two '2016-08-04 05:35:10' 1
12426203b three '2016-08-06 02:07:41' 2
12426203b two '2016-08-10 05:42:33' 3
12426203b three '2016-08-14 02:43:16' 4",
header = TRUE, stringsAsFactors = FALSE)
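If you would rather not depend on dplyr and tidyr, the same tabulation can probably be done in base R. The following is only a sketch, not part of the answer above: it assumes ranks are unique within each id (as stated in the question), so the last rank per id is simply the maximum, and the names ranks, keep, cyc, tab and res are made up for illustration.

```r
# Minimal copy of the example data (only the columns the tabulation needs)
ranks <- data.frame(
  id   = c("1241a21ef", "1241a21ef",
           "12426203b", "12426203b", "12426203b", "12426203b"),
  rank = c(1, 2, 1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# Assumption: ranks are unique per id, so the last rank equals the per-id maximum.
# ave() returns that maximum repeated for every row of the group.
keep <- ranks$rank != ave(ranks$rank, ranks$id, FUN = max)
cyc  <- ranks[keep, ]

# Cross-tabulate id against rank; combinations that never occur are counted as 0
tab <- table(cyc$id, cyc$rank)
colnames(tab) <- paste0("cycle", colnames(tab))

# Turn the contingency table into a wide data frame with ids as row names
res <- as.data.frame.matrix(tab)
print(res)
```

Because table() is applied to the numeric rank before the column names are built, the columns stay in numeric order even with ten or more cycles. The ids end up as row names here rather than in an id column, which differs slightly from the dplyr result.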