How to count the number of unique labels belonging to one particular column with respect to timestamp, by intervals of x minutes?
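My dataset looks like this:
Let me explain my data frame. It has two columns, named "timeStamp" and "label". The "label" column holds the values that occur at each "timeStamp".
I was able to count how often each unique value in the label column occurs over the whole period, using the aggregate and count functions available in R.
Now I want to count how often each unique value in the label column occurs with respect to timestamp, in intervals of 2 minutes.
To be precise, this is what I am looking for in the output:
You can find the data frame here, as produced with dput in R.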
x <- data.frame(
  timeStamp = c("20:12:14","20:12:14","20:13:02","20:13:02","20:13:55",
                "20:13:55","20:14:14","20:14:14","20:14:25","20:14:26",
                "20:14:26","20:14:26","20:15:26","20:15:28","20:15:36",
                "20:15:37","20:16:41","20:16:49","20:17:20","20:17:21"),
  label = c("003_T04_Ward Login","003_T04_Ward Login","002_T05_SearchPatient",
            "002_T05_SearchPatient","003_T04_Ward Login","003_T04_Ward Login",
            "002_T05_SearchPatient","002_T05_SearchPatient","001_T09_Submit Payment",
            "001_T09_Submit Payment","001_T09_Submit Payment","001_T09_Submit Payment",
            "002_T05_SearchPatient","002_T05_SearchPatient","003_T04_Ward Login",
            "003_T04_Ward Login","002_T05_SearchPatient","002_T05_SearchPatient",
            "003_T04_Ward Login","003_T04_Ward Login")
)
dput(x)
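For context, the whole-period count mentioned above can be sketched roughly as follows. This is only an illustration in base R on a cut-down stand-in for the data; the names small and overall are made up for the example and are not from the original post.

```r
# Cut-down stand-in for the data frame above (same columns, fewer rows)
small <- data.frame(
  timeStamp = c("20:12:14", "20:13:02", "20:14:25", "20:14:26"),
  label = c("003_T04_Ward Login", "002_T05_SearchPatient",
            "001_T09_Submit Payment", "001_T09_Submit Payment"),
  stringsAsFactors = FALSE
)

# Occurrences of each label over the whole period, ignoring the time column
overall <- aggregate(small$label, by = list(label = small$label), FUN = length)
names(overall)[2] <- "n"   # aggregate() calls the result column "x" by default
print(overall)
```

This gives one row per label with its total count, but says nothing about when the occurrences happened, which is the part still missing.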
Here is a tidyverse solution:
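library(tidyverse)

# Parse only the hour and minute of "HH:MM:SS"; seconds are dropped, today's date is used
hm <- function(x) as.POSIXct(x, format = "%H:%M")

# Create 2 min breakpoints by which we group times
breaks <- seq(min(hm(x$timeStamp)), max(hm(x$timeStamp)) + 120, by = "2 min")

x %>%
  mutate(timeStamp = cut(hm(timeStamp), breaks = breaks)) %>%  # assign each row to its 2 min bin
  count(timeStamp, label) %>%                                   # number of rows per (bin, label)
  spread(label, n)                                              # long to wide, one column per label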
# A tibble: 3 x 4
#   timeStamp           `001_T09_Submit Pa… `002_T05_SearchPat… `003_T04_Ward Lo…
#   <fct>                             <int>               <int>             <int>
# 1 2018-04-13 20:12:00                  NA                   2                 4
# 2 2018-04-13 20:14:00                   4                   4                 2
# 3 2018-04-13 20:16:00                  NA                   2                 2
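Explanation: We create 2-minute breakpoints and use them to cut the hour+minute part of timeStamp into bins; we then count rows by 2-minute bin and label, and spread the result from long to wide. An NA in the output means that label did not occur in that interval.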
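Since the question asks about intervals of x minutes in general, here is a rough base R sketch of the same binning idea with the interval width as a parameter. It is not the method from the answer above: count_by_interval is a hypothetical helper, the times are parsed on a fixed dummy date, and bins are aligned to the clock (multiples of the interval width) rather than to the earliest timestamp. The data frame demo is a small stand-in for the sample data.

```r
# Hypothetical helper (not from the answer): count labels per `minutes`-wide bin
count_by_interval <- function(df, minutes = 2) {
  # Parse the full HH:MM:SS time on a fixed dummy date so the result is reproducible
  t <- as.POSIXct(paste("1970-01-01", df$timeStamp),
                  format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  width <- minutes * 60
  # Floor each time to the start of its bin (bins start at multiples of `width` seconds)
  bin_start <- as.POSIXct(floor(as.numeric(t) / width) * width,
                          origin = "1970-01-01", tz = "UTC")
  # table() reports 0 for a label that never occurs in a bin that has other rows
  table(interval = format(bin_start, "%H:%M"), label = df$label)
}

# Small stand-in for the sample data
demo <- data.frame(
  timeStamp = c("20:12:14", "20:13:02", "20:13:55",
                "20:14:25", "20:14:26", "20:16:41"),
  label = c("003_T04_Ward Login", "002_T05_SearchPatient", "003_T04_Ward Login",
            "001_T09_Submit Payment", "001_T09_Submit Payment",
            "002_T05_SearchPatient"),
  stringsAsFactors = FALSE
)

res <- count_by_interval(demo, minutes = 2)
print(res)
```

Calling count_by_interval(demo, minutes = 5) would group the same rows into 5-minute bins instead. Unlike the spread() output, absent combinations show up as 0 rather than NA, and an interval with no rows at all does not appear.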
Sample data
x <- data.frame(
timeStamp = c("20:12:14","20:12:14","20:13:02","20:13:02","20:13:55","20:13:55","20:14:14","20:14:14","20:14:25","20:14:26","20:14:26","20:14:26","20:15:26","20:15:28","20:15:36","20:15:37","20:16:41","20:16:49","20:17:20","20:17:21"),
label = c("003_T04_Ward Login","003_T04_Ward Login","002_T05_SearchPatient","002_T05_SearchPatient","003_T04_Ward Login","003_T04_Ward Login","002_T05_SearchPatient","002_T05_SearchPatient","001_T09_Submit Payment","001_T09_Submit Payment","001_T09_Submit Payment","001_T09_Submit Payment","002_T05_SearchPatient","002_T05_SearchPatient","003_T04_Ward Login","003_T04_Ward Login","002_T05_SearchPatient","002_T05_SearchPatient","003_T04_Ward Login","003_T04_Ward Login" ))