Creating labels with hour/min and seq to create bins

Question

I have some hour/min data. hour_min is an hms object, and the number column is just as.numeric of hour_min (seconds since midnight).

library(dplyr)
library(lubridate)

df <-  structure(list(hour_min = structure(c(NA, 69300, 46800, 35100, 
52200, 37800, 52200, NA, 45300, 42300, NA, 29700, 46800, 34200, 
32400, 43200, 36000, 41400, 29700, 36000), units = "secs", class = c("hms", 
"difftime")), number = c(NA, 69300, 46800, 35100, 52200, 37800, 
52200, NA, 45300, 42300, NA, 29700, 46800, 34200, 32400, 43200, 
36000, 41400, 29700, 36000)), class = "data.frame", row.names = c(NA, 
-20L), .Names = c("hour_min", "number"))

 hour_min number
1        NA     NA
2  19:15:00  69300
3  13:00:00  46800
4  09:45:00  35100
5  14:30:00  52200
6  10:30:00  37800
7  14:30:00  52200
8        NA     NA
9  12:35:00  45300
10 11:45:00  42300
11       NA     NA
12 08:15:00  29700
13 13:00:00  46800
14 09:30:00  34200
15 09:00:00  32400
16 12:00:00  43200
17 10:00:00  36000
18 11:30:00  41400
19 08:15:00  29700
20 10:00:00  36000

I want to create 30-minute intervals, so I use the following. If I leave out labels, it seems to work, but how can I get nice labels?

df$interval <- cut(df$number,
                          breaks = seq(as.numeric(hms::as.hms("07:00:00")), 
                                       as.numeric(hms::as.hms("23:00:00")), 1800),
                          labels = as.character(seq(hms::as.hms("07:00:00"), 
                                       hms::as.hms("23:00:00"), 1800)))

Without labels I get the result below. I want to do a count, but per 30-minute interval.

df %>% 
  count(interval)

# A tibble: 11 x 2
   interval                n
   <fct>               <int>
 1 (2.88e+04,3.06e+04]     2
 2 (3.06e+04,3.24e+04]     1
 3 (3.24e+04,3.42e+04]     1
 4 (3.42e+04,3.6e+04]      3
 5 (3.6e+04,3.78e+04]      1
 6 (3.96e+04,4.14e+04]     1
 7 (4.14e+04,4.32e+04]     2
 8 (4.5e+04,4.68e+04]      3
 9 (5.04e+04,5.22e+04]     2
10 (6.84e+04,7.02e+04]     1
11 <NA>                    3

But I need labels. Any solutions?

Answer 1

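OK, here is my own solution: the labels need to be built with hms::as.hms, and there must be one fewer label than breaks, so the label sequence stops at 22:30:00: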

df$interval <- cut(df$number,
                   breaks = seq(as.numeric(hms::as.hms("07:00:00")),
                                as.numeric(hms::as.hms("23:00:00")), 1800),
                   labels = hms::as.hms(seq(as.numeric(hms::as.hms("07:00:00")),
                                            as.numeric(hms::as.hms("22:30:00")), 1800)))
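
One thing to watch with this approach: cut() uses right-closed intervals by default, so a time of exactly 09:00:00 falls in (08:30, 09:00] and would receive the label 08:30:00. The following is only a sketch in base R, assuming you want each label to be the start of a left-closed bin; fmt_hms is a hypothetical helper written for this example and is not part of hms.

```r
# Seconds since midnight, copied from the question's `number` column.
number <- c(NA, 69300, 46800, 35100, 52200, 37800, 52200, NA, 45300, 42300,
            NA, 29700, 46800, 34200, 32400, 43200, 36000, 41400, 29700, 36000)

# Bin edges: 07:00:00 to 23:00:00 in 30-minute (1800 s) steps.
breaks <- seq(7 * 3600, 23 * 3600, by = 1800)

# Hypothetical helper: format whole seconds as "HH:MM:SS".
fmt_hms <- function(s) {
  s <- as.integer(s)
  sprintf("%02d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}

# cut() needs one label per interval, i.e. one fewer than the number of breaks,
# so the last break (23:00:00) is dropped from the labels.
labels <- fmt_hms(head(breaks, -1))

# right = FALSE makes the bins [start, end), so 09:00:00 is labelled "09:00:00".
interval <- cut(number, breaks = breaks, labels = labels, right = FALSE)

# Count per bin, dropping empty bins and NAs.
counts <- table(interval)
counts[counts > 0]
```

With right = FALSE the counts line up with bins that start at the labelled time, which is the same convention as truncating each time down to the previous half hour.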

Answer 2

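Convert number to chron's times class by dividing by the number of seconds in a day, since times objects are measured in fractions of a day. This gives a times column, which we can round down to 30-minute boundaries with trunc.times and then count.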

library(chron)
library(dplyr)
library(lubridate)
library(tidyr)

df %>% 
  mutate(times = (number / (24 * 60 * 60)) %>% times %>% trunc("00:30:00")) %>%
  drop_na %>%
  count(times)

giving:

# A tibble: 11 x 2
   times           n
   <S3: times> <int>
 1 08:00:00        2
 2 09:00:00        1
 3 09:30:00        2
 4 10:00:00        2
 5 10:30:00        1
 6 11:30:00        2
 7 12:00:00        1
 8 12:30:00        1
 9 13:00:00        2
10 14:30:00        2
11 19:00:00        1
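
For comparison, the same truncation can be sketched with plain arithmetic and no extra packages, assuming the times are whole seconds since midnight as in the question. Integer-dividing by 1800 and multiplying back rounds each value down to the start of its 30-minute bin; fmt_hms is a hypothetical helper for this example only.

```r
# Seconds since midnight, copied from the question's `number` column.
number <- c(NA, 69300, 46800, 35100, 52200, 37800, 52200, NA, 45300, 42300,
            NA, 29700, 46800, 34200, 32400, 43200, 36000, 41400, 29700, 36000)

# Drop missing values, then round each time down to a multiple of 1800 s.
secs <- number[!is.na(number)]
bin_start <- (secs %/% 1800) * 1800

# Hypothetical helper: format whole seconds as "HH:MM:SS".
fmt_hms <- function(s) {
  s <- as.integer(s)
  sprintf("%02d:%02d:%02d", s %/% 3600L, (s %% 3600L) %/% 60L, s %% 60L)
}

# Zero-padded labels sort correctly as character strings.
tab <- table(times = fmt_hms(bin_start))
tab
```

This only reports bins that actually occur in the data; use cut() with explicit breaks if empty bins should also appear.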

chron only

Note that this can also be written using only chron, as follows:

library(chron)

tt <- trunc(times(df$number / (24 * 60 * 60)), "00:30:00")
table(tt)

giving:

08:00:00 09:00:00 09:30:00 10:00:00 10:30:00 11:30:00 12:00:00 12:30:00 
       2        1        2        2        1        2        1        1 
13:00:00 14:30:00 19:00:00 
       2        2        1

or using aggregate instead of table:

aggregate(list(n = tt), list(times = tt), length)

giving:

      times n
1  08:00:00 2
2  09:00:00 1
3  09:30:00 2
4  10:00:00 2
5  10:30:00 1
6  11:30:00 2
7  12:00:00 1
8  12:30:00 1
9  13:00:00 2
10 14:30:00 2
11 19:00:00 1
