Getting the first and the last value of each day in R

I have a data frame that shows my online times for each day in two columns.

First I separated the time from the date, using this:

library(lubridate)
# parse the day/month/year hour:minute strings in column V2
a1 <- dmy_hm(df$V2)
d1 <- data.frame(Date= format(a1, '%d/%m/%Y'), Time=format(a1, '%H:%M:%S'))
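A side note on this step, sketched with two made-up dates: format(a1, '%d/%m/%Y') produces plain strings, and day/month/year strings do not sort chronologically once the data spans more than one month. Grouping by such a string still works; only ordering is affected. Keeping a real Date, or an ISO '%Y-%m-%d' string, sorts correctly. Base R only:

```r
# Two made-up dates in day/month/year form: 5 June 2018 and 4 July 2018
dates_chr <- c("05/06/2018", "04/07/2018")

# Sorting the strings compares them character by character,
# so "04/07/2018" (July) ends up before "05/06/2018" (June)
sorted_chr <- sort(dates_chr)

# Converting to Date first gives chronological order (June before July)
dates <- as.Date(dates_chr, format = "%d/%m/%Y")
sorted_dates <- sort(dates)

# ISO year-month-day strings sort the same way as the Date values
iso_chr <- sort(format(dates, "%Y-%m-%d"))

print(sorted_chr)
print(sorted_dates)
print(iso_chr)
```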

        Date     Time
31   04/06/2018 17:51:00
32   04/06/2018 17:50:00
33   04/06/2018 17:33:00
34   04/06/2018 17:33:00
35   04/06/2018 17:29:00
36   04/06/2018 17:29:00
37   04/06/2018 17:06:00
38   04/06/2018 17:06:00
39   04/06/2018 17:01:00
40   04/06/2018 17:01:00
41   04/06/2018 16:49:00
42   04/06/2018 16:49:00
43   04/06/2018 16:43:00
44   04/06/2018 16:43:00
45   04/06/2018 16:38:00
46   04/06/2018 16:38:00
47   04/06/2018 16:22:00
48   04/06/2018 16:22:00
49   04/06/2018 16:21:00
50   04/06/2018 16:21:00
51   04/06/2018 16:14:00
52   04/06/2018 16:14:00
53   04/06/2018 15:57:00
54   04/06/2018 15:57:00
89   04/06/2018 12:05:00
90   04/06/2018 12:05:00
91   04/06/2018 12:05:00
92   04/06/2018 12:05:00
93   04/06/2018 12:05:00
94   04/06/2018 12:05:00
100  04/06/2018 12:05:00
101  04/06/2018 12:05:00

How can I get the first and the last time of each day? I tried:

d1 %>% 
  group_by(Date) %>% 
  summarise(Min = min(Time), Max= max(Time))

But I got this error message:

Error in summarise_impl(.data, dots) : 
  Evaluation error: ‘min’ not meaningful for factors.
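The error comes from min() being called on a factor: before R 4.0.0, data.frame() converted character columns such as Time to factors by default (stringsAsFactors = TRUE), and min()/max() are not defined for unordered factors. A minimal base R reproduction with a few of the question's times, showing the failing call and two ways around it (comparing the strings, or using an ordered factor):

```r
# A few times from the question, stored as an (unordered) factor,
# which is what data.frame() produced by default before R 4.0.0
time_fct <- factor(c("17:51:00", "12:05:00", "16:14:00"))

# min() on an unordered factor fails with the error from the question
msg <- tryCatch(min(time_fct), error = function(e) conditionMessage(e))
print(msg)

# Workaround 1: compare the underlying strings instead
first_time <- min(as.character(time_fct))
last_time  <- max(as.character(time_fct))

# Workaround 2: an ordered factor supports min() and max();
# its levels are sorted alphabetically, which for "HH:MM:SS" is clock order
time_ord  <- factor(c("17:51:00", "12:05:00", "16:14:00"), ordered = TRUE)
first_ord <- as.character(min(time_ord))
```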

You can sort the data and use first and last instead of min and max:

library(dplyr)
d1 %>% 
  arrange(Time) %>%
  group_by(Date) %>% 
  summarise(Min = first(Time), Max= last(Time))

# # A tibble: 1 x 3
#           Date      Min      Max
#         <fctr>   <fctr>   <fctr>
#   1 04/06/2018 12:05:00 17:51:00
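The same sort-then-take-first/last idea can be sketched in base R without dplyr, assuming Time is a character column of zero-padded "HH:MM:SS" strings (the rows below are a few taken from the question). duplicated() flags every row except the first of each day, and fromLast = TRUE does the same starting from the end:

```r
# A few rows from the question, kept as character columns
d1 <- data.frame(
  Date = rep("04/06/2018", 5),
  Time = c("17:51:00", "17:50:00", "16:14:00", "15:57:00", "12:05:00"),
  stringsAsFactors = FALSE
)

# Sort by day, then by time within the day
d_sorted <- d1[order(d1$Date, d1$Time), ]

# First row per day = earliest time; last row per day = latest time
first_rows <- d_sorted[!duplicated(d_sorted$Date), ]
last_rows  <- d_sorted[!duplicated(d_sorted$Date, fromLast = TRUE), ]

res <- data.frame(Date = first_rows$Date,
                  Min  = first_rows$Time,
                  Max  = last_rows$Time,
                  stringsAsFactors = FALSE)
print(res)
```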

Alternatively, you can use stringsAsFactors = FALSE in the data.frame call: min and max work on character vectors, but they do not work on unordered factors:

d1 <- data.frame(Date= format(a1, '%d/%m/%Y'), Time=format(a1, '%H:%M:%S'),stringsAsFactors = FALSE)

library(dplyr)
d1 %>% 
  group_by(Date) %>% 
  summarise(Min = min(Time), Max= max(Time))

# # A tibble: 1 x 3
#           Date      Min      Max
#          <chr>    <chr>    <chr>
#   1 04/06/2018 12:05:00 17:51:00
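Why min()/max() on character vectors give the right answer here: strings are compared character by character, which matches clock order only because '%H' pads hours to two digits. The sketch below contrasts padded strings with a hypothetical unpadded input and, as one possible workaround, converts times to seconds with a hypothetical helper, to_seconds(), built on as.POSIXlt():

```r
padded   <- c("09:05:00", "17:51:00")
unpadded <- c("9:05:00",  "17:51:00")   # hypothetical input without zero-padding

# Padded: string order equals clock order
latest_padded <- max(padded)

# Unpadded: "9" compares greater than "1", so the string maximum is wrong
latest_unpadded <- max(unpadded)

# Hypothetical helper: turn "H:MM:SS" strings into seconds since midnight
to_seconds <- function(x) {
  p <- as.POSIXlt(x, format = "%H:%M:%S", tz = "UTC")
  p$hour * 3600 + p$min * 60 + p$sec
}

# Comparing numbers instead of strings picks the true latest time
latest_fixed <- unpadded[which.max(to_seconds(unpadded))]
```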

Data

datetimes <- c(
'04/06/2018 17:51:00',
'04/06/2018 17:50:00',
'04/06/2018 17:33:00',
'04/06/2018 17:33:00',
'04/06/2018 17:29:00',
'04/06/2018 17:29:00',
'04/06/2018 17:06:00',
'04/06/2018 17:06:00',
'04/06/2018 17:01:00',
'04/06/2018 17:01:00',
'04/06/2018 16:49:00',
'04/06/2018 16:49:00',
'04/06/2018 16:43:00',
'04/06/2018 16:43:00',
'04/06/2018 16:38:00',
'04/06/2018 16:38:00',
'04/06/2018 16:22:00',
'04/06/2018 16:22:00',
'04/06/2018 16:21:00',
'04/06/2018 16:21:00',
'04/06/2018 16:14:00',
'04/06/2018 16:14:00',
'04/06/2018 15:57:00',
'04/06/2018 15:57:00',
'04/06/2018 12:05:00',
'04/06/2018 12:05:00',
'04/06/2018 12:05:00',
'04/06/2018 12:05:00',
'04/06/2018 12:05:00',
'04/06/2018 12:05:00',
'04/06/2018 12:05:00')

library(lubridate)
a1 <- dmy_hms(datetimes)
d1 <- data.frame(Date= format(a1, '%d/%m/%Y'), Time=format(a1, '%H:%M:%S'))
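Another option, sketched here in base R, is to skip the string split entirely and work on the parsed date-times, so min and max compare actual times rather than text. as.POSIXct() with an explicit format stands in for lubridate's dmy_hms(); the data is a subset of the times above plus a made-up second day (05/06/2018), added only to show more than one group:

```r
# Subset of the question's timestamps plus a made-up second day
datetimes <- c("04/06/2018 17:51:00", "04/06/2018 16:14:00", "04/06/2018 12:05:00",
               "05/06/2018 18:30:00", "05/06/2018 08:15:00")

# Parse once, keeping full date-times
ts <- as.POSIXct(datetimes, format = "%d/%m/%Y %H:%M:%S", tz = "UTC")

# Split by calendar day; ISO day strings also sort chronologically
by_day <- split(ts, format(ts, "%Y-%m-%d"))

# range() returns c(min, max) and keeps the POSIXct class
rng <- lapply(by_day, range)

res <- data.frame(
  Date  = names(rng),
  First = vapply(rng, function(r) format(r[1], "%H:%M:%S"), character(1)),
  Last  = vapply(rng, function(r) format(r[2], "%H:%M:%S"), character(1)),
  row.names = NULL,
  stringsAsFactors = FALSE
)
print(res)
```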

Here is Mudskipper's solution translated into fast, concise data.table:

library(data.table)
setDT(d1)
d1[order(Time), .(Min = Time[1], Max = Time[.N]), Date]
         Date      Min      Max
1: 04/06/2018 12:05:00 17:51:00

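And for comparison, base R (this assumes Time is character, e.g. built with stringsAsFactors = FALSE, since min() does not work on factors):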

aggregate(Time ~ Date, d1, function(x) c(Min = min(x), Max = max(x)))
        Date Time.Min Time.Max
1 04/06/2018 12:05:00 17:51:00
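One detail of the aggregate() version: because the function returns two values, Time in the result is a single two-column matrix column rather than two separate columns, even though it prints as Time.Min and Time.Max. A short sketch (assuming character Time, with a few rows from the question) that shows this and flattens it with do.call(data.frame, ...):

```r
# A few rows from the question, kept as character columns
d1 <- data.frame(
  Date = rep("04/06/2018", 4),
  Time = c("17:51:00", "16:14:00", "15:57:00", "12:05:00"),
  stringsAsFactors = FALSE
)

res <- aggregate(Time ~ Date, d1, function(x) c(Min = min(x), Max = max(x)))

# res has only two columns: Date, and Time, which is a matrix with columns Min and Max
str(res)

# Spread the matrix into ordinary columns named Time.Min and Time.Max
flat <- do.call(data.frame, res)
names(flat)
```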