How to remove data rows where dates match per group

I have a data frame of observations from three groups: a, b and c.

code          datetime          lat      lon    tagging_date
  a     2016-04-16 06:09:07  -5.4644  71.82280   16/04/2016
  a     2016-04-16 08:35:16  -5.4644  71.82280   16/04/2016
  a     2016-04-25 03:52:47  -5.4644  71.82280   16/04/2016
  a     2016-04-26 11:22:08  -5.4644  71.82280   16/04/2016
  a     2016-05-01 01:56:44  -5.4644  71.82280   16/04/2016
  a     2016-05-01 03:36:51  -5.4644  71.82280   16/04/2016
  b     2013-03-22 03:06:52  -5.2662  71.67483   22/03/2013
  b     2013-03-27 03:16:47  -5.2662  71.67483   22/03/2013
  b     2013-03-28 10:33:40  -5.2662  71.67483   22/03/2013
  c     2013-04-03 07:12:13  -5.2662  71.67483   22/03/2013
  c     2013-04-03 07:15:01  -5.2662  71.67483   22/03/2013
  c     2013-04-03 07:18:40  -5.2662  71.67483   22/03/2013

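I need to remove the rows in each group where the date part of datetime matches the tagging date. For example, the first two rows of a and the first row of b would be removed, and no rows of c would be removed.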

Is there a quick and efficient way to do this?

Code to reproduce the dataset is below.

structure(list(code = c("a", "a", "a", "a", "a", "a", "b", "b", 
"b", "c", "c", "c"), datetime = c("2016-04-16 06:09:07", "2016-04-16 08:35:16", 
"2016-04-25 03:52:47", "2016-04-26 11:22:08", "2016-05-01 01:56:44", 
"2016-05-01 03:36:51", "2013-03-22 03:06:52", "2013-03-27 03:16:47", 
"2013-03-28 10:33:40", "2013-04-03 07:12:13", "2013-04-03 07:15:01", 
"2013-04-03 07:18:40"), lat = c(-5.4644, -5.4644, -5.4644, -5.4644, 
-5.4644, -5.4644, -5.2662, -5.2662, -5.2662, -5.2662, -5.2662, 
-5.2662), lon = c(71.8228, 71.8228, 71.8228, 71.8228, 71.8228, 
71.8228, 71.67483, 71.67483, 71.67483, 71.67483, 71.67483, 71.67483
), tagging_date = c("16/04/2016", "16/04/2016", "16/04/2016", 
"16/04/2016", "16/04/2016", "16/04/2016", "22/03/2013", "22/03/2013", 
"22/03/2013", "22/03/2013", "22/03/2013", "22/03/2013")), class = "data.frame", row.names = c(NA, 
-12L))

Change the class of datetime to POSIXct and of tagging_date to Date, then keep only the rows where the two dates differ.

Using dplyr and lubridate you can do this:

library(dplyr)
library(lubridate)

df %>%
  mutate(datetime = ymd_hms(datetime), 
         tagging_date = dmy(tagging_date)) %>%
  filter(as.Date(datetime) != tagging_date)

#  code            datetime     lat      lon tagging_date
#1    a 2016-04-25 03:52:47 -5.4644 71.82280   2016-04-16
#2    a 2016-04-26 11:22:08 -5.4644 71.82280   2016-04-16
#3    a 2016-05-01 01:56:44 -5.4644 71.82280   2016-04-16
#4    a 2016-05-01 03:36:51 -5.4644 71.82280   2016-04-16
#5    b 2013-03-27 03:16:47 -5.2662 71.67483   2013-03-22
#6    b 2013-03-28 10:33:40 -5.2662 71.67483   2013-03-22
#7    c 2013-04-03 07:12:13 -5.2662 71.67483   2013-03-22
#8    c 2013-04-03 07:15:01 -5.2662 71.67483   2013-03-22
#9    c 2013-04-03 07:18:40 -5.2662 71.67483   2013-03-22

Or in base R:

subset(transform(df, datetime = as.POSIXct(datetime, tz = 'UTC'), 
                     tagging_date = as.Date(tagging_date, '%d/%m/%Y')), 
       as.Date(datetime) != tagging_date) 
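
As a quick sanity check on either version, you can count how many rows the comparison flags in each group before dropping them. This is only a sketch: it rebuilds the question's data without the lat and lon columns, and same_day and dropped are names made up for illustration.

```r
# Rebuild the question's data (lat/lon omitted, since they are not used here).
df <- data.frame(
  code = rep(c("a", "b", "c"), times = c(6, 3, 3)),
  datetime = c("2016-04-16 06:09:07", "2016-04-16 08:35:16",
               "2016-04-25 03:52:47", "2016-04-26 11:22:08",
               "2016-05-01 01:56:44", "2016-05-01 03:36:51",
               "2013-03-22 03:06:52", "2013-03-27 03:16:47",
               "2013-03-28 10:33:40", "2013-04-03 07:12:13",
               "2013-04-03 07:15:01", "2013-04-03 07:18:40"),
  tagging_date = rep(c("16/04/2016", "22/03/2013"), times = c(6, 6)),
  stringsAsFactors = FALSE
)

# TRUE where the calendar date of datetime equals the tagging date.
same_day <- as.Date(as.POSIXct(df$datetime, tz = "UTC"), tz = "UTC") ==
  as.Date(df$tagging_date, format = "%d/%m/%Y")

# Number of rows that would be removed in each group.
dropped <- tapply(same_day, df$code, sum)
dropped
# a b c
# 2 1 0
```

The counts agree with what the question expects: two rows from a, one from b and none from c.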

Does this work:

library(dplyr)
library(lubridate)
df %>% filter(as.Date(ymd_hms(datetime)) != dmy(tagging_date))
  code            datetime     lat      lon tagging_date
1    a 2016-04-25 03:52:47 -5.4644 71.82280   16/04/2016
2    a 2016-04-26 11:22:08 -5.4644 71.82280   16/04/2016
3    a 2016-05-01 01:56:44 -5.4644 71.82280   16/04/2016
4    a 2016-05-01 03:36:51 -5.4644 71.82280   16/04/2016
5    b 2013-03-27 03:16:47 -5.2662 71.67483   22/03/2013
6    b 2013-03-28 10:33:40 -5.2662 71.67483   22/03/2013
7    c 2013-04-03 07:12:13 -5.2662 71.67483   22/03/2013
8    c 2013-04-03 07:15:01 -5.2662 71.67483   22/03/2013
9    c 2013-04-03 07:18:40 -5.2662 71.67483   22/03/2013

If dat is your test data frame, using data.table:

library(data.table)
dat = data.table(dat)
dat[as.Date(datetime) != as.Date(tagging_date, format = '%d/%m/%Y')]



  code            datetime     lat      lon tagging_date
1:    a 2016-04-25 03:52:47 -5.4644 71.82280   16/04/2016
2:    a 2016-04-26 11:22:08 -5.4644 71.82280   16/04/2016
3:    a 2016-05-01 01:56:44 -5.4644 71.82280   16/04/2016
4:    a 2016-05-01 03:36:51 -5.4644 71.82280   16/04/2016
5:    b 2013-03-27 03:16:47 -5.2662 71.67483   22/03/2013
6:    b 2013-03-28 10:33:40 -5.2662 71.67483   22/03/2013
7:    c 2013-04-03 07:12:13 -5.2662 71.67483   22/03/2013
8:    c 2013-04-03 07:15:01 -5.2662 71.67483   22/03/2013
9:    c 2013-04-03 07:18:40 -5.2662 71.67483   22/03/2013

Base R:

dat[as.Date(dat$datetime) != as.Date(dat$tagging_date, format = '%d/%m/%Y'), ]
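
If you would rather not parse the timestamps at all, a text comparison may also do. Once datetime is a POSIXct column, the date that as.Date() extracts can depend on the time zone used for the conversion, and comparing text sidesteps that. The sketch below assumes datetime is still a character column in "YYYY-MM-DD HH:MM:SS" form; the four-row dat and the names dt_day, tag_day and kept are illustrative only.

```r
# A small subset of the question's rows, kept as character columns.
dat <- data.frame(
  code = c("a", "a", "b", "c"),
  datetime = c("2016-04-16 06:09:07", "2016-04-25 03:52:47",
               "2013-03-22 03:06:52", "2013-04-03 07:12:13"),
  tagging_date = c("16/04/2016", "16/04/2016", "22/03/2013", "22/03/2013"),
  stringsAsFactors = FALSE
)

# First 10 characters of datetime are the "YYYY-MM-DD" date part.
dt_day <- substr(dat$datetime, 1, 10)

# Rewrite tagging_date from dd/mm/yyyy to yyyy-mm-dd text.
tag_day <- format(as.Date(dat$tagging_date, format = "%d/%m/%Y"), "%Y-%m-%d")

# Keep only the rows whose date differs from the tagging date.
kept <- dat[dt_day != tag_day, ]
kept$code
# [1] "a" "c"
```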