How to remove data rows where dates match per group
I have a data frame with observations from three groups: a, b and c.
code datetime lat lon tagging_date
a 2016-04-16 06:09:07 -5.4644 71.82280 16/04/2016
a 2016-04-16 08:35:16 -5.4644 71.82280 16/04/2016
a 2016-04-25 03:52:47 -5.4644 71.82280 16/04/2016
a 2016-04-26 11:22:08 -5.4644 71.82280 16/04/2016
a 2016-05-01 01:56:44 -5.4644 71.82280 16/04/2016
a 2016-05-01 03:36:51 -5.4644 71.82280 16/04/2016
b 2013-03-22 03:06:52 -5.2662 71.67483 22/03/2013
b 2013-03-27 03:16:47 -5.2662 71.67483 22/03/2013
b 2013-03-28 10:33:40 -5.2662 71.67483 22/03/2013
c 2013-04-03 07:12:13 -5.2662 71.67483 22/03/2013
c 2013-04-03 07:15:01 -5.2662 71.67483 22/03/2013
c 2013-04-03 07:18:40 -5.2662 71.67483 22/03/2013
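I need to remove the rows in each group where the date part of datetime matches the tagging date. For example, the first two rows of a and the first row of b would be removed, and no rows of c.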
Is there a quick and efficient way to do this?
Code to reproduce the dataset is below.
df <- structure(list(code = c("a", "a", "a", "a", "a", "a", "b", "b",
"b", "c", "c", "c"), datetime = c("2016-04-16 06:09:07", "2016-04-16 08:35:16",
"2016-04-25 03:52:47", "2016-04-26 11:22:08", "2016-05-01 01:56:44",
"2016-05-01 03:36:51", "2013-03-22 03:06:52", "2013-03-27 03:16:47",
"2013-03-28 10:33:40", "2013-04-03 07:12:13", "2013-04-03 07:15:01",
"2013-04-03 07:18:40"), lat = c(-5.4644, -5.4644, -5.4644, -5.4644,
-5.4644, -5.4644, -5.2662, -5.2662, -5.2662, -5.2662, -5.2662,
-5.2662), lon = c(71.8228, 71.8228, 71.8228, 71.8228, 71.8228,
71.8228, 71.67483, 71.67483, 71.67483, 71.67483, 71.67483, 71.67483
), tagging_date = c("16/04/2016", "16/04/2016", "16/04/2016",
"16/04/2016", "16/04/2016", "16/04/2016", "22/03/2013", "22/03/2013",
"22/03/2013", "22/03/2013", "22/03/2013", "22/03/2013")), class = "data.frame", row.names = c(NA,
-12L))
Change the class of datetime to POSIXct and tagging_date to Date, then keep only the rows where the two dates differ.
Using dplyr and lubridate, you can do this:
library(dplyr)
library(lubridate)
df %>%
  mutate(datetime = ymd_hms(datetime),
         tagging_date = dmy(tagging_date)) %>%
  filter(as.Date(datetime) != tagging_date)
# code datetime lat lon tagging_date
#1 a 2016-04-25 03:52:47 -5.4644 71.82280 2016-04-16
#2 a 2016-04-26 11:22:08 -5.4644 71.82280 2016-04-16
#3 a 2016-05-01 01:56:44 -5.4644 71.82280 2016-04-16
#4 a 2016-05-01 03:36:51 -5.4644 71.82280 2016-04-16
#5 b 2013-03-27 03:16:47 -5.2662 71.67483 2013-03-22
#6 b 2013-03-28 10:33:40 -5.2662 71.67483 2013-03-22
#7 c 2013-04-03 07:12:13 -5.2662 71.67483 2013-03-22
#8 c 2013-04-03 07:15:01 -5.2662 71.67483 2013-03-22
#9 c 2013-04-03 07:18:40 -5.2662 71.67483 2013-03-22
Or in base R:
subset(transform(df, datetime = as.POSIXct(datetime, tz = 'UTC'),
                 tagging_date = as.Date(tagging_date, '%d/%m/%Y')),
       as.Date(datetime) != tagging_date)
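One thing to keep in mind with the approaches above: whether a timestamp falls on the tagging day depends on the time zone used to extract the date. The sketch below is only an illustration with a made-up timestamp (it is not part of the question's data) and an arbitrarily chosen second zone, Asia/Tokyo. It passes tz explicitly to as.Date() so the result does not rely on any default.

```r
# Hypothetical timestamp close to midnight, stored in UTC
x <- as.POSIXct("2016-04-16 23:30:00", tz = "UTC")

# Calendar date of that instant as seen in UTC
d_utc <- as.Date(x, tz = "UTC")

# Calendar date of the same instant as seen in Tokyo (UTC+9)
d_tokyo <- as.Date(x, tz = "Asia/Tokyo")

print(d_utc)
print(d_tokyo)
```

If the tags were recorded in local time, it is probably safer to parse datetime and extract its date in that same zone before comparing with tagging_date.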
Does this work:
library(dplyr)
library(lubridate)
df %>% filter(as.Date(ymd_hms(datetime)) != dmy(tagging_date))
code datetime lat lon tagging_date
1 a 2016-04-25 03:52:47 -5.4644 71.82280 16/04/2016
2 a 2016-04-26 11:22:08 -5.4644 71.82280 16/04/2016
3 a 2016-05-01 01:56:44 -5.4644 71.82280 16/04/2016
4 a 2016-05-01 03:36:51 -5.4644 71.82280 16/04/2016
5 b 2013-03-27 03:16:47 -5.2662 71.67483 22/03/2013
6 b 2013-03-28 10:33:40 -5.2662 71.67483 22/03/2013
7 c 2013-04-03 07:12:13 -5.2662 71.67483 22/03/2013
8 c 2013-04-03 07:15:01 -5.2662 71.67483 22/03/2013
9 c 2013-04-03 07:18:40 -5.2662 71.67483 22/03/2013
If dat is your test data frame, using data.table:
library(data.table)
dat = data.table(dat)
dat[as.Date(datetime) != as.Date(tagging_date, format = '%d/%m/%Y')]
code datetime lat lon tagging_date
1: a 2016-04-25 03:52:47 -5.4644 71.82280 16/04/2016
2: a 2016-04-26 11:22:08 -5.4644 71.82280 16/04/2016
3: a 2016-05-01 01:56:44 -5.4644 71.82280 16/04/2016
4: a 2016-05-01 03:36:51 -5.4644 71.82280 16/04/2016
5: b 2013-03-27 03:16:47 -5.2662 71.67483 22/03/2013
6: b 2013-03-28 10:33:40 -5.2662 71.67483 22/03/2013
7: c 2013-04-03 07:12:13 -5.2662 71.67483 22/03/2013
8: c 2013-04-03 07:15:01 -5.2662 71.67483 22/03/2013
9: c 2013-04-03 07:18:40 -5.2662 71.67483 22/03/2013
And in base R:
dat[as.Date(dat$datetime) != as.Date(dat$tagging_date, format = '%d/%m/%Y'), ]
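To check that the filter drops what the question expects in each group, it can help to build the logical flag first and tabulate it by code. This is a minimal base R sketch on a four-row subset of the question's data. The names same_day, removed_per_group and kept are made up for the illustration.

```r
# Four rows taken from the question's data (lat/lon omitted for brevity)
dat <- data.frame(
  code = c("a", "a", "b", "c"),
  datetime = c("2016-04-16 06:09:07", "2016-04-25 03:52:47",
               "2013-03-22 03:06:52", "2013-04-03 07:12:13"),
  tagging_date = c("16/04/2016", "16/04/2016", "22/03/2013", "22/03/2013"),
  stringsAsFactors = FALSE
)

# TRUE where the observation falls on the tagging day
same_day <- as.Date(dat$datetime) ==
  as.Date(dat$tagging_date, format = "%d/%m/%Y")

# Number of rows per group that the filter would drop
removed_per_group <- tapply(same_day, dat$code, sum)
print(removed_per_group)

# Keep only the rows recorded on a different day
kept <- dat[!same_day, ]
print(kept)
```

Here one row is flagged for a, one for b and none for c, which agrees with the behaviour described in the question.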