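Does lubridate not like subsetting?
This question is related to one I asked earlier.
I spent some time thinking about how to state my problem more clearly, and I apologize for the length of this question. Any input is greatly appreciated.
Below is a heavily subsetted 100-row snippet of the data set I am working with.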
SPD_2015 <- structure(list(summarized.offense.description = c("ASSAULT",
"THREATS", "CAR PROWL", "SHOPLIFTING", "MAIL THEFT", "THREATS",
"DISTURBANCE", "STOLEN PROPERTY", "TRESPASS", "VEHICLE THEFT",
"CAR PROWL", "THREATS", "STOLEN PROPERTY", "VEHICLE THEFT", "BURGLARY-SECURE PARKING-RES",
"CAR PROWL", "THREATS", "BIKE THEFT", "BURGLARY", "ASSAULT",
"STOLEN PROPERTY", "DISTURBANCE", "VEHICLE THEFT", "CAR PROWL",
"OTHER PROPERTY", "ASSAULT", "PROPERTY DAMAGE", "BURGLARY-SECURE PARKING-RES",
"ANIMAL COMPLAINT", "OTHER PROPERTY", "BURGLARY", "BURGLARY",
"CAR PROWL", "SHOPLIFTING", "BURGLARY", "PROPERTY DAMAGE", "DISTURBANCE",
"PROPERTY DAMAGE", "STOLEN PROPERTY", "OTHER PROPERTY", "MAIL THEFT",
"PROPERTY DAMAGE", "VEHICLE THEFT", "OTHER PROPERTY", "ROBBERY",
"CAR PROWL", "NARCOTICS", "OTHER PROPERTY", "BURGLARY", "DISTURBANCE",
"ASSAULT", "BURGLARY-SECURE PARKING-RES", "OTHER PROPERTY", "FRAUD",
"SHOPLIFTING", "OTHER PROPERTY", "OTHER PROPERTY", "DISTURBANCE",
"CAR PROWL", "STOLEN PROPERTY", "OTHER PROPERTY", "OTHER PROPERTY",
"VIOLATION OF COURT ORDER", "DISTURBANCE", "NARCOTICS", "ASSAULT",
"DISTURBANCE", "TRESPASS", "NARCOTICS", "CAR PROWL", "NARCOTICS",
"OTHER PROPERTY", "CAR PROWL", "CAR PROWL", "ASSAULT", "TRAFFIC",
"OTHER PROPERTY", "CAR PROWL", "PROSTITUTION", "OTHER PROPERTY",
"OTHER PROPERTY", "ASSAULT", "BURGLARY", "DISTURBANCE", "PROPERTY DAMAGE",
"PROPERTY DAMAGE", "BURGLARY", "VEHICLE THEFT", "FRAUD", "VEHICLE THEFT",
"FRAUD", "CAR PROWL", "BIKE THEFT", "CAR PROWL", "WARRANT ARREST",
"STOLEN PROPERTY", "CAR PROWL", "PROPERTY DAMAGE", "VEHICLE THEFT",
"BIKE THEFT"), occurred.date.or.date.range.start = c("04/17/2015 01:10:00 AM",
"11/15/2015 12:04:00 PM", "05/29/2015 08:00:00 PM", "12/15/2015 02:25:00 PM",
"07/28/2015 12:00:00 AM", "02/24/2015 06:01:00 PM", "05/24/2015 04:20:00 PM",
"03/13/2015 02:04:00 PM", "06/14/2015 08:00:00 AM", "05/19/2015 03:18:00 PM",
"07/18/2015 06:00:00 AM", "05/11/2015 05:16:00 PM", "01/08/2015 12:52:00 PM",
"06/17/2015 05:00:00 PM", "07/04/2015 12:00:00 AM", "10/26/2015 12:12:00 AM",
"05/01/2015 12:00:00 PM", "07/02/2015 10:00:00 PM", "01/10/2015 07:30:00 PM",
"02/17/2015 01:29:00 PM", "12/17/2015 02:26:00 AM", "08/04/2015 10:49:00 PM",
"10/27/2015 12:29:00 AM", "07/29/2015 03:00:00 PM", "10/24/2015 06:30:00 PM",
"02/20/2015 03:07:00 AM", "11/11/2015 09:00:00 AM", "03/24/2015 10:00:00 PM",
"11/03/2015 08:47:00 PM", "04/15/2015 02:00:00 PM", "07/15/2015 03:00:00 PM",
"11/17/2015 08:30:00 AM", "09/22/2015 05:00:00 PM", "02/09/2015 09:19:00 AM",
"01/07/2015 08:30:00 AM", "05/01/2015 07:30:00 AM", "04/26/2015 03:30:00 AM",
"04/18/2015 03:00:00 AM", "10/01/2015 08:00:00 PM", "05/07/2015 01:00:00 AM",
"02/05/2015 03:15:00 PM", "01/18/2015 05:00:00 PM", "10/17/2015 11:00:00 PM",
"03/23/2015 05:35:00 PM", "02/16/2015 07:25:00 PM", "07/30/2015 08:00:00 PM",
"11/10/2015 02:28:00 PM", "03/14/2015 10:10:00 AM", "12/10/2015 08:26:00 PM",
"10/05/2015 01:45:00 AM", "02/16/2015 01:56:00 PM", "10/19/2015 06:27:00 PM",
"12/01/2015 07:30:00 AM", "01/28/2015 08:40:00 PM", "05/01/2015 01:40:00 PM",
"10/30/2015 03:15:00 AM", "09/04/2015 03:34:00 PM", "06/06/2015 04:53:00 PM",
"07/22/2015 06:20:00 AM", "12/11/2015 01:41:00 PM", "05/20/2015 01:09:00 PM",
"09/18/2015 12:00:00 PM", "07/08/2015 11:05:00 PM", "02/22/2015 01:38:00 AM",
"07/22/2015 01:12:00 PM", "09/07/2015 10:43:00 AM", "08/11/2015 04:00:00 PM",
"10/13/2015 06:33:00 AM", "10/10/2015 05:32:00 PM", "11/15/2015 07:09:00 PM",
"11/19/2015 03:05:00 PM", "04/08/2015 04:33:00 PM", "05/11/2015 12:01:00 AM",
"04/21/2015 06:15:00 PM", "06/13/2015 10:29:00 AM", "06/22/2015 06:41:00 PM",
"09/03/2015 08:00:00 AM", "04/08/2015 06:00:00 PM", "07/17/2015 08:00:00 PM",
"08/29/2015 09:00:00 AM", "04/28/2015 01:46:00 PM", "09/07/2015 07:00:00 PM",
"12/30/2015 06:30:00 AM", "08/29/2015 11:37:00 PM", "08/24/2015 10:00:00 PM",
"06/17/2015 07:02:00 AM", "02/14/2015 10:21:00 PM", "03/29/2015 07:00:00 PM",
"10/01/2015 07:15:00 AM", "06/14/2015 03:00:00 PM", "12/16/2014 09:00:00 AM",
"02/14/2015 07:54:00 PM", "10/02/2015 08:17:00 AM", "05/14/2015 08:30:00 AM",
"07/07/2015 10:15:00 AM", "04/07/2015 01:48:00 AM", "11/02/2015 11:00:00 PM",
"04/16/2015 03:00:00 PM", "08/22/2015 08:09:00 AM", "10/24/2015 05:00:00 PM"
)), .Names = c("summarized.offense.description", "occurred.date.or.date.range.start"
), row.names = c(NA, -100L), class = c("tbl_df", "tbl", "data.frame"
))
I extracted the time from the existing column with the following code:
# Split the time out of the occurred.date column
SPD_2015 <- mutate(SPD_2015, occurred.time = str_sub(SPD_2015$occurred.date.or.date.range.start, -11, -1))
# Convert the character time to a lubridate Period
SPD_2015$occurred.time <- strptime(SPD_2015$occurred.time, "%I:%M:%S %p") %>%
  str_sub(-8, -1) %>%
  hms()
# Create occurred.time.hour so I can isolate the hour value
SPD_2015 <- mutate(SPD_2015, occurred.time.hour = hour(occurred.time))
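I now have a column holding just the hour at which each crime occurred, which I can plot with ggplot2. However, if I subset the data with dplyr:
# Filter the data for car prowls only
car.prowl <- filter(SPD_2015, summarized.offense.description == "CAR PROWL")
the time values in the "occurred.time" and "occurred.time.hour" columns of the new data frame (car.prowl) no longer match. The "occurred.time.hour" column matches the source correctly, but the "occurred.time" column has changed.
One more detail. I created a separate data frame for car prowls because, when I first tried to plot the time of occurrence with ggplot
ggplot(car.prowl, aes(hour(occurred.time))) +
  geom_bar()
I got the error: "Error: Aesthetics must be either length 1 or the same as the data (14): x". That makes sense to me.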
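> dim(car.prowl)
[1] 14 4
So car.prowl has 14 rows, yet when I run the following:
> length(hour(car.prowl$occurred.time))
[1] 100
it reports the length of the original data set rather than the 14 rows of the subset.
Can anyone suggest a solution or a workaround?
Thanks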
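Interesting question. Let's first get the output needed for plotting. We can use mdy_hms to convert the character strings to date-times, which is probably more robust than the original approach based on str_sub. After that, hour can extract the hour from each date-time.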
library(tidyverse)
library(lubridate)
library(stringr)
SPD_2015_updated <- SPD_2015 %>%
  mutate(occurred.time = mdy_hms(occurred.date.or.date.range.start)) %>%
  mutate(occurred.time.hour = hour(occurred.time))
car.prowl_updated <- SPD_2015_updated %>%
  filter(summarized.offense.description == "CAR PROWL")
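Run glimpse(SPD_2015_updated) and glimpse(car.prowl_updated) and you can see that every record matches. occurred.time is of a date-time class (POSIXct), and occurred.time.hour is of class integer. I think these data frames are ready for plotting.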
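As a quick check of this approach, here is a minimal sketch on a four-row toy tibble. The names toy, car_prowl_toy and p are illustrative assumptions; the rows themselves are copied from the question's data. It parses the strings, filters for car prowls, and builds the bar chart, so you can confirm that the length of the filtered date-time column agrees with the row count.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Four rows copied from the question's data; `toy` is a hypothetical name.
toy <- tibble(
  summarized.offense.description = c("ASSAULT", "THREATS", "CAR PROWL", "CAR PROWL"),
  occurred.date.or.date.range.start = c(
    "04/17/2015 01:10:00 AM", "11/15/2015 12:04:00 PM",
    "05/29/2015 08:00:00 PM", "07/18/2015 06:00:00 AM"
  )
)

# Parse to date-time, extract the hour, then keep only the car prowls.
car_prowl_toy <- toy %>%
  mutate(
    occurred.time = mdy_hms(occurred.date.or.date.range.start),
    occurred.time.hour = hour(occurred.time)
  ) %>%
  filter(summarized.offense.description == "CAR PROWL")

# Both of these should report the same number.
nrow(car_prowl_toy)
length(hour(car_prowl_toy$occurred.time))

# Bar chart of the hour of occurrence.
p <- ggplot(car_prowl_toy, aes(occurred.time.hour)) +
  geom_bar()
```

Printing p draws the chart. The same ggplot call should work on car.prowl_updated, since its columns have the same names and classes.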
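As for what went wrong with your original approach, I don't fully understand it. But if you run glimpse(car.prowl), you can see that occurred.time is shown as S4: Period. That may be the key to why dplyr::filter does not work. If I have time, I will look further into why dplyr::filter cannot subset your original data frame.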
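One possible explanation, offered as an assumption rather than a confirmed diagnosis: a Period is an S4 object that keeps the seconds in its data part and the hours, minutes, and other units in separate slots. If a subsetting routine only touches the data part, the slots would keep their original length of 100, which would be consistent with what the question reports. The sketch below uses three made-up times to show those slots, and to show that lubridate's own `[` method for Period subsets all of them together.

```r
library(lubridate)

# Three illustrative times (hypothetical values, not taken from the data set).
p <- hms(c("01:10:00", "12:04:00", "20:00:00"))

# A Period keeps each unit in its own slot; the seconds live in the data part.
p@hour
p@minute
p@.Data

# lubridate's `[` method for Period subsets every slot consistently,
# so the hour vector has the same length as the subset.
keep <- c(TRUE, FALSE, TRUE)
p_sub <- p[keep]
length(hour(p_sub))
```

If this is indeed the cause, storing the hour as a plain numeric column before filtering (as occurred.time.hour already does) or keeping the full date-time as POSIXct sidesteps the issue, because neither depends on how a given dplyr version handles S4 columns.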