Split and extract dates (free-form text) and hours given as numbers into separate columns - R
I have an input like this:
**ID** **Input Text**
1 08/18/2017 8 hours
2 08/14/2017-10HRS
3 8/28/17 through 9/1/17 8 hrs per day
4 08/17/17-6hrs
5 08/14/2017-8hrs 08/15/2017-8hrs 08/16/2017-8hrs
6 7.27.2017 -8 hrs, 8.3.2017 8 hours, 8.14.2017 8hrs
7 08/16/2017 7 hours 10 minutes
8 8 hrs - 07/11/2017 and 8 hrs 07/12/2017
9 08/14/17-8hrs // 08/15/17-8hrs
10 08/14/2017- 7:45 hrs// 08/15/2017- 7:45 hrs//
11 Wed, 8/16/17 …. Cx missed 6 hrs on 8/14/17… missed 8 hrs on 8/15/17
12 08/10/2017 8 hrs
13 08/11/2017 2 hrs
14 08/16/2017 8 hrs
15 08/07/2017- 4 hours missed- Doctors appt , 08/13/2017 8 hours - Incapacity , 08/15/2017 -8 hours- Incapacity , 08/16/2017 -3 hours // Doctor
16 Aug 1, 2017 – 7.75 hours
17 Aug 2, 2017 – 1.75 hours
18 Aug 3, 2017 – 3 hours
19 Aug 4, 2017 – 4 hours
20 Aug 7, 2017 – 7.75 hours
The expected output is:
So far I have tried splitting the input text, hoping to convert the resulting column to dates with lubridate, but it fails:
dt$Date_lubridate <- mdy(dt$Time)
Warning message:
All formats failed to parse. No formats found.
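I wanted to split the column into a date part and a number part, and then convert the date part with lubridate, but I am stuck because the date formats vary so much.
x <- dt$Time
# Everything before the first "-" (a literal "-" needs no escape outside a character class)
sc1 <- sub("-.*", "", x)
# Everything after the last "-"
sc2 <- sub(".*-", "", x)
# Everything before the first space
sc3 <- sub(" .*", "", x)
fstat <- cbind.data.frame("ID" = dt$ID, "Actual" = x, "Date" = sc1, "time" = sc2, "time2" = sc3)
I then tried this on sc1: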
library(lubridate)
parse_date_time(x = sc1,
orders = c("d m y", "d B Y", "m/d/y"),
locale = "eng")
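But because of the variation in formats, I get parsing errors.
I think I am all over the place because I am missing some basic operation; any nudge or help in the right direction would be appreciated.
You can use a regular expression to extract the date part you want and then convert it with mdy().
library(stringr)
# Backslashes must be doubled inside an R string literal
regDate = "([A-Z][a-z]{2}|\\d{1,2})( |\\/|\\.)\\d{1,2}(,|\\/|\\.) ?\\d{2,4}"
str_extract(dt$Time, regDate) %>% unlist() %>% lubridate::mdy()
The pipe at the end (as used in dplyr) is just for convenience.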
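Note that str_extract() returns only the first match per row, while several rows in the question contain more than one date. The following is a minimal sketch of one way to collect every date and every hours value, assuming stringr and lubridate are installed. The names txt, reg_date, reg_hours and long are made up for illustration, and the hours pattern is my own assumption rather than part of the answer above.

```r
library(stringr)
library(lubridate)

# Three sample rows copied from the question's input (IDs 1, 9 and 16)
txt <- c("08/18/2017 8 hours",
         "08/14/17-8hrs // 08/15/17-8hrs",
         "Aug 1, 2017 – 7.75 hours")

# Same date pattern as in the answer, with backslashes doubled for R
reg_date <- "([A-Z][a-z]{2}|\\d{1,2})( |/|\\.)\\d{1,2}(,|/|\\.) ?\\d{2,4}"

# Assumed pattern: a number (optionally decimal) followed by "hour(s)" or "hr(s)"
reg_hours <- regex("(\\d+(?:\\.\\d+)?)\\s*(?:hours?|hrs?)", ignore_case = TRUE)

# One list element per input row, holding all matches found in that row
dates <- lapply(str_extract_all(txt, reg_date), mdy)
hours <- lapply(str_match_all(txt, reg_hours), function(m) as.numeric(m[, 2]))

# Pairing dates with hours is only safe when each row has equally many of both
stopifnot(lengths(dates) == lengths(hours))

# Long format: one output row per (date, hours) pair
long <- data.frame(
  ID    = rep(seq_along(txt), lengths(dates)),
  Date  = do.call(c, dates),
  Hours = unlist(hours)
)
print(long)
```

This sketch does not cover every row in the question. A range such as "8/28/17 through 9/1/17 8 hrs per day" has two dates but one hours value, so the length check would stop there, and entries like "7:45 hrs" or "7 hours 10 minutes" would need an extra rule to turn minutes into a fraction of an hour.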