How to convert irregular times into an XTS object using R
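I have the following data.frame that I would like to convert to an xts() object, but I have been racking my brain trying to figure out how to format the times.
The data is arranged from the most recent (top) to the oldest (bottom). The problem is that the rows do not follow a consistent format: some contain a date and a time, others only a time. So I cannot find a way to format the column such that every row shows the proper date and time.
Desired output for the Date/Time column: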
01/05/17 02:55 PM
01/05/17 11:40 AM
01/05/17 07:00 AM
12/30/16 05:50 PM
12/29/16 07:03 AM
12/30/16 07:00 AM
Data:
data <- structure(list(Date = c("Jan-05-17 02:55PM", "11:40AM", "07:00AM",
"Dec-30-16 05:50PM", "Dec-29-16 07:03AM", "07:00AM"), News = c("ENTEROMEDICS INC Files SEC form 8-K, Other Events, Financial Statements and Exhibits +89.95%",
"Why These 5 Biopharma Stocks Are Making Massive Gains on Thursday",
"EnteroMedics Announces vBloc® Neurometabolic Therapy Now Available at MedStar Health and Roper St. Francis PR Newswire",
"Why U.S. Steel, EnteroMedics, and McEwen Mining Slumped Today at Motley Fool -18.03%",
"Splits Calendar: EnteroMedics splits before market open today (70:1 ratio)",
"EnteroMedics Announces Retirement of All Senior Convertible Notes PR Newswire"
), Symbol = c("ETRM", "ETRM", "ETRM", "ETRM", "ETRM", "ETRM")), .Names = c("Date",
"News", "Symbol"), row.names = c(NA, 6L), class = "data.frame")
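Assuming there is a typo in the last row of your desired date-time output, and you meant 12/29/16 07:00 AM, then whenever an element of column Date is missing its date, take the most recent known date from the rows above and carry it down. Because the data is ordered newest first, this rolls the date "backwards" in time: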
library(stringr)
library(xts)  # also attaches zoo, which provides na.locf()
l_datetime <- str_split(data$Date, " ")
# Elements that split into two pieces have a date and a time; the others have only a time
data$ymd <- unlist(lapply(l_datetime, function(x) if (length(x) == 2) x[[1]] else NA_character_))
data$time <- unlist(lapply(l_datetime, function(x) if (length(x) == 2) x[[2]] else x[[1]]))
# Carry the latest known date down to the elements of column `Date` that are missing one
data$ymd <- na.locf(data$ymd)
# Carefully parse the time strings allowing for AM/PM:
psx_date <- as.POSIXct(paste(data$ymd, data$time), format = "%b-%d-%y %I:%M%p")
x_data <- xts(x = data[, c("News", "Symbol")], order.by = psx_date)
# > x_data
# News Symbol
# 2016-12-29 07:00:00 "EnteroMedics Announces Retirement of All Senior Convertible Notes PR Newswire" "ETRM"
# 2016-12-29 07:03:00 "Splits Calendar: EnteroMedics splits before market open today (70:1 ratio)" "ETRM"
# 2016-12-30 17:50:00 "Why U.S. Steel, EnteroMedics, and McEwen Mining Slumped Today at Motley Fool -18.03%" "ETRM"
# 2017-01-05 07:00:00 "EnteroMedics Announces vBloc® Neurometabolic Therapy Now Available at MedStar Health and Roper St. Francis PR Newswire" "ETRM"
# 2017-01-05 11:40:00 "Why These 5 Biopharma Stocks Are Making Massive Gains on Thursday" "ETRM"
# 2017-01-05 14:55:00 "ENTEROMEDICS INC Files SEC form 8-K, Other Events, Financial Statements and Exhibits +89.95%" "ETRM"
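The fill step can also be sketched in base R, for readers who want to see what the carry-forward does without zoo. This is only an illustrative sketch, not the answer's code: `fill_down` is a hypothetical helper name, the time zone is pinned to UTC so the result is reproducible, and the `%b` and `%p` formats assume an English locale.

```r
# Hypothetical helper: replace each NA with the most recent non-NA value above it.
fill_down <- function(x) {
  idx <- cumsum(!is.na(x))          # 0 before the first known value, then 1, 2, ...
  known <- c(NA, x[!is.na(x)])      # leading NA covers positions before any known value
  known[idx + 1]
}

raw <- c("Jan-05-17 02:55PM", "11:40AM", "07:00AM",
         "Dec-30-16 05:50PM", "Dec-29-16 07:03AM", "07:00AM")

has_date  <- grepl(" ", raw, fixed = TRUE)                       # date and time are space-separated
date_part <- ifelse(has_date, sub(" .*$", "", raw), NA_character_)
time_part <- sub("^.* ", "", raw)                                # unchanged when there is no space

filled <- fill_down(date_part)
stamps <- as.POSIXct(paste(filled, time_part),
                     format = "%b-%d-%y %I:%M%p", tz = "UTC")
```

With the six sample strings, rows 2 and 3 inherit Jan-05-17 and row 6 inherits Dec-29-16, which matches the index of the xts output shown above.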
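Use sub to replace a leading digit in Date with NA, followed by a space, followed by that digit. From there, use read.table to create a 2-column data frame with the date (or NA) in column 1 and the time in column 2. Use na.locf to fill in the NA values, giving DF2. Now cbind DF2 and data[-1], and read the resulting data.frame with read.zoo. Finally, convert the resulting "zoo" object to "xts".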
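library(xts)  # also attaches zoo, which provides na.locf() and read.zoo()
# Backslashes must be doubled inside R string literals
DF2 <- na.locf(read.table(text = sub("^(\\d)", "NA \\1", data$Date)))
z <- read.zoo(cbind(DF2, data[-1]), index = 1:2, tz = "", format = "%b-%d-%y %I:%M%p")
as.xts(z)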
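To make the intermediate steps easier to follow, here is a small base R sketch of what the sub() and read.table() calls produce on the first three sample strings. The variable names `raw`, `padded`, and `parts` are illustrative only, and `stringsAsFactors = FALSE` is set explicitly so the columns are character vectors on any R version.

```r
raw <- c("Jan-05-17 02:55PM", "11:40AM", "07:00AM")

# Rows that start with a digit have no date, so prefix them with a literal "NA "
padded <- sub("^(\\d)", "NA \\1", raw)

# Each element is read as one line; whitespace splits it into two columns,
# and the literal "NA" is read as a missing value
parts <- read.table(text = padded, stringsAsFactors = FALSE)
```

Here `padded` is "Jan-05-17 02:55PM", "NA 11:40AM", "NA 07:00AM", and `parts` has the date (or NA) in column V1 and the time in column V2, ready for the fill step.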
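Here is a solution using the tidyquant package, which loads all of the packages needed to solve this problem. As with the other solutions, you need a consistent date structure, such as: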
"Jan-05-17 02:55 PM"
Using the lubridate package, this can be converted to the POSIXct class with the mdy_hm() function, as follows:
"Jan-05-17 02:55 PM" %>% lubridate::mdy_hm()
> "2017-01-05 14:55:00 UTC"
Here the name of lubridate::mdy_hm() stands for month-day-year hour-minute. The output is the date in the correct date-time class.
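For comparison, the same string can be parsed in base R by spelling out the format. This is a sketch rather than part of the tidyquant workflow; it assumes an English locale for `%b` and `%p`, and the variable name `parsed` is illustrative.

```r
# %b = abbreviated month, %d = day, %y = two-digit year,
# %I = hour on the 12-hour clock, %M = minute, %p = AM/PM marker
parsed <- as.POSIXct("Jan-05-17 02:55 PM",
                     format = "%b-%d-%y %I:%M %p", tz = "UTC")
```

The explicit format makes the expected layout visible, whereas mdy_hm() infers it from the order of the components.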
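The tidyquant package has a convenient function, as_xts(), with an argument, date_col, which, when specified, converts the data.frame date column to the xts row names. I use the pipe (%>%) to make the code more readable and to show the workflow, and the dplyr::mutate() function to change the Date column to the POSIXct class using lubridate::mdy_hm(). The final workflow looks like this: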
library(tidyquant)
data %>%
    dplyr::mutate(Date = lubridate::mdy_hm(Date)) %>%
    as_xts(date_col = Date)
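Before trying the code snippet, make sure that every row of the date column has a valid format such as "Jan-05-17 02:55 PM"; otherwise you will get a parsing error at the lubridate::mdy_hm() function.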
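One way to check this up front is to flag the rows that lack a full date before parsing. The following base R sketch is an assumption about how such a check might look, not part of the answer: the names `dates`, `full_pattern`, and `incomplete` are illustrative, and the pattern only covers the "Mon-DD-YY HH:MM AM/PM" layout used in this question.

```r
dates <- c("Jan-05-17 02:55PM", "11:40AM", "07:00AM",
           "Dec-30-16 05:50PM", "Dec-29-16 07:03AM", "07:00AM")

# Three-letter month, two-digit day and year, then a 12-hour time
# with an optional space before AM/PM
full_pattern <- "^[A-Za-z]{3}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2} ?[AP]M$"

# Positions of the rows that would not parse as a complete date-time
incomplete <- which(!grepl(full_pattern, dates))
```

On the question's original data this flags rows 2, 3, and 6, which are the ones that need a date filled in first.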
The data I used for testing is as follows:
data <- structure(list(Date = c("Jan-05-17 02:55 PM", "Jan-05-17 11:40 AM", "Jan-05-17 07:00 AM",
"Dec-30-16 05:50 PM", "Dec-29-16 07:03 AM", "Dec-29-16 07:00 AM"), News = c("ENTEROMEDICS INC Files SEC form 8-K, Other Events, Financial Statements and Exhibits +89.95%",
"Why These 5 Biopharma Stocks Are Making Massive Gains on Thursday",
"EnteroMedics Announces vBloc® Neurometabolic Therapy Now Available at MedStar Health and Roper St. Francis PR Newswire",
"Why U.S. Steel, EnteroMedics, and McEwen Mining Slumped Today at Motley Fool -18.03%",
"Splits Calendar: EnteroMedics splits before market open today (70:1 ratio)",
"EnteroMedics Announces Retirement of All Senior Convertible Notes PR Newswire"
), Symbol = c("ETRM", "ETRM", "ETRM", "ETRM", "ETRM", "ETRM")), .Names = c("Date",
"News", "Symbol"), row.names = c(NA, 6L), class = "data.frame")