How to split columns in a data frame by semicolon in R
I feel my question is too obvious, but I can't find a solution.
I have a data frame like this:
<TICKER>;<PER>;<DATE>;<TIME>;<OPEN>;<HIGH>;<LOW>;<CLOSE>
USD Index;D;20150801;000000;97.199;97.336;97.191;97.192
USD Index;D;20150802;000000;97.226;97.294;97.207;97.257
USD Index;D;20150803;000000;97.255;97.582;97.155;97.499
I need to split it on ; into separate columns, like this:
<TICKER> <PER> <DATE> <TIME> <OPEN> <HIGH> <LOW> <CLOSE>
USD Index D 20150801 0 97.199 97.336 97.191 97.192
USD Index D 20150802 0 97.226 97.294 97.207 97.257
USD Index D 20150803 0 97.255 97.582 97.155 97.499
This is such a basic question that it ought to be at the top of the search results. Thanks in advance for helping me with this!
We can use read.table:
setNames(read.table(text=dat[,1], sep=";", stringsAsFactors=FALSE),
scan(text=names(dat), sep=";", what = "", quiet = TRUE))
# <TICKER> <PER> <DATE> <TIME> <OPEN> <HIGH> <LOW> <CLOSE>
# 1 USD Index D 20150801 0 97.199 97.336 97.191 97.192
# 2 USD Index D 20150802 0 97.226 97.294 97.207 97.257
# 3 USD Index D 20150803 0 97.255 97.582 97.155 97.499
Data
dat <- structure(list(`<TICKER>;<PER>;<DATE>;<TIME>;<OPEN>;<HIGH>;<LOW>;<CLOSE>` =
c("USD Index;D;20150801;000000;97.199;97.336;97.191;97.192",
"USD Index;D;20150802;000000;97.226;97.294;97.207;97.257",
"USD Index;D;20150803;000000;97.255;97.582;97.155;97.499"
)), .Names = "<TICKER>;<PER>;<DATE>;<TIME>;<OPEN>;<HIGH>;<LOW>;<CLOSE>",
class = "data.frame", row.names = c(NA, -3L))
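A side note on the read.table approach above: by default the <TIME> values are read as integers, so "000000" becomes 0. If the leading zeros matter, one option is to set colClasses. This is only a sketch; the names col_names, col_types and res are made up for illustration, and dat is rebuilt here just so the snippet runs on its own.

```r
# Same data as above, rebuilt compactly so the snippet is self-contained
dat <- data.frame(
  x = c("USD Index;D;20150801;000000;97.199;97.336;97.191;97.192",
        "USD Index;D;20150802;000000;97.226;97.294;97.207;97.257",
        "USD Index;D;20150803;000000;97.255;97.582;97.155;97.499"),
  stringsAsFactors = FALSE
)
names(dat) <- "<TICKER>;<PER>;<DATE>;<TIME>;<OPEN>;<HIGH>;<LOW>;<CLOSE>"

# Split the single header string into the eight column names
col_names <- scan(text = names(dat), sep = ";", what = "", quiet = TRUE)

# Read the first four columns as character (keeps "000000" intact),
# and the four price columns as numeric
col_types <- c(rep("character", 4), rep("numeric", 4))

res <- setNames(
  read.table(text = dat[[1]], sep = ";", colClasses = col_types),
  col_names
)
str(res)
```

Whether <DATE> and <TIME> should stay character or be converted to a date-time class depends on what you do with them next.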
Using fread()
This is very easy. Using akrun's dat, we have
data.table::fread(paste(c(names(dat), dat[[1]]), collapse = "\n"))
# <TICKER> <PER> <DATE> <TIME> <OPEN> <HIGH> <LOW> <CLOSE>
# 1: USD Index D 20150801 0 97.199 97.336 97.191 97.192
# 2: USD Index D 20150802 0 97.226 97.294 97.207 97.257
# 3: USD Index D 20150803 0 97.255 97.582 97.155 97.499
For a data frame result, just add data.table = FALSE to the fread() call.
Alternatively, tstrsplit() can be used to split the column and setnames() to rename the columns:
library(data.table)
setDT(dat)[, tstrsplit(.SD[[1]], ";")][, setnames(.SD, strsplit(names(dat), ";")[[1]])]
<TICKER> <PER> <DATE> <TIME> <OPEN> <HIGH> <LOW> <CLOSE>
1: USD Index D 20150801 000000 97.199 97.336 97.191 97.192
2: USD Index D 20150802 000000 97.226 97.294 97.207 97.257
3: USD Index D 20150803 000000 97.255 97.582 97.155 97.499
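Note that <TICKER> etc. are not syntactically valid column names and need to be escaped in many places. So I suggest removing the angle brackets like this: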
setDT(dat)[, tstrsplit(.SD[[1]], ";")][
, setnames(.SD, gsub("[<>]", "", strsplit(names(dat), ";")[[1]]))]
TICKER PER DATE TIME OPEN HIGH LOW CLOSE
1: USD Index D 20150801 000000 97.199 97.336 97.191 97.192
2: USD Index D 20150802 000000 97.226 97.294 97.207 97.257
3: USD Index D 20150803 000000 97.255 97.582 97.155 97.499
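To illustrate what "escaped" means here, a small base R sketch (the one-column df and the close_* variables are made up for the example): a name like <CLOSE> has to be wrapped in backticks or passed as a string, whereas the cleaned name works with plain `$`.

```r
# Toy data frame with a non-syntactic column name (illustration only)
df <- data.frame(a = c(97.192, 97.257, 97.499))
names(df) <- "<CLOSE>"

# Non-syntactic names must be wrapped in backticks when used with `$` ...
close_bt <- df$`<CLOSE>`
# ... or passed as a string to [[ ]]
close_str <- df[["<CLOSE>"]]

# After stripping the angle brackets, plain `$` access works
names(df) <- gsub("[<>]", "", names(df))
close_plain <- df$CLOSE
```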