Reading multiple space-delimited text files from a folder in R
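I have 100 space-delimited text files in a folder. Each file contains a single paragraph of text. I want to load them into a data frame with column 1 holding the File ID and column 2 holding the corresponding paragraph.
Here is what I have tried so far. It does not give me the paragraph in the format I need.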
lf <- list.files(path = "", pattern = "'*.txt", full.names = TRUE, recursive = TRUE, include.dirs = TRUE)
data <- lapply(lf, read.table, sep="", header=FALSE)
A sample text file looks like this:
"Yeah, and and repeated phone calls is I call in on something I continuously ask if there's a promotional deal going on Dvr's because I've had some problems with the hopper and the delays and today. I get another bill or exchanging hopper enjoys better for Dvr's."
The output I get is a list:
[[1]]
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17
1 Yeah, and and repeated phone calls is I call in on something I continuously ask if there's
V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32 V33
1 a promotional deal going on Dvr's because I've had some problems with the hopper and the
V34 V35 V36 V37 V38 V39 V40 V41 V42 V43 V44 V45 V46 V47 V48 V49
1 delays and today. I get another bill or exchanging hopper enjoys better for Dvr's.
I would like to get it as a data frame instead:
File ID Text
file1.txt Yeah, and and repeated phone calls...
Any pointers on what I am missing?
Thanks in advance.
Try this (you don't want whitespace as the separator, because your paragraphs are full of spaces):
dat <- setNames( lapply(lf, read.table, sep="|", header=FALSE), lf)
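Pick a separator that you don't expect to appear in the text. I'm afraid sep="" is a poor choice, because it is read.table's default and is interpreted as "whitespace", so every word ends up in its own column. Each entry of the resulting list is named after its file (the full path, since lf was built with full.names = TRUE).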
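If the goal is one row per file, another option is to skip read.table altogether and read each file as plain text. The sketch below is not the approach from the answer above: it uses readLines, and the helper name read_paragraphs, the column names FileID and Text, and the temporary demo folder are all made up for illustration. It also assumes each file holds one paragraph, possibly wrapped in double quotes as in the sample.

```r
# Hypothetical helper: read every .txt file under `dir` as one string
# and return a data frame with one row per file.
read_paragraphs <- function(dir) {
  files <- list.files(path = dir, pattern = "\\.txt$",
                      full.names = TRUE, recursive = TRUE)
  text <- vapply(files, function(f) {
    # Join all lines of the file into a single string.
    paste(readLines(f, warn = FALSE), collapse = " ")
  }, character(1), USE.NAMES = FALSE)
  # Assumption: drop one pair of surrounding double quotes, if present.
  text <- gsub('^"|"$', "", text)
  data.frame(FileID = basename(files), Text = text,
             stringsAsFactors = FALSE)
}

# Demonstration on a throwaway folder with two made-up files.
demo_dir <- tempfile("paragraph_demo")
dir.create(demo_dir)
writeLines('"Yeah, and and repeated phone calls..."',
           file.path(demo_dir, "file1.txt"))
writeLines("I get another bill.", file.path(demo_dir, "file2.txt"))

result <- read_paragraphs(demo_dir)
print(result)
```

Because no separator is involved, spaces, apostrophes and "|" characters inside the paragraph need no special handling. To keep the full path as the File ID, replace basename(files) with files.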