How to read first four columns from a file with different number of columns on each row into a data frame
I have a text file whose first lines look like this:
3 a 1 4 6 2
3 a 1 4 6 2
4 a 1 4 6 8 2
4 a 1 4 6 8 2
3 a 1 4 6 2
3 a 1 4 6 2
3 a 1 4 6 2
3 a 1 4 6 2
3 a 1 4 6 2
3 a 1 4 6 2
5 a 1 4 8 10 2 6
5 a 2 6 8 10 2 4
5 a 1 4 8 10 2 6
5 a 1 4 8 10 2 6
5 a 2 6 8 10 2 4
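I only want to read the first four columns of each row and save them into a data frame.
I have tried several pieces of code; the last one was: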
library(data.table)
nudos<-fread("caliz.txt",select=c(1:4),fill=TRUE)
It keeps giving this error message:
Stopped early on line 119. Expected 11 fields but found 13. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<10 n 21 4 8 -14 2 -16 -18 -20 -6 -10 -12>>
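Thanks!
Your table does not seem to be well formed. Even if you only want to select the first 4 columns, R still reads all of them, and it cannot handle rows that contain more or fewer elements than it expects. You have to split each line and select the values manually: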
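lin = readLines("test.txt")
cells = strsplit(lin, " ")
data = c()
for (line in cells) {
  found = 0
  cell = 1
  # take the first 4 non-empty fields; stop if the line runs out of fields
  while (found < 4 && cell <= length(line)) {
    val = line[[cell]]
    # repeated spaces produce empty strings, so skip those
    if (nchar(val) > 0) {
      found = found + 1
      data = c(data, val)
    }
    cell = cell + 1
  }
  # pad short (but non-blank) lines with NA so the rows stay aligned
  if (found > 0 && found < 4) data = c(data, rep(NA, 4 - found))
}
df = as.data.frame(matrix(data, ncol = 4, byrow = TRUE))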
The resulting data frame:
> df
V1 V2 V3 V4
1 3 a 1 4
2 3 a 1 4
3 4 a 1 4
4 4 a 1 4
5 3 a 1 4
6 3 a 1 4
7 3 a 1 4
8 3 a 1 4
9 3 a 1 4
10 3 a 1 4
11 5 a 1 4
12 5 a 2 6
13 5 a 1 4
14 5 a 1 4
15 5 a 2 6
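You can now change the class of some columns (e.g. df[,1] = as.integer(df[,1])), since they are all character at the moment. You probably want numeric values, but that is up to you.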
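If you would rather convert every column in one call instead of one at a time, here is a minimal sketch. It assumes a reasonably recent R version in which type.convert() accepts a whole data frame; the small df below is a hypothetical stand-in for the one built above.

```r
# Hypothetical stand-in for the all-character data frame built above
df <- data.frame(V1 = c("3", "4", "5"), V2 = c("a", "a", "a"),
                 V3 = c("1", "1", "2"), V4 = c("4", "4", "6"),
                 stringsAsFactors = FALSE)

# Convert each column to the most suitable type; as.is = TRUE keeps
# non-numeric columns (like V2) as character instead of factor
df <- type.convert(df, as.is = TRUE)
str(df)
```

Columns that look like whole numbers become integer, and V2 stays character.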
Here is a base R solution. It reads the file with readLines and parses it with a series of *apply loops.
# read the file as text lines
txt <- readLines("test.txt")
# split by one or more spaces
txt <- strsplit(txt, " +")
# keep only the vector elements with more than 0 chars
txt <- lapply(txt, function(x) x[sapply(x, nchar) > 0])
# blank lines (e.g. a trailing empty line) end up as empty vectors, remove them
txt <- txt[lengths(txt) > 0]
# now extract the first 4 elements of each vector
txt <- lapply(txt, '[', 1:4)
# and rbind to data.frame
df1 <- do.call(rbind.data.frame, txt)
names(df1) <- paste0("V", 1:4)
head(df1)
# V1 V2 V3 V4
#1 3 a 1 4
#2 3 a 1 4
#3 4 a 1 4
#4 4 a 1 4
#5 3 a 1 4
#6 3 a 1 4
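Another option that avoids parsing by hand, sketched under the assumption that the file is whitespace-separated with no quote or comment characters in the data: count the fields in the widest row with count.fields(), give read.table() that many column names so fill = TRUE pads short rows with NA, then keep the first four columns. The temporary file and its contents are hypothetical sample data (the last row is the line quoted in the fread error).

```r
# Hypothetical sample file standing in for "caliz.txt"
path <- tempfile(fileext = ".txt")
writeLines(c("3 a 1 4 6 2",
             "4 a 1 4 6 8 2",
             "5 a 1 4 8 10 2 6",
             "10 n 21 4 8 -14 2 -16 -18 -20 -6 -10 -12"), path)

# Number of fields in the widest row (whitespace-separated by default)
n_max <- max(count.fields(path), na.rm = TRUE)

# Supplying n_max column names makes read.table allocate enough columns,
# so fill = TRUE pads short rows with NA instead of wrapping long rows
wide <- read.table(path, fill = TRUE, col.names = paste0("V", seq_len(n_max)),
                   stringsAsFactors = FALSE)

# Keep only the first four columns
nudos <- wide[, 1:4]
nudos
```

Without col.names, read.table() guesses the number of columns from the first five lines only, so a longer row further down would be split across extra rows rather than padded.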