How to read the first four columns from a file with a different number of columns on each row into a data frame

I have a text file whose first 15 lines look like this:

 3  a         1       4   6   2
 3  a         1       4   6   2
 4  a         1       4   6   8   2
 4  a         1       4   6   8   2
 3  a         1       4   6   2
 3  a         1       4   6   2
 3  a         1       4   6   2
 3  a         1       4   6   2
 3  a         1       4   6   2
 3  a         1       4   6   2
 5  a         1       4   8  10   2   6
 5  a         2       6   8  10   2   4
 5  a         1       4   8  10   2   6
 5  a         1       4   8  10   2   6
 5  a         2       6   8  10   2   4

I only want to read the first four columns of each row and save them to a data frame.

I have tried several approaches; the latest one is:

library(data.table)

nudos<-fread("caliz.txt",select=c(1:4),fill=TRUE)

It keeps giving this error message:

Stopped early on line 119. Expected 11 fields but found 13. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<10 n 21 4 8 -14 2 -16 -18 -20 -6 -10 -12>>

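Thanks!

Your table seems to be malformed. Even if you select only the first 4 columns, R still reads all of them and cannot handle rows with more or fewer elements. You have to split the lines and pick the values manually: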

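# read the file and drop blank lines
lin <- readLines("test.txt")
lin <- lin[nzchar(trimws(lin))]
# splitting on a single space leaves empty strings between repeated spaces
cells <- strsplit(lin, " ")
data <- c()
for (line in cells) {
  found <- 0
  cell <- 1
  # collect the first four non-empty fields (assumes every line has at least four)
  while (found < 4) {
    field <- line[[cell]]
    if (nchar(field) > 0) {
      found <- found + 1
      data <- c(data, field)
    }
    cell <- cell + 1
  }
}

# one row per line of the file, four columns
df <- as.data.frame(matrix(data, ncol = 4, byrow = TRUE))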

The resulting data frame:

> df
   V1 V2 V3 V4
1   3  a  1  4
2   3  a  1  4
3   4  a  1  4
4   4  a  1  4
5   3  a  1  4
6   3  a  1  4
7   3  a  1  4
8   3  a  1  4
9   3  a  1  4
10  3  a  1  4
11  5  a  1  4
12  5  a  2  6
13  5  a  1  4
14  5  a  1  4
15  5  a  2  6

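You can now change the class of individual columns (e.g. df[,1] = as.integer(df[,1])), since they are all character at this point. In R versions before 4.0 they come out as factors instead, so go through as.character first there. You will probably want numeric values, but that is up to you.

Here is a base R solution. It reads the file with readLines and parses it with a series of *apply calls.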
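
If you would rather not convert column by column, something along these lines may work. It is only a sketch: the small df below is a hypothetical stand-in for the all-character data frame built above, and type.convert (from base R's utils package) picks a type for each column from its values.

```r
# Hypothetical stand-in for the all-character data frame built above
df <- data.frame(V1 = c("3", "4", "5"),
                 V2 = c("a", "a", "a"),
                 V3 = c("1", "1", "2"),
                 V4 = c("4", "4", "6"),
                 stringsAsFactors = FALSE)

# Convert every column to the most specific type that fits its values;
# as.is = TRUE keeps non-numeric columns as character instead of factor
df[] <- lapply(df, type.convert, as.is = TRUE)

str(df)
```

Assigning into df[] keeps the data frame structure and column names while replacing the column contents.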

# read the file as text lines
txt <- readLines("test.txt")
# split by one or more spaces
txt <- strsplit(txt, " +")
# keep only the vector elements with more than 0 chars
txt <- lapply(txt, function(x) x[nzchar(x)])
# the last line may have a '\n' only, remove it
txt <- txt[lengths(txt) > 0]
# now extract the first 4 elements of each vector
txt <- lapply(txt, '[', 1:4)
# and rbind to data.frame
df1 <- do.call(rbind.data.frame, txt)
names(df1) <- paste0("V", 1:4)

head(df1)
#  V1 V2 V3 V4
#1  3  a  1  4
#2  3  a  1  4
#3  4  a  1  4
#4  4  a  1  4
#5  3  a  1  4
#6  3  a  1  4
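
Another route that might be worth trying, if you prefer to let read.table do the parsing: find the widest row with count.fields, give read.table that many column names together with fill = TRUE, and then keep the first four columns. This is a sketch under assumptions: the temporary file and its three lines are a made-up sample standing in for your real file, and it relies on the fields being separated by whitespace only.

```r
# Hypothetical sample standing in for the real file
path <- tempfile(fileext = ".txt")
writeLines(c(" 3  a         1       4   6   2",
             " 4  a         1       4   6   8   2",
             " 5  a         2       6   8  10   2   4"), path)

# the widest row determines how many columns read.table has to allocate
n_col <- max(count.fields(path))

# fill = TRUE pads shorter rows with NA; col.names fixes the column count up front
full <- read.table(path, fill = TRUE,
                   col.names = paste0("V", seq_len(n_col)),
                   stringsAsFactors = FALSE)

# keep only the first four columns
df2 <- full[, 1:4]
df2
```

The columns come back already typed (integer for V1, V3 and V4 in this sample), so no separate conversion step is needed.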