Conflict between comment character and headers when importing a data frame with read.table
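How can I import a file that:
- starts with an undefined number of comment lines
- followed by a line of headers, some of which contain the same comment character used to mark the comment lines above?
For example, with a file like this: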
# comment 1
# ...
# comment X
c01,c#02,c03,c04
1,2,3,4
5,6,7,8
Then:
myDF = read.table(myfile, sep=',', header=T)
Error in read.table(myfile, sep = ",", header = T) : more columns than column names
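The obvious problem is that # is used both as the comment character that introduces the comment lines and inside the headers (admittedly bad practice, but I have no control over this).
The number of comment lines is not known a priori, so I can't even use the skip argument. I also don't know the column names (or even how many there are) before importing, so I really need to read them from the file.
Is there any solution other than manually editing the file?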
It may be easy enough to count the lines that start with a comment and then skip them.
csvfile <- "# comment 1
# ...
# comment X
c01,c#02,c03,c04
1,2,3,4
5,6,7,8"
# return a logical for whether the line starts with a comment.
# remove everything from the first FALSE and afterward
# take the sum of what's left
start_comment <- grepl("^#", readLines(textConnection(csvfile)))
start_comment <- sum(head(start_comment, which(!start_comment)[1] - 1))
# skip the lines that start with the comment character
Data <- read.csv(textConnection(csvfile),
                 skip = start_comment,
                 stringsAsFactors = FALSE)
Note that this works with read.csv because read.csv defaults to comment.char = "". If you must use read.table, or must keep comment.char = "#", you may need a few more steps.
start_comment <- grepl("^#", readLines(textConnection(csvfile)))
start_comment <- sum(head(start_comment, which(!start_comment)[1] - 1))
# Get the headers by themselves.
Head <- read.table(textConnection(csvfile),
                   skip = start_comment,
                   header = FALSE,
                   sep = ",",
                   comment.char = "",
                   nrows = 1)
# Read the data rows only, skipping the comments and the header line.
Data <- read.table(textConnection(csvfile),
                   sep = ",",
                   header = FALSE,
                   skip = start_comment + 1,
                   stringsAsFactors = FALSE)
# apply column names to Data
names(Data) <- unlist(Head)
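For an actual file on disk, the same idea can be wrapped in a small function. This is only a sketch: read_after_comments, comment_prefix and demo_file are names made up for illustration, and it assumes every comment line starts with the prefix in its first column. It reads the lines once, drops the leading comment block, and parses the rest with comment.char = "" so that a later # is treated as ordinary text. check.names = FALSE is set because the default would rewrite the header c#02 as c.02.

```r
# Hypothetical helper (not part of the original answer): read a delimited file
# that begins with comment lines, keeping '#' inside headers or data as text.
read_after_comments <- function(path, comment_prefix = "#", sep = ",") {
  lines <- readLines(path)
  # TRUE for every line that begins with the comment prefix
  is_comment <- startsWith(lines, comment_prefix)
  # index of the first line that is not a comment (the header line)
  first_content <- which(!is_comment)[1]
  if (is.na(first_content)) {
    stop("file contains only comment lines")
  }
  read.table(
    text = lines[first_content:length(lines)],
    sep = sep,
    header = TRUE,
    comment.char = "",       # do not treat '#' as a comment from here on
    check.names = FALSE,     # keep 'c#02' instead of converting it to 'c.02'
    stringsAsFactors = FALSE
  )
}

# Usage with a temporary copy of the example file
demo_file <- tempfile(fileext = ".csv")
writeLines(c("# comment 1", "# ...", "# comment X",
             "c01,c#02,c03,c04", "1,2,3,4", "5,6,7,8"), demo_file)
Data <- read_after_comments(demo_file)
names(Data)
```

Because only the leading block of comment lines is dropped, a line starting with # further down the file would be parsed as data rather than skipped.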