Why is fread inserting carriage returns (\r) into a data.table?
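I am using data.table::fread on a Windows 10 machine to read data from a .csv file. The data reads in correctly with read.csv; however, when I read it with fread, the last column of every row in the resulting data.table ends with \r, presumably representing a carriage return. This causes numeric fields to be given a character data type. (The cell at the end of a row contains the character literal 4.53\r rather than the numeric literal 4.53.)
Why does this happen? Is there a way to resolve it directly through the fread function call?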
Update
I get the following output when using the verbose = TRUE argument:
Input contains no \n. Taking this to be a filename to open
File opened, filesize is 0.000001 GB.
Memory mapping ... ok
Detected eol as \n only (no \r afterwards), the UNIX and Mac standard.
Positioned on line 1 after skip or autostart
This line is the autostart and not blank so searching up for the last non-blank ... line 1
Detecting sep ... ','
Detected 7 columns. Longest stretch was from line 1 to line 13
Starting data input on line 1 (either column names or first row of data). First 10 characters: subjectNum
All the fields on line 1 are character fields. Treating as the column names.
Count of eol: 13 (including 1 at the end)
Count of sep: 72
nrow = MIN( nsep [72] / ncol [7] -1, neol [13] - nblank [1] ) = 12
Type codes ( first 5 rows): 1131414
Type codes: 1131414 (after applying colClasses and integer64)
Type codes: 1131414 (after applying drop or select (if supplied)
Allocating 7 column slots (7 - 0 dropped)
Read 12 rows. Exactly what was estimated and allocated up front
0.000s ( 0%) Memory map (rerun may be quicker)
0.001s ( 33%) sep and header detection
0.000s ( 0%) Count rows (wc -l)
0.002s ( 67%) Column type detection (first, middle and last 5 rows)
0.000s ( 0%) Allocation of 12x7 result (xMB) in RAM
0.000s ( 0%) Reading data
0.000s ( 0%) Allocation for type bumps (if any), including gc time if triggered
0.000s ( 0%) Coercing data already read in type bumps (if any)
0.000s ( 0%) Changing na.strings to NA
0.003s Total
If you have a file that looks like x = "a\n1\r\n2\r\n", then fread(x) gives the result described:
a
1: 1\r
2: 2\r
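This happens because the line-ending markers are not consistent from line to line: in this example the header line ends with \n, while the data rows end with \r\n.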
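To check whether a given file really mixes line endings, one option is to count them in the raw bytes. The sketch below is only an illustration: count_eol is a hypothetical helper name (it is not part of data.table), and the temporary file simply recreates the x = "a\n1\r\n2\r\n" example. A non-zero count in both slots would be consistent with the verbose output reporting "Detected eol as \n only" while later rows still carry a \r.

```r
# Hypothetical helper (not part of data.table): count line terminators
# by looking at the raw bytes of the file.
count_eol <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  lf <- which(bytes == as.raw(0x0a))          # positions of every \n
  # Byte that precedes each \n; 0x00 pads the case where \n is the first byte.
  prev <- c(as.raw(0x00), bytes)[lf]
  is_crlf <- prev == as.raw(0x0d)             # TRUE when the \n follows a \r
  c(crlf = sum(is_crlf), lf_only = sum(!is_crlf))
}

# Recreate the mixed file from the example; writeBin avoids any EOL translation.
tmp <- tempfile(fileext = ".csv")
writeBin(charToRaw("a\n1\r\n2\r\n"), tmp)

counts <- count_eol(tmp)
print(counts)
```

For the example file this reports two CRLF terminators and one bare LF, which is the inconsistent pattern described above.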
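I have heard of other people running into this, but I am not sure where it comes from, or whether there is a better workaround than "fixing" the file, possibly with a command-line tool.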
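If "fixing" the file from inside R is acceptable, a minimal sketch is to let base R split the lines and hand fread text that uses \n only. This is a workaround under stated assumptions, not an fread option: normalize_eol_text is a hypothetical helper name, it relies on readLines accepting LF, CRLF or CR as a line terminator, and on fread treating input that contains \n as data rather than a file name (as the first line of the verbose output suggests).

```r
# Hypothetical helper (not part of data.table): read the lines with base R
# and rejoin them with "\n" only, so every line ends the same way.
normalize_eol_text <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # The trailing "\n" guarantees the text contains a newline even for one line.
  paste0(paste(lines, collapse = "\n"), "\n")
}

# Recreate the mixed file from the example; writeBin avoids any EOL translation.
tmp <- tempfile(fileext = ".csv")
writeBin(charToRaw("a\n1\r\n2\r\n"), tmp)

clean <- normalize_eol_text(tmp)

# Only call fread when data.table is installed.
if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::fread(clean)
  str(dt)
}
```

The trade-off is that the whole file is held in memory as text before parsing, so for large files rewriting the file once on disk may be the better choice.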