Issue with double quotes and fread function
I have some column entries that look like this:
c("This is just a \"shame\"...") # since it's a character
This will write a file to your C:\ drive:
sample.data <- data.frame(case1=c("This is just a 'shame'..."),
case2="This is just a shame") # here I could not work out how to insert the double quotes
write.csv(sample.data, file="C:/sample_data.csv")
require(data.table)
test.fread <- fread("C:/sample_data.csv")
test.read.csv <- read.csv("C:/sample_data.csv")
If I read the csv data with the fread function (from data.table), I get this error:
Bumped column 79 to type character on data row 12681, field contains '
a.n."'. Coercing previously read values in this column from logical,
integer or numeric back to character which may not be lossless; e.g., if
'00' and '000' occurred before they will now be just '0', and there
may be inconsistencies with treatment of ',,' and ',NA,' too (if they
occurred in this column before the bump). If this matters please rerun
and set 'colClasses' to 'character' for this column. Please note that column
type detection uses the first 5 rows, the middle 5 rows and the
last 5 rows, so hopefully this message should be very rare.
If reporting to datatable-help, please rerun and include
the output from verbose=TRUE.
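If I use read.csv, no error occurs and the entries are read in correctly!
Question 1: How can I remove the double quotes from within a character string?
Question 2: Why does read.csv read the entries correctly while fread fails?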
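The message quoted above recommends rerunning with 'colClasses' set to 'character' for the affected column. A minimal sketch of that suggestion follows. The column names id and value, the sample values, and the temporary file path are hypothetical and chosen only for illustration; they are not taken from the original data set.

```r
library(data.table)

# Hypothetical data: a column mixing numeric-looking strings with text,
# similar to the '00' / '000' case mentioned in the message.
sample.data <- data.frame(id = c("00", "000", "a.n."),
                          value = 1:3,
                          stringsAsFactors = FALSE)

# A temporary file is used here instead of C:/ (an assumption, for portability).
csv.path <- tempfile(fileext = ".csv")
write.csv(sample.data, file = csv.path, row.names = FALSE)

# Tell fread up front that 'id' is character, so no type bump is needed
# and leading zeros are kept as written.
test.fread <- fread(csv.path, colClasses = c(id = "character"))
str(test.fread)
```

Naming the column in colClasses leaves type detection for the remaining columns untouched, which is usually preferable to forcing every column to character.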
As @Arun kindly suggested, the current development version of data.table on GitHub, 1.9.5, may help.
To install it, follow these steps (Rtools is required):
# To install development version
library(devtools)
install_github("Rdatatable/data.table", build_vignettes = FALSE)
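I have tested it, so this is to confirm that the latest version of data.table handles the double-quote issue without any problems.
For more details and updates, see the data.table page on GitHub.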
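For Question 1, one possible approach in base R is to strip the double quotes after reading the data. The sketch below is not from the original answer; the vector x and the data frame df are hypothetical examples.

```r
# Hypothetical character vector containing embedded double quotes.
x <- c("This is just a \"shame\"...", "This is just a shame")

# fixed = TRUE treats the pattern as a literal double quote, not a regex.
x.clean <- gsub("\"", "", x, fixed = TRUE)
print(x.clean)

# The same idea applied to every character column of a data frame.
df <- data.frame(case1 = x, case2 = c("a", "\"b\""), stringsAsFactors = FALSE)
is.chr <- vapply(df, is.character, logical(1))
df[is.chr] <- lapply(df[is.chr], function(col) gsub("\"", "", col, fixed = TRUE))
```

Note that this removes every double quote in the strings, so it only fits cases where the quotes carry no meaning that needs to be preserved.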