Is there something in R for automatically converting the columns of a data frame (or data.table) back to their original vector type?

My actual concern is the data and how it arrives as different vector types: some columns are really integer or numeric but show up as character.

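If I read the data frame with read.csv(), it guesses the type of each column and converts it automatically. I couldn't find the same behavior in fread() from data.table. Here is a sample of the data: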

structure(list(V1 = c("1", "2", "3", "4", "5", "6"), ID = c("109", 
"110", "111", "112", "113", "114"), SignalIntensity = c(7.58043495940162, 
11.2698560261255, 8.60063586764357, 9.54355755391806, 10.1812351379984, 
8.11689493952339), SNR = c(1.34218273720186, 9.75097840763912, 
1.80485348504829, 3.20137685049428, 4.64599368338536, 1.42263609838542
)), .Names = c("V1", "ID", "SignalIntensity", "SNR"), row.names = c(NA, 
6L), class = "data.frame")

When I read the data frame with read.csv():

str(df)

'data.frame':    20469 obs. of  4 variables:
 $ X              : int  1 2 3 4 5 6 7 8 9 10 ...
 $ ID             : int  109 110 111 112 113 114 116 117 118 119 ...
 $ SignalIntensity: num  6.18 10.17 7.29 8.9 9.59 ...
 $ SNR            : num  0.845 4.384 1.073 2.319 3.713 ...

Reading the same data with fread() and read.table():
'data.frame':   20469 obs. of  4 variables:
 $ V1             : chr  "1" "2" "3" "4" ...
 $ ID             : chr  "109" "110" "111" "112" ...
 $ SignalIntensity: num  6.18 10.17 7.29 8.9 9.59 ...
 $ SNR            : num  0.845 4.384 1.073 2.319 3.713 ...


read.table()
'data.frame':   20470 obs. of  2 variables:
 $ V1: int  NA 1 2 3 4 5 6 7 8 9 ...
 $ V2: chr  ",\"ID\",\"SignalIntensity\",\"SNR\"" ",\"109\",6.18230893141024,0.845357691456258" ",\"110\",10.1727771385494,4.38370775906105" ",\"111\",7.29227469267823,1.07257511609212" ...
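The read.table() output above did not split the lines on commas, which is what happens when read.table() is called without sep = ",": its default sep = "" splits on whitespace, and its default is header = FALSE. read.csv() is read.table() with header = TRUE and sep = "," preset. A minimal sketch, assuming a small temporary CSV written from part of the sample with write.csv() (the file and its contents are illustrative, not the original 20469-row file):

```r
# Small character-typed copy of part of the sample (illustrative values).
df <- data.frame(
  V1 = c("1", "2", "3"),
  ID = c("109", "110", "111"),
  SignalIntensity = c(7.58043495940162, 11.2698560261255, 8.60063586764357),
  stringsAsFactors = FALSE
)

f <- tempfile(fileext = ".csv")
# write.csv() quotes character columns, e.g. "1","109",7.58043495940162
write.csv(df, f, row.names = FALSE)

# Same header and separator settings that read.csv() uses by default;
# quotes are stripped and the columns go through type.convert().
via_table <- read.table(f, header = TRUE, sep = ",")
via_csv   <- read.csv(f)

str(via_table)
all.equal(via_table, via_csv)
```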

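I'd like to know about anything that takes care of this overhead of data losing its original vector type. Is there any automatic conversion other than read.csv()?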
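If the data are already in memory, plain base R can redo the same guessing afterwards. A minimal sketch using the first three rows of the dput() sample; it converts only the character columns, so columns that are already numeric are not round-tripped through text:

```r
# First three rows of the dput() sample, with V1 and ID stored as character.
df <- data.frame(
  V1 = c("1", "2", "3"),
  ID = c("109", "110", "111"),
  SignalIntensity = c(7.58043495940162, 11.2698560261255, 8.60063586764357),
  SNR = c(1.34218273720186, 9.75097840763912, 1.80485348504829),
  stringsAsFactors = FALSE
)

# Logical index of the columns that were read as character
chr <- vapply(df, is.character, logical(1))

# Re-guess the type of those columns the way read.csv() would
df[chr] <- lapply(df[chr], type.convert, as.is = TRUE)

str(df)
```

This works on a plain data.frame without any extra package; for a data.table, the by-reference version in the answer avoids copying the whole table.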

Edit: output of fread(...., verbose=TRUE):

Input contains no \n. Taking this to be a filename to open
File opened, filesize is 0.000949 GB.
Memory mapping ... ok
Detected eol as \r\n (CRLF) in that order, the Windows standard.
Using line 30 to detect sep (the last non blank line in the first 'autostart') ... sep=','
Found 4 columns
First row with 4 fields occurs on line 1 (either column names or first row of data)
All the fields on line 1 are character fields. Treating as the column names.
Count of eol after first data row: 20470
Subtracted 1 for last eol and any trailing empty lines, leaving 20469 data rows
Type codes (   first 5 rows): 4433
Type codes (+ middle 5 rows): 4433
Type codes (+   last 5 rows): 4433
Type codes: 4433 (after applying colClasses and integer64)
Type codes: 4433 (after applying drop or select (if supplied)
Allocating 4 column slots (4 - 0 dropped)
   0.001s (  2%) Memory map (rerun may be quicker)
   0.000s (  1%) sep and header detection
   0.004s ( 12%) Count rows (wc -l)
   0.001s (  2%) Column type detection (first, middle and last 5 rows)
   0.000s (  0%) Allocation of 20469x4 result (xMB) in RAM
   0.025s ( 82%) Reading data
   0.000s (  0%) Allocation for type bumps (if any), including gc time if triggered
   0.000s (  0%) Coercing data already read in type bumps (if any)
   0.000s (  0%) Changing na.strings to NA
   0.030s        Total

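There seems to be a bug(?) with setting colClasses in fread (I'll wait for @Arun's reply). In the meantime, you can work around it by calling type.convert after reading the data and reassigning the columns by reference: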
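library(data.table)
setDT(df)  # make sure df is a data.table (fread() already returns one)
# positions of the columns that were read as character
indx <- which(sapply(df, is.character))
# re-guess their types; as.is = TRUE keeps text columns as character rather than factor
df[, (indx) := lapply(.SD, type.convert, as.is = TRUE), .SDcols = indx]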
str(df)
# Classes ‘data.table’ and 'data.frame':  6 obs. of  4 variables:
# $ V1             : int  1 2 3 4 5 6
# $ ID             : int  109 110 111 112 113 114
# $ SignalIntensity: num  7.58 11.27 8.6 9.54 10.18 ...
# $ SNR            : num  1.34 9.75 1.8 3.2 4.65 ...
# - attr(*, ".internal.selfref")=<externalptr>