How do I test for numeric values in a dataframe of characters, and convert those to numeric?

I have a data frame similar to the following:

> theDF
   ID        Ticker INDUSTRY_SECTOR              VAR             CVAR
1   1      USD CASH                                0                0
12  2      ZAR CASH                 -181412.82055904 -301731.22832191
23  3 BAT SJ EQUITY       Financial  61711.951234826 102641.162795691
34  4 HCI SJ EQUITY       Financial 1095.16002541256 1821.50290513369
45  5 PSG SJ EQUITY       Financial 16498.2192382422  27440.331617902

We can see that these are all character columns:

> apply(theDF, 2, mode)
             ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
    "character"     "character"     "character"     "character"     "character" 

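I want something that changes only the numeric-looking vectors to numeric. Basically, if a column "looks like" a number, make it numeric; otherwise leave it alone. I couldn't find anything on Stack Overflow that doesn't require knowing in advance the names or positions of the columns to convert. This DF won't always have the same columns, or have them in the same order, so I need a dynamic way to check whether a column "looks like" numbers and convert only those columns to numeric.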

This (obviously) gives me a bunch of NAs for the character columns:

> apply(theDF, 2, as.numeric)
     ID Ticker INDUSTRY_SECTOR        VAR        CVAR
[1,]  1     NA              NA       0.00       0.000
[2,]  2     NA              NA -181412.82 -301731.228
[3,]  3     NA              NA   61711.95  102641.163
[4,]  4     NA              NA    1095.16    1821.503
[5,]  5     NA              NA   16498.22   27440.332

I tried something like this, but not only does it not work, it also seems very inefficient:

> apply(theDF, 2, function(x) tryCatch(as.numeric(x),error=function(e) e, warning=function(w) x))
     ID  Ticker          INDUSTRY_SECTOR VAR                CVAR              
[1,] "1" "USD CASH"      ""              "0"                "0"               
[2,] "2" "ZAR CASH"      ""              "-181412.82055904" "-301731.22832191"
[3,] "3" "BAT SJ EQUITY" "Financial"     "61711.951234826"  "102641.162795691"
[4,] "4" "HCI SJ EQUITY" "Financial"     "1095.16002541256" "1821.50290513369"
[5,] "5" "PSG SJ EQUITY" "Financial"     "16498.2192382422" "27440.331617902" 

Is there a better way?

Edit: people keep asking for this, so here it is...

> apply(theDF, 2, mode)
             ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
    "character"     "character"     "character"     "character"     "character" 
> sapply(theDF, mode)
             ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
    "character"     "character"     "character"     "character"     "character" 
> apply(theDF, 2, class)
             ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
    "character"     "character"     "character"     "character"     "character" 
> sapply(theDF, class)
             ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
    "character"     "character"     "character"     "character"     "character" 

This looks like a job for type.convert().

theDF[] <- lapply(theDF, type.convert, as.is = TRUE)
## check the result
sapply(theDF, class)
#          ID          Ticker INDUSTRY_SECTOR             VAR            CVAR 
#   "integer"     "character"     "character"       "numeric"       "numeric" 

type.convert() coerces a vector to its "most appropriate" type. Setting as.is = TRUE keeps character vectors as they are; otherwise they would be coerced to factors.
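
To make that behavior concrete, here is a minimal sketch on standalone vectors. The vector names and values are hypothetical, chosen only for illustration; they are not taken from the question's data.

```r
# Illustrative character vectors (hypothetical values)
num_like <- c("1", "2.5", "-3")   # every element parses as a number
int_like <- c("1", "2", "3")      # every element parses as an integer
mixed    <- c("1", "abc", "")     # at least one element is not a number

class(type.convert(num_like, as.is = TRUE))   # "numeric"
class(type.convert(int_like, as.is = TRUE))   # "integer"
class(type.convert(mixed,    as.is = TRUE))   # "character"

# Without as.is = TRUE, the non-numeric vector becomes a factor instead
class(type.convert(mixed, as.is = FALSE))     # "factor"
```

A vector is converted only when all of its elements can be read as that type, which is why a single non-numeric entry such as "abc" leaves the whole column as character.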

Update: non-character columns need to be coerced to character first.

theDF[] <- lapply(theDF, function(x) type.convert(as.character(x), as.is = TRUE))
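
As a self-contained check of the line above, the following sketch builds a small data frame that mixes a factor column with character columns and then applies the same conversion. The data frame `df` and its values are hypothetical, loosely modeled on the question's data.

```r
# Hypothetical data frame: ID is a factor, the other columns are character
df <- data.frame(
  ID     = factor(c("1", "2", "3")),
  Ticker = c("USD CASH", "ZAR CASH", "BAT SJ EQUITY"),
  VAR    = c("0", "-181412.82", "61711.95"),
  stringsAsFactors = FALSE
)

# Coerce every column to character first, then let type.convert() pick a type.
# Assigning into df[] replaces the columns but keeps df a data frame.
df[] <- lapply(df, function(x) type.convert(as.character(x), as.is = TRUE))

# Check the result: ID is "integer", Ticker is "character", VAR is "numeric"
sapply(df, class)
```

Assigning to `df[]` rather than `df` matters here: a plain `df <- lapply(...)` would leave you with a list instead of a data frame.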