categoricalToNumeric function in R with indefinite quantity of variables (Variadic function)

I want to do the following with a function:

categoricalToNumeric <- function(data,...) {
  for(i in list(...)) {
    data$i <- as.numeric(as.factor(data$i))
  }
  summary(data)
}

and then call it:

categoricalToNumeric(data, 'school', 'sex', 'address', 'famsize', 'Pstatus', 'Mjob', 'Fjob', 'reason', 'nursery', 'internet', 'guardian.x', 'schoolsup.x', 'famsup.x', 'paid.x', 'activities.x', 'higher.x', 'romantic.x', 'guardian.y', 'schoolsup.y', 'famsup.y', 'paid.y', 'activities.y', 'higher.y', 'romantic.y')

There is no error, but the `data` variable is not changed by the call to categoricalToNumeric.

Data: https://archive.ics.uci.edu/ml/machine-learning-databases/00320/student.zip

Setup:

data_mat=read.table("./data/csv/student-mat.csv",sep=";",header=TRUE)
data_por=read.table("./data/csv/student-por.csv",sep=";",header=TRUE)


data=merge(data_mat,data_por,by=c("school","sex","age","address","famsize","Pstatus","Medu","Fedu","Mjob","Fjob","reason","nursery","internet"))
print(nrow(data)) # 382 rows

head(data,5)

This is strange, but it works. For convenience, I changed `...` to `colnames(data)`:

categoricalToNumeric2 <- function(data,...) {
  for(i in colnames(data)) {
    data[i] <- as.numeric(as.factor(data$i))
  }
  summary(data)
}
categoricalToNumeric2(data)

    school           sex             age           address         famsize         Pstatus           Medu            Fedu      
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000  
 Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000  
 Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848  
 3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000  
 Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000  
      Mjob            Fjob           reason         nursery         internet       guardian.x     traveltime.x    studytime.x   
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000  
 Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000  
 Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848  
 3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000  
 Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000  
   failures.x     schoolsup.x       famsup.x         paid.x       activities.x      higher.x       romantic.x       famrel.x    
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000  
 Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000  
 Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848  
 3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000  
 Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000  
   freetime.x       goout.x          Dalc.x          Walc.x         health.x       absences.x         G1.x            G2.x      
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000  
 Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000  
 Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848  
 3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000  
 Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000  
      G3.x         guardian.y     traveltime.y    studytime.y      failures.y     schoolsup.y       famsup.y         paid.y     
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000  
 Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000  
 Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848  
 3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000  
 Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000  
  activities.y      higher.y       romantic.y       famrel.y       freetime.y       goout.y          Dalc.y          Walc.y     
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000  
 Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000  
 Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848  
 3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000  
 Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000  
    health.y       absences.y         G1.y            G2.y            G3.y      
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
 1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000  
 Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000  
 Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848  
 3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000  
 Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000  

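`data$i` is not a valid way to extract a column inside a loop. Use `[[` for a single column or `[` for several columns. An alternative to the `for` loop is `lapply`: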
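A small sketch on a hypothetical toy data frame (`df` and its columns are made up for illustration) shows the difference. Inside the loop `i` holds a string such as `"sex"`, but `data$i` does not evaluate `i`: it looks for a column literally named `i`. Because `$` on a data frame also falls back to partial name matching, it can silently return a column whose name starts with "i". In the merged student data the only such column is `internet`, which would likely explain why every column in the summary above shows identical values.

```r
# Hypothetical toy data: two character columns plus one whose name starts with "i"
df <- data.frame(
  sex      = c("F", "M", "F"),
  school   = c("GP", "MS", "GP"),
  internet = c("yes", "no", "yes"),
  stringsAsFactors = FALSE
)

i <- "sex"

# `$` does not evaluate `i`: there is no column named "i",
# so it partially matches "internet" instead
wrong <- df$i

# `[[` evaluates `i`, so this extracts the "sex" column
right <- df[[i]]

# A loop using `[[` converts each named column of df
for (col in c("sex", "school")) {
  df[[col]] <- as.numeric(as.factor(df[[col]]))
}
```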

categoricalToNumeric <- function(data,...) {
  cols <- c(...)
  data[cols] <- lapply(data[cols], function(x) as.numeric(as.factor(x)))
  summary(data)
}
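The question also notes that `data` is unchanged after the call. R functions operate on a copy of their arguments, so assignments inside `categoricalToNumeric` are not visible to the caller, and the function above returns only `summary(data)`. A minimal variant, assuming you want the converted data frame back (the name `categoricalToNumeric3` and the toy `students` data are hypothetical), returns `data` so the caller can keep the result:

```r
# Hypothetical variant: convert the named columns and return the data frame
categoricalToNumeric3 <- function(data, ...) {
  cols <- c(...)                          # collect the column names passed via ...
  stopifnot(all(cols %in% names(data)))   # fail loudly on a misspelled name
  data[cols] <- lapply(data[cols], function(x) as.numeric(as.factor(x)))
  data                                    # return the modified copy
}

# Toy data standing in for the merged student data
students <- data.frame(
  school = c("GP", "MS", "GP"),
  sex    = c("F", "F", "M"),
  age    = c(15, 16, 17),
  stringsAsFactors = FALSE
)

# Keep the result; calling the function alone does not modify `students`
converted <- categoricalToNumeric3(students, "school", "sex")
summary(converted)
```

Collecting the names with `c(...)` keeps the variadic call style from the question; to update the original object, assign the result back, e.g. `data <- categoricalToNumeric3(data, 'school', 'sex')`.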

categoricalToNumeric(data, 'school', 'sex', ...rest of the columns)