categoricalToNumeric function in R with a variable number of arguments (variadic function)
I want to do the following with a function:
categoricalToNumeric <- function(data,...) {
  for(i in list(...)) {
    data$i <- as.numeric(as.factor(data$i))
  }
  summary(data)
}
and then call it like this:
categoricalToNumeric(data, 'school', 'sex', 'address', 'famsize', 'Pstatus', 'Mjob', 'Fjob', 'reason', 'nursery', 'internet', 'guardian.x', 'schoolsup.x', 'famsup.x', 'paid.x', 'activities.x', 'higher.x', 'romantic.x', 'guardian.y', 'schoolsup.y', 'famsup.y', 'paid.y', 'activities.y', 'higher.y', 'romantic.y')
There is no error, but data is unchanged after categoricalToNumeric is called.
Data: https://archive.ics.uci.edu/ml/machine-learning-databases/00320/student.zip
Setup:
data_mat=read.table("./data/csv/student-mat.csv",sep=";",header=TRUE)
data_por=read.table("./data/csv/student-por.csv",sep=";",header=TRUE)
data=merge(data_mat,data_por,by=c("school","sex","age","address","famsize","Pstatus","Medu","Fedu","Mjob","Fjob","reason","nursery","internet"))
print(nrow(data)) # 382 rows
head(data,5)
This is strange, but it does run. For convenience, I replaced ... with colnames(data):
categoricalToNumeric2 <- function(data,...) {
  for(i in colnames(data)) {
    data[i] <- as.numeric(as.factor(data$i))
  }
  summary(data)
}
categoricalToNumeric2(data)
 school          sex             age             address         famsize         Pstatus         Medu            Fedu
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000
 1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000
 Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000
 Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848
 3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000
 Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000
 Mjob            Fjob            reason          nursery         internet        guardian.x      traveltime.x    studytime.x
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000
 1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000
 Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000
 Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848
 3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000
 Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000
 failures.x      schoolsup.x     famsup.x        paid.x          activities.x    higher.x        romantic.x      famrel.x
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000
 1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000
 Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000
 Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848
 3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000
 Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000
 freetime.x      goout.x         Dalc.x          Walc.x          health.x        absences.x      G1.x            G2.x
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000
 1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000
 Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000
 Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848
 3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000
 Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000
 G3.x            guardian.y      traveltime.y    studytime.y     failures.y      schoolsup.y     famsup.y        paid.y
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000
 1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000
 Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000
 Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848
 3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000
 Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000
 activities.y    higher.y        romantic.y      famrel.y        freetime.y      goout.y         Dalc.y          Walc.y
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000
 1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000
 Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000
 Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848
 3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000
 Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000
 health.y        absences.y      G1.y            G2.y            G3.y
 Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000
 1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000   1st Qu.:2.000
 Median :2.000   Median :2.000   Median :2.000   Median :2.000   Median :2.000
 Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848   Mean   :1.848
 3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000   3rd Qu.:2.000
 Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000   Max.   :2.000
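data$i is not a valid way to extract a column inside a loop. The $ operator does not use the value stored in the loop variable i; it looks for a column literally named "i". Because $ on a data frame also falls back to partial matching, data$i actually returns the internet column here (the only column whose name starts with "i"), which is why categoricalToNumeric2 gives every column identical statistics. Use [[ for a single column or [ for multiple columns. An alternative to the for loop is lapply: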
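categoricalToNumeric <- function(data,...) {
  # Collect the column names passed through ... into a character vector
  cols <- c(...)
  # Convert all selected columns to numeric factor codes in one step
  data[cols] <- lapply(data[cols], function(x) as.numeric(as.factor(x)))
  summary(data)
}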
categoricalToNumeric(data, 'school', 'sex', ...rest of the columns)
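For comparison, here is a minimal sketch of the same conversion written as a for loop with [[. It uses a small made-up data frame instead of the student data, and the function name categoricalToNumericLoop is hypothetical. It also demonstrates the data$i pitfall, and it returns the converted data frame rather than its summary: R modifies a local copy of data inside the function, so the caller only sees the new columns after assigning the result.

```r
# Hypothetical toy data frame standing in for the merged student data
df <- data.frame(
  school   = c("GP", "MS", "GP"),
  sex      = c("F", "M", "M"),
  internet = c("yes", "no", "yes"),
  age      = c(15, 16, 17),
  stringsAsFactors = FALSE
)

# `$` ignores the loop variable: df$i asks for a column named "i".
# No such column exists, so `$` falls back to partial matching and
# returns the only column whose name starts with "i" (internet).
i <- "school"
identical(df$i, df$internet)  # TRUE

# Hypothetical loop-based variant: [[ uses the *value* of i to pick
# the column. It returns the converted data frame, not its summary.
categoricalToNumericLoop <- function(data, ...) {
  for (i in c(...)) {
    data[[i]] <- as.numeric(as.factor(data[[i]]))
  }
  data
}

# The function works on a local copy, so the converted columns are
# only visible through the returned value; df itself is unchanged.
df2 <- categoricalToNumericLoop(df, "school", "sex")
summary(df2)
```

With the real data, the call would be data <- categoricalToNumericLoop(data, 'school', 'sex'), extended with the remaining column names from the question, followed by summary(data).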