Manually predict with different functional forms
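I have a data frame containing coefficients from a glm (betas, below). The data frame contains the covariate label, the covariate form, and the estimate. The forms are linear (Li), squared/quadratic (Sq), and logarithmic (Ps).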
betas <- structure(list(CovGen = c("A", "B", "C", "D", "E", "F", "G",
"G", "H"), Form = c("Li", "Li", "Li", "Li", "Li", "Li", "Li",
"Sq", "Ps"), Estimate = c(0.0294573176934061, 0.0100315121169383,
-0.0155864186367343, -0.00871344935814372, 0.0362538988332902,
-0.0263072916746069, 0.0865742118052235, 0.0614689145750204,
0.00229745713752781)), .Names = c("CovGen", "Form", "Estimate"
), row.names = c(NA, 9L), class = "data.frame")
betas
CovGen Form Estimate
1 A Li 0.029457318
2 B Li 0.010031512
3 C Li -0.015586419
4 D Li -0.008713449
5 E Li 0.036253899
6 F Li -0.026307292
7 G Li 0.086574212
8 G Sq 0.061468915
9 H Ps 0.002297457
I am trying to apply the coefficient estimates to manually predict values for a new data frame (dat, included here via dput).
dat <- structure(list(B = c(-1.47218074669544, -1.46929972689195, -1.46641870708846,
-1.46353768728497, -1.46065666748148, -1.45777564767799), C = c(-1.09847692593512,
-1.09375316152745, -1.08902939711978, -1.08430563271211, -1.07958186830444,
-1.07485810389677), D = c(-1.0109875688763, -1.00407851818141,
-0.997169467486518, -0.990260416791627, -0.983351366096736, -0.976442315401845
), E = c(-3.19632050296668, -3.19041566990116, -3.18451083683563,
-3.17860600377011, -3.17270117070458, -3.16679633763906), F = c(-2.81211918021003,
-2.80673925496675, -2.80135932972346, -2.79597940448018, -2.7905994792369,
-2.78521955399362), G = c(-2.32916817000267, -2.32368219245727,
-2.31819621491187, -2.31271023736647, -2.30722425982107, -2.30173828227567
), H = c(0.442067970883549, 0.417909464459238, 0.393750958034926,
0.369592451610615, 0.345433945186303, 0.321275438761992)), .Names = c("B",
"C", "D", "E", "F", "G", "H"), row.names = c(NA, 6L), class = "data.frame")
> dat
B C D E F G H
1 -1.472181 -1.098477 -1.0109876 -3.196321 -2.812119 -2.329168 0.4420680
2 -1.469300 -1.093753 -1.0040785 -3.190416 -2.806739 -2.323682 0.4179095
3 -1.466419 -1.089029 -0.9971695 -3.184511 -2.801359 -2.318196 0.3937510
4 -1.463538 -1.084306 -0.9902604 -3.178606 -2.795979 -2.312710 0.3695925
5 -1.460657 -1.079582 -0.9833514 -3.172701 -2.790599 -2.307224 0.3454339
6 -1.457776 -1.074858 -0.9764423 -3.166796 -2.785220 -2.301738 0.3212754
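I am trying to multiply the new data values in the dat df by their respective betas while accounting for the functional form. More specifically, in the example included here, I want to apply the Sq form of the G beta to dat$G^2 and the Ps H beta to log(dat$H). All other betas and values can simply be multiplied directly, with no functional form to consider. Note that the A beta is not applied to the new values in the dat df.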
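To make the calculation described above concrete, here is a minimal hand-written sketch for the G and H terms only. It assumes the rounded coefficient and data values as printed above and uses just the first two rows of dat; the object names (b_G_li, contrib, and so on) are made up for illustration.

```r
# Coefficients copied from the printed betas (rounded values)
b_G_li <- 0.086574212   # G, linear (Li)
b_G_sq <- 0.061468915   # G, squared (Sq)
b_H_ps <- 0.002297457   # H, log (Ps)

# First two rows of dat (rounded values as printed)
G <- c(-2.329168, -2.323682)
H <- c(0.4420680, 0.4179095)

# One column per beta-form combination
contrib <- cbind(
  G_Li = b_G_li * G,       # linear: multiply directly
  G_Sq = b_G_sq * G^2,     # squared: transform, then multiply
  H_Ps = b_H_ps * log(H)   # log: transform, then multiply
)
contrib
```

This is only the arithmetic spelled out; it does not scale to a loop where the forms change per covariate.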
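I could probably work out an ifelse statement, but I am wondering whether there are other ideas and/or suggestions.
I am working within a larger loop, and the forms are not consistent for each covariate.
The desired result would be a matrix or df with one column of predicted values for each beta-form combination. For example, there would be a single column of predicted values for every beta except G, which would have predicted values for both G and G^2.
Thanks in advance.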
You could try a solution like this:
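# Map each Form code to the transformation it stands for
trans <- list(
  Li = identity,
  Sq = function(x) x^2,
  Ps = function(x) log(x)
)
# Apply each coefficient to its transformed covariate; returns a list with
# one vector per row of betas
cpredict <- function(betas, datas) {
  Map(function(var, fun, coef) {
    trans[[fun]](datas[[var]]) * coef
  }, betas$CovGen, betas$Form, betas$Estimate)
}
cpredict(betas, dat)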
But this will not work with your current data, because there is no dat$A and you cannot take the log of a negative number.
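One possible way around the missing column is sketched below, under the assumption that coefficients whose covariate is absent from the new data should simply be skipped. The helper cpredict_mat and the small betas_demo / dat_demo inputs are hypothetical stand-ins (values rounded from the question), not part of the answer above. It also binds the results into a matrix with one column per beta-form combination, which is the shape the question asks for.

```r
# Transformations keyed by the Form codes used in betas
trans <- list(
  Li = identity,
  Sq = function(x) x^2,
  Ps = function(x) log(x)
)

# Hypothetical helper: one column per covariate/form pair, skipping
# covariates that have no column in the data
cpredict_mat <- function(betas, datas) {
  b <- betas[betas$CovGen %in% names(datas), ]
  cols <- Map(function(var, fun, coef) trans[[fun]](datas[[var]]) * coef,
              b$CovGen, b$Form, b$Estimate)
  out <- do.call(cbind, unname(cols))
  colnames(out) <- paste(b$CovGen, b$Form, sep = "_")
  out
}

# Small stand-in inputs; A deliberately has no matching column
betas_demo <- data.frame(
  CovGen = c("A", "G", "G", "H"),
  Form = c("Li", "Li", "Sq", "Ps"),
  Estimate = c(0.029457318, 0.086574212, 0.061468915, 0.002297457),
  stringsAsFactors = FALSE
)
dat_demo <- data.frame(G = c(-2.329168, -2.323682),
                       H = c(0.4420680, 0.4179095))

pred <- cpredict_mat(betas_demo, dat_demo)
pred
```

Note that log() still returns NaN (with a warning) for negative inputs, so a Ps term on a covariate with negative values would need separate handling.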
I would try building a formula, then using model.matrix and matrix multiplication, like this:
betas$term = with(betas, ifelse(
Form == "Li", CovGen,
ifelse(Form == "Sq", sprintf("I(%s^2)", CovGen),
ifelse(Form == "Ps", sprintf("log(%s)", CovGen), NA)
)))
betas
# CovGen Form Estimate term
# 1 A Li 0.029457318 A
# 2 B Li 0.010031512 B
# 3 C Li -0.015586419 C
# 4 D Li -0.008713449 D
# 5 E Li 0.036253899 E
# 6 F Li -0.026307292 F
# 7 G Li 0.086574212 G
# 8 G Sq 0.061468915 I(G^2)
# 9 H Ps 0.002297457 log(H)
# "0 +" drops the intercept column so X has one column per row of betas
(my_formula = as.formula(paste("~ 0 +", paste(betas$term, collapse = " + "))))
#~0 + A + B + C + D + E + F + G + I(G^2) + log(H)
X = model.matrix(my_formula, data = dat)
prediction = X %*% betas$Estimate
As MrFlick said, this will not work with your current example data.
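For what it is worth, here is a rough sketch of the same model.matrix idea that does run on data lacking some covariates. It assumes that coefficients without a matching column should be dropped and that there is no intercept among the betas, so the formula uses "0 +" to suppress the intercept column. betas_demo and dat_demo are hypothetical stand-ins with values rounded from the question.

```r
# Stand-in inputs; A has no column in dat_demo
betas_demo <- data.frame(
  CovGen = c("A", "G", "G", "H"),
  Form = c("Li", "Li", "Sq", "Ps"),
  Estimate = c(0.029457318, 0.086574212, 0.061468915, 0.002297457),
  stringsAsFactors = FALSE
)
dat_demo <- data.frame(G = c(-2.329168, -2.323682),
                       H = c(0.4420680, 0.4179095))

# Assumption: drop coefficients whose covariate is missing from the data
b <- betas_demo[betas_demo$CovGen %in% names(dat_demo), ]

# Build one formula term per coefficient according to its form
b$term <- ifelse(b$Form == "Sq", sprintf("I(%s^2)", b$CovGen),
          ifelse(b$Form == "Ps", sprintf("log(%s)", b$CovGen), b$CovGen))

# "0 +" suppresses the intercept so X has exactly one column per coefficient
f <- as.formula(paste("~ 0 +", paste(b$term, collapse = " + ")))
X <- model.matrix(f, data = dat_demo)

# Per-term contributions (one column each) and their row sums
contrib <- sweep(X, 2, b$Estimate, `*`)
prediction <- X %*% b$Estimate
```

Here contrib gives the per-combination columns asked for in the question, while prediction is their sum on the scale of the linear predictor.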