How to iteratively change variable value until all predicted probabilities are above .5
I'm trying to write code that subtracts a given value from a variable until the predicted probability for every row is at or above .05.
train <- data.frame('cost' = c(120, 3, 2, 4, 10, 110, 200, 43, 1, 51, 22, 14),
                    'price' = c(120, 20, 10, 4, 3, 4, 30, 43, 56, 88, 75, 44),
                    'dich' = c(0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0))
train$dich <- as.factor(train$dich)
test <- data.frame('cost' = c(13, 5, 32, 22, 14, 145, 54, 134, 11, 14, 33, 21),
                   'price' = c(32, 11, 210, 6, 3, 7, 22, 423, 19, 99, 192, 32))
model <- glm(dich ~ cost + price,
             data = train,
             family = "binomial")
pred <- predict(model, test, type = "response")
1 2 3 4
3.001821e-01 4.442316e-01 4.507495e-04 6.310900e-01
5 6 7 8
5.995459e-01 9.888085e-01 7.114101e-01 1.606681e-06
9 10 11 12
4.096450e-01 2.590474e-02 9.908167e-04 3.572890e-01
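So in the output above, cases 4, 5, 6, and 7 would stay unchanged because they are already above .05, but for the remaining cases I'd like to subtract 1 from the price column, run the prediction again, and repeat until every case has a probability at or above 0.05.
I see what you're trying to do, but the results are pretty funny. Here is what happens if you subtract 1 from every element of price on each pass: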
x <- 1
while (TRUE) {
  print("----------------------------------------")
  print(x)
  test$price <- test$price - 1
  pred <- predict(model, test, type = "response")
  print(pred)
  x <- x + 1
  if (sum(pred > 0.05) == length(pred)) {
    print(test)
    break
  }
}
# ... loops 247 times
# [1] "----------------------------------------"
# [1] 248
# 1 2 3 4 5 6 7 8 9 10 11 12
# 0.99992994 0.99996240 0.93751936 0.99998243 0.99997993 0.99999966 0.99998781 0.05074762 0.99995669 0.99887117 0.97058913 0.99994594
# cost price
# 1 13 -216
# 2 5 -237
# 3 32 -38
# 4 22 -242
# 5 14 -245
# 6 145 -241
# 7 54 -226
# 8 134 175
# 9 11 -229
# 10 14 -149
# 11 33 -56
# 12 21 -216
If you want to subtract 1 for each row (or "customer") individually, rather than from every row at once:
test$pred_prob <- NA
for (n in 1:nrow(test)) {
  print("-----------------------------")
  print(n)
  while (TRUE) {
    pred <- predict(model, test[n,], type = "response")
    print(pred)
    test$pred_prob[n] <- pred
    if (sum(pred > 0.05) == length(pred)) {
      print(test$price[n])
      break
    }
    test$price[n] <- test$price[n] - 1
  }
  print(test)
}
# cost price pred_prob
# 1 13 32 0.30018209
# 2 5 11 0.44423163
# 3 32 96 0.05128337
# 4 22 6 0.63109001
# 5 14 3 0.59954586
# 6 145 7 0.98880854
# 7 54 22 0.71141007
# 8 134 175 0.05074762
# 9 11 19 0.40964501
# 10 14 82 0.05149897
# 11 33 97 0.05081947
# 12 21 32 0.35728897
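For the logistic glm specifically, the loop isn't strictly needed: the model is linear on the log-odds scale, so the price at which a row reaches the threshold can be solved for directly. Below is a minimal sketch of that idea, assuming the fitted price coefficient is negative (the outputs above suggest it is, since lowering price raises the probability). The names `threshold`, `boundary`, `new_price` and `result` are mine, not from the question, and the `floor()` step only mirrors the subtract-1 loop because the starting prices are whole numbers.

```r
train <- data.frame(cost  = c(120, 3, 2, 4, 10, 110, 200, 43, 1, 51, 22, 14),
                    price = c(120, 20, 10, 4, 3, 4, 30, 43, 56, 88, 75, 44),
                    dich  = factor(c(0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0)))
test <- data.frame(cost  = c(13, 5, 32, 22, 14, 145, 54, 134, 11, 14, 33, 21),
                   price = c(32, 11, 210, 6, 3, 7, 22, 423, 19, 99, 192, 32))

model <- glm(dich ~ cost + price, data = train, family = "binomial")
b <- coef(model)

# Assumption: lowering price raises the predicted probability.
stopifnot(b[["price"]] < 0)

threshold <- 0.05

# Solve qlogis(threshold) = b0 + b_cost * cost + b_price * price for price.
boundary <- (qlogis(threshold) - b[["(Intercept)"]] - b[["cost"]] * test$cost) /
  b[["price"]]

# Rows whose price is already at or below the boundary are left alone;
# the rest drop to the largest whole number that does not exceed the boundary.
new_price <- ifelse(test$price > boundary, floor(boundary), test$price)

result <- transform(test, price = new_price)
result$pred_prob <- predict(model, result, type = "response")
print(result)
```

This only works because the model has a closed-form link between price and probability; for a tree model there is no such formula and some kind of search is still needed.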
In case anyone else wants to do the same thing with an xgboost model.
library(xgboost)

train <- data.frame('cost' = c(120, 3, 2, 4, 10, 110, 200, 43, 1, 51, 22, 14),
                    'price' = c(120, 20, 10, 4, 3, 4, 30, 43, 56, 88, 75, 44))
label <- data.frame('dich' = c(0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0))
train <- as.matrix(train)
label <- as.matrix(label)
model <- xgboost(data = train,
                 label = label,
                 max.depth = 3,
                 nround = 1,
                 objective = "binary:logistic")
test <- data.frame('cost' = c(13, 5, 32, 22, 14, 145, 54, 134, 11, 14, 33, 21),
                   'price' = c(32, 11, 210, 6, 3, 7, 22, 423, 19, 99, 192, 32))
test <- as.matrix(test)
# FOR A MATRIX
test <- cbind(test, rep(NA, nrow(test)))
colnames(test)[3] <- c("pred_prob")
for (n in 1:nrow(test)) {
  print("-----------------------------")
  print(n)
  # Note: a tree model's prediction is piecewise constant in price, so this
  # loop may never finish if no leaf reachable by lowering price predicts above 0.5.
  while (TRUE) {
    # Predict from the two feature columns only; pred_prob is not a model input.
    pred <- predict(model, test[n, c("cost", "price"), drop = FALSE], type = "response")
    print(pred)
    test[n, "pred_prob"] <- pred
    if (sum(pred > 0.5) == length(pred)) {
      print(test[n, "pred_prob"])
      break
    }
    test[n, "price"] <- test[n, "price"] - .01
  }
  print(test)
}
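It seems to take a while to run for 12 rows. I need to think about the thresholds in the tree model, and how they affect the range of price changes needed to reach a probability of 0.5 or higher (which is what I meant in my original question, but I wrote 0.05, haha).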
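One likely reason the tree version is slow: a tree's prediction only changes when price crosses a split point, so most 0.01 steps change nothing, and a row whose reachable leaves all predict 0.5 or less would keep looping forever. Here is a rough sketch of a guarded search that gives up after a fixed number of steps. Everything in it is hypothetical and for illustration only: `lower_until()`, `predict_fn`, `step`, `max_steps`, and the toy `step_model()`, which merely stands in for a fitted tree model and is not xgboost.

```r
# Lower `price` in fixed steps until predict_fn(cost, price) exceeds `threshold`,
# stopping after `max_steps` so a flat model cannot cause an endless loop.
lower_until <- function(predict_fn, cost, price,
                        threshold = 0.5, step = 1, max_steps = 1000) {
  steps <- 0
  prob <- predict_fn(cost, price)
  while (prob <= threshold && steps < max_steps) {
    price <- price - step
    steps <- steps + 1
    prob <- predict_fn(cost, price)
  }
  list(price = price, prob = prob, steps = steps, reached = prob > threshold)
}

# Toy stand-in for a tree model: the output only changes at split points.
step_model <- function(cost, price) {
  if (cost > 100) {
    0.35   # this leaf ignores price entirely
  } else if (price < 50) {
    0.62
  } else {
    0.41
  }
}

hit  <- lower_until(step_model, cost = 20,  price = 60)
miss <- lower_until(step_model, cost = 134, price = 423)

print(hit$reached)    # TRUE: crossing the split at price 50 flips the leaf
print(miss$reached)   # FALSE: no price change can lift this row above 0.5
```

With a real model, `predict_fn` would wrap the model's own `predict()` call on a one-row matrix of the feature columns; the `reached` flag then shows which rows can never get over the threshold by changing price alone.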