Categorical variables in logistic regression in R

Question

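How can I include a categorical variable in a binary logistic regression in R? I want to test the influence of the professional field (student, worker, teacher, self-employed) on the probability of purchasing a product.

In my example, y is a binary variable (1 for buying the product, 0 for not buying).
- x1: gender (0 male, 1 female)
- x2: age (between 20 and 80)
- x3: categorical variable (1 = student, 2 = worker, 3 = teacher, 4 = self-employed)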

set.seed(123)
y<-round(runif(100,0,1))
x1<-round(runif(100,0,1))
x2<-round(runif(100,20,80))
x3<-round(runif(100,1,4))
test<-glm(y~x1+x2+x3, family=binomial(link="logit"))
summary(test)

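If I include x3 (the professional field) in the regression above, I get the wrong estimates/interpretation for x3.

What do I have to do to get the correct influence/estimates for the categorical variable (x3)?

Thanks a lot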

Answer 1

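I suggest you make x3 a factor variable. As a plain number, x3 is treated as a continuous predictor with a single slope; as a factor, R builds the dummy variables for you, so there is no need to create them yourself: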

set.seed(123)
y <- round(runif(100,0,1))
x1 <- round(runif(100,0,1))
x2 <- round(runif(100,20,80))
x3 <- factor(round(runif(100,1,4)),labels=c("student", "worker", "teacher", "self-employed"))

test <- glm(y~x1+x2+x3, family=binomial(link="logit"))
summary(test)

Here is the output of your model:

Call:
glm(formula = y ~ x1 + x2 + x3, family = binomial(link = "logit"))

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.4665  -1.1054  -0.9639   1.1979   1.4044  

Coefficients:
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)      0.464751   0.806463   0.576    0.564
x1               0.298692   0.413875   0.722    0.470
x2              -0.002454   0.011875  -0.207    0.836
x3worker        -0.807325   0.626663  -1.288    0.198
x3teacher       -0.567798   0.615866  -0.922    0.357
x3self-employed -0.715193   0.756699  -0.945    0.345

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 138.47  on 99  degrees of freedom
Residual deviance: 135.98  on 94  degrees of freedom
AIC: 147.98

Number of Fisher Scoring iterations: 4
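
To read the x3 rows above: each one is the difference in log-odds between that category and the reference level, which is the first factor level ("student" here). The sketch below is one possible way to turn the estimates into odds ratios and to test x3 as a whole rather than level by level. The object names `full`, `reduced`, `odds_ratios` and `lrt` are only illustrative, and the likelihood-ratio test is an addition for illustration, not something the question asked for.

```r
# Same simulated data as in the answer
set.seed(123)
y  <- round(runif(100, 0, 1))
x1 <- round(runif(100, 0, 1))
x2 <- round(runif(100, 20, 80))
x3 <- factor(round(runif(100, 1, 4)),
             labels = c("student", "worker", "teacher", "self-employed"))

# Model with and without the categorical predictor
full    <- glm(y ~ x1 + x2 + x3, family = binomial(link = "logit"))
reduced <- glm(y ~ x1 + x2,      family = binomial(link = "logit"))

# Exponentiated coefficients: for the x3 rows these are odds ratios
# relative to the reference level "student"
odds_ratios <- exp(coef(full))
print(odds_ratios)

# Likelihood-ratio test of x3 as a whole (3 degrees of freedom,
# one per non-reference level)
lrt <- anova(reduced, full, test = "Chisq")
print(lrt)
```

Because the data are pure random noise, no effect of x3 should be expected here; with real data the same calls apply unchanged.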

In any case, I suggest you study this post on R-bloggers: https://www.r-bloggers.com/logistic-regression-and-categorical-covariates/
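
If it helps to see what "no need to create dummy variables" means in practice, the following sketch shows the indicator columns R derives from the factor and how to pick a different baseline category. It assumes R's default treatment contrasts (`contr.treatment`); the names `dummies`, `x3_worker_ref` and `test_worker_ref` are made up for this example.

```r
# Same simulated data as in the answer
set.seed(123)
y  <- round(runif(100, 0, 1))
x1 <- round(runif(100, 0, 1))
x2 <- round(runif(100, 20, 80))
x3 <- factor(round(runif(100, 1, 4)),
             labels = c("student", "worker", "teacher", "self-employed"))

# The 0/1 indicator columns built from x3 under the default contrasts:
# an intercept plus one column per non-reference level
dummies <- model.matrix(~ x3)
print(head(dummies))

# Use "worker" instead of "student" as the reference category
x3_worker_ref <- relevel(x3, ref = "worker")
test_worker_ref <- glm(y ~ x1 + x2 + x3_worker_ref,
                       family = binomial(link = "logit"))
summary(test_worker_ref)
```

Changing the reference level only changes which comparisons the coefficients express; the fitted probabilities and the residual deviance stay the same.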
