Removing completely separated observations from glm()

Question

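I'm doing some exploratory data analysis on the HMDA data from the AER package. However, the variables I use to fit my model seem to contain some observations that perfectly determine the outcome, a problem known as "separation." So I tried to fix this with the solution recommended in this thread, but when I try to run the first block of source code taken from glm.fit(), R returns the error: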

Error in family$family : object of type 'closure' is not subsettable

So I can't go on to use this code to remove the perfectly determined observations from my data. Could anyone help me with this?

My current code is below for reference.

# load the AER package and HMDA data
library(AER)
data(HMDA)

# fit a degree-2 polynomial probit model
probit.fit <- glm(deny ~ poly(hirat, 2), family = binomial, data = HMDA)

# use the revised source code from that StackExchange thread to find the observations that trigger the warning
library(tidyverse)
library(dplyr)
library(broom)

eps <- 10 * .Machine$double.eps
if (family$family == "binomial") {
  if (any(mu > 1 - eps) || any(mu < eps)) 
    warning("glm.fit: fitted probabilities numerically 0 or 1 occurred", 
            call. = FALSE)
}

# this returns the following error message
# Error in family$family : object of type 'closure' is not subsettable


probit.resids <- augment(probit.fit) %>%
  mutate(p = 1 / (1 + exp(-.fitted)),
         warning = p > 1-eps)

arrange(probit.resids, desc(.fitted)) %>%  
  select(2:5, p, warning) %>% 
  slice(1:10)


HMDA.nwarning <- filter(HMDA, !probit.resids$warning)

# using HMDA.nwarning should solve the problem...
probit.fit <- glm(deny ~ poly(hirat, 2), family = binomial, data = HMDA.nwarning)
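For readers unfamiliar with separation, here is a minimal sketch on hypothetical toy data (not the HMDA set). When x alone perfectly predicts y, the maximum-likelihood estimate does not exist, the slope keeps growing during the fitting iterations, and glm() emits the "fitted probabilities numerically 0 or 1 occurred" warning produced by the check quoted in the code above. The names x, y, msgs and fit are illustrative only.

```r
# Hypothetical toy data (not HMDA): y is 1 exactly when x > 0,
# so x perfectly separates the two outcome classes.
x <- seq(-3, 3, length.out = 20)
y <- as.integer(x > 0)

# Collect every warning glm() raises while fitting, instead of printing them.
msgs <- character(0)
fit <- withCallingHandlers(
  glm(y ~ x, family = binomial),
  warning = function(w) {
    msgs <<- c(msgs, conditionMessage(w))
    invokeRestart("muffleWarning")
  }
)

print(msgs)       # includes "glm.fit: fitted probabilities numerically 0 or 1 occurred"
print(coef(fit))  # the slope is huge because no finite MLE exists
```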

Answer 1

Consider this code:

if (family$family == "binomial") {
  if (any(mu > 1 - eps) || any(mu < eps)) 
    warning("glm.fit: fitted probabilities numerically 0 or 1 occurred", 
            call. = FALSE)
}

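It comes from inside glm.fit(). There, family is the family object created by calling binomial() when you run glm() with family = binomial, and mu holds the fitted probabilities. In your session neither exists in that form: family resolves to the stats function family(), which is a closure, so family$family fails with exactly the error you saw. You can see how glm() turns the family argument into that object by looking at its source (just type glm):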

if (is.character(family))
    family <- get(family, mode = "function", envir = parent.frame())
if (is.function(family))
    family <- family()
if (is.null(family$family)) {
    print(family)
    stop("'family' not recognized")
}

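After the iterations finish, glm.fit() checks family$family (which is "binomial" for binomial()) and issues a warning if any fitted probability lies within eps of 0 or 1.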
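A minimal sketch of both points, assuming a clean R session. It first shows that family$family fails at the top level because family is a function there. It then re-runs glm.fit()'s check outside glm.fit() by taking the family and the fitted probabilities from the fitted model object instead. The data are hypothetical toy data (not HMDA), and the names fit, mu and flagged are illustrative.

```r
# At top level, `family` is the stats generic family(), a closure,
# so `$` on it fails with "object of type 'closure' is not subsettable".
err <- tryCatch(family$family, error = function(e) conditionMessage(e))
print(err)

# Inside glm.fit(), `family` is the object returned by binomial():
print(binomial()$family)   # "binomial"

# Hypothetical toy data (not HMDA) with complete separation in x.
x <- seq(-3, 3, length.out = 20)
y <- as.integer(x > 0)
fit <- suppressWarnings(glm(y ~ x, family = binomial))

# Same check as glm.fit(), using the model's own family and fitted values.
eps <- 10 * .Machine$double.eps
mu  <- fitted(fit)                      # fitted probabilities (response scale)
if (fit$family$family == "binomial") {
  flagged <- mu > 1 - eps | mu < eps    # both tails, as glm.fit() checks
  print(which(flagged))
}
```

Reading the family from fit$family rather than a free variable is what makes the snippet work outside glm.fit().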

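You don't need to run that part; you only need to set eps <- 10 * .Machine$double.eps. So let's run the code below. Note that to fit a probit model you have to specify link = "probit" in binomial(); otherwise the default link is logit: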

library(AER)
library(tidyverse)
library(dplyr)
library(broom)

data(HMDA)

probit.fit <- glm(deny ~ poly(hirat, 2), family = binomial(link="probit"), data = HMDA)

eps <- 10 * .Machine$double.eps

probit.resids <- augment(probit.fit) %>%
  mutate(p = 1 / (1 + exp(-.fitted)),
         warning = p > 1-eps)

The warning column indicates whether an observation triggers the warning. In this data set there is exactly one:

table(probit.resids$warning)

FALSE  TRUE 
 2379     1

We can filter it out in the next step:

HMDA.nwarning <- filter(HMDA, !probit.resids$warning)
dim(HMDA.nwarning)
[1] 2379   14

Then rerun the regression:

probit.fit <- glm(deny ~ poly(hirat, 2), family = binomial(link="probit"), data = HMDA.nwarning)
coefficients(probit.fit)
(Intercept) poly(hirat, 2)1 poly(hirat, 2)2 
      -1.191292        8.708494        6.884404
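One caveat on the code above: .fitted from broom's augment() is on the link (linear-predictor) scale by default, and 1 / (1 + exp(-.fitted)) is the inverse logit, which does not match a probit fit, whose inverse link is pnorm(). In addition, glm.fit() flags probabilities near 0 as well as near 1. Below is a minimal sketch on hypothetical simulated data (not HMDA; df, fit, p, flagged and df_ok are illustrative names). It converts the linear predictor with the model's own inverse link, flags both tails, drops the flagged rows and refits. With broom, augment(probit.fit, type.predict = "response") should likewise return probabilities directly.

```r
# Hypothetical simulated data (not HMDA): a latent-normal (probit) outcome
# plus one extreme x value whose fitted probability is numerically 1.
set.seed(1)
df <- data.frame(x = c(rnorm(200), 15))
df$y <- as.integer(df$x + rnorm(nrow(df)) > 0.5)

fit <- suppressWarnings(
  glm(y ~ x, family = binomial(link = "probit"), data = df)
)

# Convert the linear predictor with the model's own inverse link
# (pnorm for probit, not 1 / (1 + exp(-eta)), which is the inverse logit).
eps <- 10 * .Machine$double.eps
p <- fit$family$linkinv(fit$linear.predictors)
flagged <- p > 1 - eps | p < eps   # both tails, as glm.fit() checks

# Drop the flagged rows and refit.
df_ok <- df[!flagged, ]
fit2 <- glm(y ~ x, family = binomial(link = "probit"), data = df_ok)
print(coef(fit2))
```

Using fit$family$linkinv keeps the conversion correct whichever link the model was fitted with.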

Tags: r, subset, glm, logistic-regression