How do I use ifelse() to create a dummy variable that depends on 3 other variables?

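I am trying to solve a problem. The question I want ifelse() to answer can be phrased as: "Did any individual surveyed in the state witness an event within the past two years?"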

For clarity, I made an example table.

#Creating example table
tab <- data.frame(state_code = c('1200', '1200', '1200', '1200', '1200', '1201', '1201', '1201', '1201', '1201'),
                  individual = c('Person 1', 'Person 2', 'Person 3', 'Person 4', 'Person 5', 'Person 6', 'Person 7', 'Person 8', 'Person 9', 'Person 10'),
                  event = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0),
                  year = c(1992, 1993, 1994, 1995, 1996, 1992, 1993, 1994, 1995, 1995))

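I would like to use mutate() to create a new variable named "two_years" that satisfies an ifelse() condition where:

  1. If an individual from [state_code] witnessed an [event] in [year], [year-1], or [year-2], the dummy variable equals 1. In other words, the condition is met if the sum of [event] over those years is > 0.

  2. The unit of observation is the individual. In my own database, there are multiple individuals within these years and states. However, each unique individual does not matter for the ifelse() condition. It should apply to all individuals surveyed within [state_code] during [year].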

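I think this is what you are looking for. It should generalize to the case you describe, with multiple individuals per year. Apologies if I have misunderstood the question.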

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

tab <- data.frame(state_code = c('1200', '1200', '1200', '1200', '1200', '1201', '1201', '1201', '1201', '1201'),
                  individual = c('Person 1', 'Person 2', 'Person 3', 'Person 4', 'Person 5', 'Person 6', 'Person 7', 'Person 8', 'Person 9', 'Person 10'),
                  event = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0),
                  year = c(1992, 1993, 1994, 1995, 1996, 1992, 1993, 1994, 1995, 1995))

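tab %>%
  # Collapse to one row per state-year: total events witnessed that year
  group_by(state_code, year) %>%
  summarise(event = sum(event),
            .groups = "drop_last") %>%
  # Still grouped by state_code, so lag() looks back within the same state.
  # Use > 0 rather than == 1 so that several witnesses in one year still count.
  mutate(dummy = ifelse(event > 0 |
                          lag(event, 1L, order_by = year, default = 0) > 0 |
                          lag(event, 2L, order_by = year, default = 0) > 0,
                        1,
                        0)
  ) %>%
  ungroup() %>%
  select(-event) %>%
  # Attach the state-year dummy back onto every individual
  right_join(tab, 
             by = c("state_code", "year"))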
#> # A tibble: 10 × 5
#>    state_code  year dummy individual event
#>    <chr>      <dbl> <dbl> <chr>      <dbl>
#>  1 1200        1992     0 Person 1       0
#>  2 1200        1993     0 Person 2       0
#>  3 1200        1994     1 Person 3       1
#>  4 1200        1995     1 Person 4       0
#>  5 1200        1996     1 Person 5       0
#>  6 1201        1992     0 Person 6       0
#>  7 1201        1993     0 Person 7       0
#>  8 1201        1994     0 Person 8       0
#>  9 1201        1995     0 Person 9       0
#> 10 1201        1995     0 Person 10      0

Created on 2022-03-15 by the reprex package (v2.0.1)
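
One caveat about the lag() approach above: lag() shifts by row position within each state, so it only lines up with [year-1] and [year-2] when every state has a row for every consecutive year. Below is a minimal sketch of an alternative that compares the years directly, assuming that gaps between survey years can occur in the real data. The data frame `tab_gap` and the names `state_year` and `result` are hypothetical and only serve the illustration.

```r
library(dplyr)

# Hypothetical data with a gap: state 1200 was not surveyed in 1995 or 1996
tab_gap <- data.frame(
  state_code = c("1200", "1200", "1200", "1200"),
  individual = c("Person 1", "Person 2", "Person 3", "Person 4"),
  event      = c(0, 1, 0, 0),
  year       = c(1993, 1994, 1997, 1998)
)

# One row per state-year: did anyone witness an event that year?
state_year <- tab_gap %>%
  group_by(state_code, year) %>%
  summarise(any_event = sum(event, na.rm = TRUE) > 0, .groups = "drop")

# Within each state, flag a year if any event year falls in [year - 2, year]
state_year <- state_year %>%
  group_by(state_code) %>%
  mutate(two_years = vapply(
    year,
    function(y) as.integer(any(any_event & year >= y - 2 & year <= y)),
    integer(1)
  )) %>%
  ungroup() %>%
  select(-any_event)

# Attach the state-year flag back onto every individual
result <- left_join(tab_gap, state_year, by = c("state_code", "year"))
```

In this sketch the 1997 row gets 0, because 1994 is more than two years earlier. A purely positional lag() would have treated 1994 as the previous year and returned 1.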

Here is one approach. The first two years are not really defined, but if you want them to be zero, just change shift(event,0:2) to shift(event,0:2, fill=0).

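library(dplyr)

inner_join(
  tab,
  tab %>%
    # One row per state-year with the total number of events
    group_by(state_code, year) %>%
    summarize(event = sum(event, na.rm = TRUE), .groups = "drop_last") %>%
    # Sum the current year and the two preceding rows within each state
    mutate(two_years = as.integer(Reduce(`+`, data.table::shift(event, 0:2)) > 0)) %>%
    select(!event),
  by = c("state_code", "year")
)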

   state_code individual event year two_years
1        1200   Person 1     0 1992        NA
2        1200   Person 2     0 1993        NA
3        1200   Person 3     1 1994         1
4        1200   Person 4     0 1995         1
5        1200   Person 5     0 1996         1
6        1201   Person 6     0 1992        NA
7        1201   Person 7     0 1993        NA
8        1201   Person 8     0 1994         0
9        1201   Person 9     0 1995         0
10       1201  Person 10     0 1995         0
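
For completeness, here is a sketch of the fill = 0 variant mentioned above, assuming dplyr and data.table are both installed. The names `state_year` and `result_fill` are hypothetical. With fill = 0, the first two years of each state are treated as having no prior events, so they come out as 0 instead of NA.

```r
library(dplyr)

tab <- data.frame(
  state_code = c('1200', '1200', '1200', '1200', '1200',
                 '1201', '1201', '1201', '1201', '1201'),
  individual = paste('Person', 1:10),
  event = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0),
  year = c(1992, 1993, 1994, 1995, 1996, 1992, 1993, 1994, 1995, 1995)
)

state_year <- tab %>%
  group_by(state_code, year) %>%
  summarise(event = sum(event, na.rm = TRUE), .groups = "drop_last") %>%
  # fill = 0 pads the missing earlier rows with 0 instead of NA
  mutate(two_years = as.integer(
    Reduce(`+`, data.table::shift(event, 0:2, fill = 0)) > 0
  )) %>%
  ungroup() %>%
  select(!event)

# Attach the state-year flag back onto every individual
result_fill <- inner_join(tab, state_year, by = c("state_code", "year"))
```

Like lag(), shift() works on row position, so this also assumes each state has one row per consecutive year after summarising.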