How do I use ifelse() to create a dummy variable that depends on 3 other variables?
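I am trying to solve a problem. The question I want ifelse() to answer can be phrased as: "Did individuals in the surveyed state witness an event in the past two years?"
For clarity, I made an example table.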
# Creating example table
tab <- data.frame(
  state_code = c('1200', '1200', '1200', '1200', '1200', '1201', '1201', '1201', '1201', '1201'),
  individual = c('Person 1', 'Person 2', 'Person 3', 'Person 4', 'Person 5', 'Person 6', 'Person 7', 'Person 8', 'Person 9', 'Person 10'),
  event = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0),
  year = c(1992, 1993, 1994, 1995, 1996, 1992, 1993, 1994, 1995, 1995)
)
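I want to use mutate() to create a new variable called "two_years" that satisfies an ifelse() condition, where:
If individuals from [state_code] witnessed an [event] in [year], [year-1], or [year-2], the dummy variable equals 1. In other words, the condition is met if the sum of [event] over those years is > 0.
The unit of observation is the individual. In my own database there are multiple individuals within each year and state. In this case, however, the identity of each individual does not matter for the ifelse() condition. It should apply to all individuals surveyed within [state_code] during [year].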
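I think this is what you are looking for. It should generalize to the situation you describe, with multiple individuals per year. Apologies if I have misunderstood the question.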
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
tab <- data.frame(
  state_code = c('1200', '1200', '1200', '1200', '1200', '1201', '1201', '1201', '1201', '1201'),
  individual = c('Person 1', 'Person 2', 'Person 3', 'Person 4', 'Person 5', 'Person 6', 'Person 7', 'Person 8', 'Person 9', 'Person 10'),
  event = c(0, 0, 1, 0, 0, 0, 0, 0, 0, 0),
  year = c(1992, 1993, 1994, 1995, 1996, 1992, 1993, 1994, 1995, 1995)
)
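tab %>%
  # Total events per state and year, so several individuals in one year are handled
  group_by(state_code, year) %>%
  summarise(event = sum(event),
            .groups = "drop_last") %>%
  # Still grouped by state_code: check the current row and the two preceding rows.
  # Test > 0 rather than == 1 so a state-year with more than one event still counts.
  mutate(dummy = ifelse(event > 0 |
                          lag(event, 1L, order_by = year, default = 0) > 0 |
                          lag(event, 2L, order_by = year, default = 0) > 0,
                        1,
                        0)
  ) %>%
  ungroup() %>%
  select(-event) %>%
  # Attach the state-year dummy back to every individual
  right_join(tab,
             by = c("state_code", "year"))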
#> # A tibble: 10 × 5
#>    state_code  year dummy individual event
#>    <chr>      <dbl> <dbl> <chr>      <dbl>
#>  1 1200        1992     0 Person 1       0
#>  2 1200        1993     0 Person 2       0
#>  3 1200        1994     1 Person 3       1
#>  4 1200        1995     1 Person 4       0
#>  5 1200        1996     1 Person 5       0
#>  6 1201        1992     0 Person 6       0
#>  7 1201        1993     0 Person 7       0
#>  8 1201        1994     0 Person 8       0
#>  9 1201        1995     0 Person 9       0
#> 10 1201        1995     0 Person 10      0
Created on 2022-03-15 by the reprex package (v2.0.1)
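One caveat that may matter for real data: lag() shifts by row position within each state, not by calendar year, so if a state has no rows for some years the "previous two rows" can reach further back than two years. The following is only a sketch of an alternative that compares the years directly. The gap_tab data and the gap_result name are made up for illustration, and the window [year - 2, year] is my reading of the condition in the question.

```r
library(dplyr)

# Hypothetical data: state 1200 has no rows for 1994-1996
gap_tab <- data.frame(
  state_code = c("1200", "1200", "1200", "1200"),
  individual = c("Person 1", "Person 2", "Person 3", "Person 4"),
  event      = c(1, 0, 0, 0),
  year       = c(1992, 1993, 1993, 1997)
)

gap_result <- gap_tab %>%
  group_by(state_code) %>%
  # For each row, sum every event in the same state whose year falls
  # in [year - 2, year], then turn "sum > 0" into a 0/1 dummy
  mutate(two_years = vapply(
    year,
    function(y) as.integer(sum(event[year >= y - 2 & year <= y]) > 0),
    integer(1)
  )) %>%
  ungroup()

gap_result$two_years
#> [1] 1 1 1 0
```

Here the 1997 row gets 0 because the only event was in 1992, which is more than two years earlier. A position-based lag on the yearly totals (1992, 1993, 1997) would treat 1992 as "two rows back" and return 1 for that row. This version loops over rows inside each state, so it may be slow on very large groups.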
Here is one approach. The first two years are not really defined, but if you want them to be zero, just change shift(event,0:2)
to shift(event,0:2, fill=0)
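library(dplyr)  # data.table must also be installed for data.table::shift()

inner_join(
  tab,
  tab %>%
    group_by(state_code, year) %>%
    # Total events per state and year
    summarize(event = sum(event, na.rm = TRUE), .groups = "drop_last") %>%
    # Within each state, add the current and the two previous yearly totals
    mutate(two_years = as.integer(Reduce(`+`, data.table::shift(event, 0:2)) > 0)) %>%
    select(!event),
  by = c("state_code", "year")
)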
   state_code individual event year two_years
1        1200   Person 1     0 1992        NA
2        1200   Person 2     0 1993        NA
3        1200   Person 3     1 1994         1
4        1200   Person 4     0 1995         1
5        1200   Person 5     0 1996         1
6        1201   Person 6     0 1992        NA
7        1201   Person 7     0 1993        NA
8        1201   Person 8     0 1994         0
9        1201   Person 9     0 1995         0
10       1201  Person 10     0 1995         0
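To see where the NA values in the first two years come from, and what fill = 0 changes, it may help to look at shift() on a single vector. This is just a small illustration: the event vector stands in for one state's yearly totals, and the names lags_na, lags_zero, with_na and with_zero are mine.

```r
# Hypothetical yearly event totals for one state, ordered by year
event <- c(0, 0, 1, 0, 0)

# shift(event, 0:2) returns a list of three vectors:
# the series itself, the series lagged by 1, and the series lagged by 2
lags_na   <- data.table::shift(event, 0:2)
lags_zero <- data.table::shift(event, 0:2, fill = 0)

# Reduce(`+`, ...) adds the three vectors element by element;
# any NA in a position makes the sum for that position NA
with_na   <- as.integer(Reduce(`+`, lags_na) > 0)
with_zero <- as.integer(Reduce(`+`, lags_zero) > 0)

with_na
#> [1] NA NA  1  1  1
with_zero
#> [1] 0 0 1 1 1
```

With the default fill the first two positions have no complete two-year history, so they stay NA. With fill = 0 the missing history is treated as "no event".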