Create pre and post flags based on group and year
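I am trying to mark observations as pre- or post- relative to a flag we recorded for each company.
Below is dummy data. I am having trouble figuring this out, but I imagine there is an elegant solution that groups by employee
and identifies the year of the flag.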
employee <- c('a','a','a','a','b','b','b','b','b','c', 'c', 'c', 'c')
year <- c('2001','2002','2003','2004','2001','2002','2003','2004','2005','2001','2002','2003','2004')
flag <- c('NA','NA','1','NA','NA','1','NA','NA','NA','1','NA','NA','NA')
start <- data.frame(employee, year, flag)
employee year flag
1 a 2001 NA
2 a 2002 NA
3 a 2003 1
4 a 2004 NA
5 b 2001 NA
6 b 2002 1
7 b 2003 NA
8 b 2004 NA
9 b 2005 NA
10 c 2001 1
11 c 2002 NA
12 c 2003 NA
13 c 2004 NA
prepo <- c('pre','pre','po','po','pre','po','po','po','po','po','po','po','po')
end <- data.frame(employee, year, flag, prepo)
employee year flag prepo
1 a 2001 NA pre
2 a 2002 NA pre
3 a 2003 1 po
4 a 2004 NA po
5 b 2001 NA pre
6 b 2002 1 po
7 b 2003 NA po
8 b 2004 NA po
9 b 2005 NA po
10 c 2001 1 po
11 c 2002 NA po
12 c 2003 NA po
13 c 2004 NA po
We convert the string "NA" to a real NA (na_if), group by 'employee', and use case_when with a condition based on the first non-NA value
in 'flag' to set the values to 'pre' and 'po'
library(dplyr)
start %>%
mutate(flag = na_if(flag, 'NA')) %>%
group_by(employee) %>%
mutate(prepo = case_when(row_number() < which(!is.na(flag))[1]
~ 'pre', TRUE ~ 'po')) %>%
ungroup
-output
# A tibble: 13 x 4
# employee year flag prepo
# <chr> <chr> <chr> <chr>
# 1 a 2001 <NA> pre
# 2 a 2002 <NA> pre
# 3 a 2003 1 po
# 4 a 2004 <NA> po
# 5 b 2001 <NA> pre
# 6 b 2002 1 po
# 7 b 2003 <NA> po
# 8 b 2004 <NA> po
# 9 b 2005 <NA> po
#10 c 2001 1 po
#11 c 2002 <NA> po
#12 c 2003 <NA> po
#13 c 2004 <NA> po
Or another option is to use cumsum to create an index and replace the values based on that index
start %>%
arrange(employee, year) %>%
group_by(employee) %>%
mutate(prepo = c('pre', 'po')[cumsum(replace(flag,
flag == "NA", 0))+1]) %>%
ungroup
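To make the indexing trick above easier to follow, here is a minimal base R sketch that walks through it for a single employee. The vector names (flag_a, hit, idx, prepo_a) are made up for illustration, and the comparison flag == "1" assumes the flag column holds the strings "1" and "NA" as in the question.

```r
# Flags for one employee ('a' in the question); "NA" is a string here
flag_a <- c("NA", "NA", "1", "NA")

# 1 where the flag is set, 0 elsewhere
hit <- as.integer(flag_a == "1")

# Running total: 0 before the flagged year, 1 from the flagged year onward
idx <- cumsum(hit)

# Add 1 so that 0 picks the first label ('pre') and 1 picks the second ('po')
prepo_a <- c("pre", "po")[idx + 1]
print(prepo_a)
# [1] "pre" "pre" "po"  "po"
```

The grouped dplyr version does the same thing once per employee, which is why the rows need to be in year order within each group.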
I am not sure whether this works for your general case
library(data.table)
setDT(start)[, prepo := c("pre", "po")[cumsum(flag == "1") + 1], employee][]
employee year flag prepo
1: a 2001 NA pre
2: a 2002 NA pre
3: a 2003 1 po
4: a 2004 NA po
5: b 2001 NA pre
6: b 2002 1 po
7: b 2003 NA po
8: b 2004 NA po
9: b 2005 NA po
10: c 2001 1 po
11: c 2002 NA po
12: c 2003 NA po
13: c 2004 NA po
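On the question of the general case: two situations not covered by the dummy data are an employee who is never flagged and an employee who is flagged more than once. With c("pre", "po")[cumsum(...) + 1], a second flag pushes the index to 3, which yields NA. The sketch below is one possible way to handle both, using only base R. The data frame extra and the employees 'd' and 'e' are hypothetical, and labelling a never-flagged employee as 'pre' throughout is an assumption, not something the question specifies.

```r
# Hypothetical data: 'd' never has a flag, 'e' is flagged twice
employee <- c("d", "d", "d", "e", "e", "e", "e")
year <- c("2001", "2002", "2003", "2001", "2002", "2003", "2004")
flag <- c("NA", "NA", "NA", "NA", "1", "NA", "1")
extra <- data.frame(employee, year, flag)

# Running count of flags within each employee (rows assumed sorted by year)
seen <- ave(as.integer(extra$flag == "1"), extra$employee, FUN = cumsum)

# Any count above zero means the first flag has already occurred
extra$prepo <- ifelse(seen > 0, "po", "pre")
print(extra)
```

Comparing the running count against zero, rather than using it directly as an index, keeps the result well defined however many flags an employee has.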