Use unique in R with more than one logical condition
I have the following data frame as a data.table:
library(data.table)
df <- data.table(id=c(1,1,2,2,3,3,4,4),
                 date=c("2013-11-22","2017-01-24","2017-06-24","2020-02-10","2011-01-03","2013-11-24","2015-01-24","2017-08-24"),
                 status=c("Former","Current","Former","Never","Current",NA,"Current","Former"))
df
   id       date  status
1:  1 2013-11-22  Former
2:  1 2017-01-24 Current
3:  2 2017-06-24  Former
4:  2 2020-02-10   Never
5:  3 2011-01-03 Current
6:  3 2013-11-24    <NA>
7:  4 2015-01-24 Current
8:  4 2017-08-24  Former
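I want to create one unique row per id using the following logic. The row with the most recent date should be kept. If the status at the most recent date is <NA> or Never and there is an earlier date with a status, the row with the earlier date should be kept instead.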
I solved this with the following code:
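library(anytime)  # provides anydate()
# rows with an informative status, latest date per id
unique1 <- df[df$status %in% c("Former","Current"),]
unique1 <- unique1[,.SD[which.max(anydate(date))],by=.(id)]
# one row per id, then overwrite with the informative rows where they exist
unique_final <- unique(df[order(id,ordered(status,c("Former","Current","Never",NA)))],by='id')
unique_final[match(unique1$id,unique_final$id),]<-unique1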
and got these results:
   id       date  status
1:  1 2017-01-24 Current
2:  2 2017-06-24  Former
3:  3 2011-01-03 Current
4:  4 2017-08-24  Former
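Is there a way to combine these two logical subsetting steps? I would like to avoid creating new data frames and then matching them.
I am working with data.table, so a solution that scales to a larger dataset would be great.
Thanks!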
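You can try the following (it selects by row position, so it assumes the rows are sorted by date within each id):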
library(data.table)
df[, .SD[
  if (all(status %in% c(NA, 'Never'))) .N
  else max(which(!status %in% c(NA, 'Never')))
], by = id]
Output:
   id       date  status
1:  1 2017-01-24 Current
2:  2 2017-06-24  Former
3:  3 2011-01-03 Current
4:  4 2017-08-24  Former
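Since the question asks about larger datasets, here is a minimal sketch of the same selection rule written with `.I` instead of `.SD`. It collects one global row number per id and subsets the table once, a pattern often suggested to avoid building `.SD` for every group. It has not been benchmarked here, and like the code above it assumes the rows are already in date order within each id. The names `idx`, `keep` and `res` are only illustrative.

```r
library(data.table)

df <- data.table(
  id     = c(1, 1, 2, 2, 3, 3, 4, 4),
  date   = c("2013-11-22", "2017-01-24", "2017-06-24", "2020-02-10",
             "2011-01-03", "2013-11-24", "2015-01-24", "2017-08-24"),
  status = c("Former", "Current", "Former", "Never",
             "Current", NA, "Current", "Former")
)

# For each id, pick one global row number (.I):
# the last row with an informative status, or the last row overall
# when every status in the group is NA or "Never".
# Assumption: rows are already in date order within each id.
idx <- df[, {
  keep <- which(!status %in% c(NA, "Never"))
  if (length(keep) > 0) .I[max(keep)] else .I[.N]
}, by = id]$V1

# Subset the original table once with the collected row numbers
res <- df[idx]
res
```

If the rows are not guaranteed to be sorted, ordering the table by id and date first would be needed before applying this.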
Here is a dplyr-based solution. It recodes status so that Current and Former share the same level, then sorts and takes the first row for each id.
library(dplyr)
library(data.table)
df <- data.table(id=c(1,1,2,2,3,3,4,4),
date=c("2013-11-22","2017-01-24","2017-06-24","2020-02-10","2011-01-03","2013-11-24","2015-01-24","2017-08-24"),
status=c("Former","Current","Former","Never","Current",NA,"Current","Former"))
df %>%
  mutate(
    status = factor(status, levels = c("Never", "Former", "Current")),
    status2 = forcats::fct_recode(status, "Current" = "Former")
  ) %>%
  group_by(id) %>%
  arrange(desc(status2), desc(date)) %>%
  select(-status2) %>%
  slice(1)
#> # A tibble: 4 x 3
#> # Groups:   id [4]
#>      id date       status
#>   <dbl> <chr>      <fct>
#> 1     1 2017-01-24 Current
#> 2     2 2017-06-24 Former
#> 3     3 2011-01-03 Current
#> 4     4 2017-08-24 Former
Created on 2020-08-29 by the reprex package (v0.3.0)
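The same "rank, sort, take the first row" idea can also be sketched in data.table, which the question asks for. This version does not recode status as a factor. It sorts on a logical flag (informative statuses before NA and Never) and then on date, newest first, so it does not depend on the input row order. The names `sorted` and `res` are only illustrative.

```r
library(data.table)

df <- data.table(
  id     = c(1, 1, 2, 2, 3, 3, 4, 4),
  date   = c("2013-11-22", "2017-01-24", "2017-06-24", "2020-02-10",
             "2011-01-03", "2013-11-24", "2015-01-24", "2017-08-24"),
  status = c("Former", "Current", "Former", "Never",
             "Current", NA, "Current", "Former")
)

# Sort key, in order of priority:
#   1. id
#   2. informative statuses (FALSE) before NA / "Never" (TRUE)
#   3. newest date first
sorted <- df[order(id,
                   status %in% c(NA, "Never"),
                   -as.integer(as.IDate(date)))]

# unique() with by = "id" keeps the first row of each id
res <- unique(sorted, by = "id")
res
```

An id whose statuses are all NA or Never still gets a row here, namely its most recent one, which matches the rule stated in the question.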
Here is a base R option using subset + ave:
subset(
df[!df$status %in% c(NA, "Never"), ],
as.logical(ave(date, id, FUN = function(x) x == max(x)))
)
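One thing to be aware of: the option above filters out NA and Never rows before taking the latest date, so an id whose statuses are all NA or Never would drop out of the result. That case does not occur in the sample data. Below is a rough base R sketch that keeps a fallback row for such ids. The extra row for id 5 is hypothetical and was added only to show the edge case, and the names `uninformative`, `ord`, `sorted` and `res` are illustrative.

```r
df <- data.frame(
  id     = c(1, 1, 2, 2, 3, 3, 4, 4, 5),
  date   = c("2013-11-22", "2017-01-24", "2017-06-24", "2020-02-10",
             "2011-01-03", "2013-11-24", "2015-01-24", "2017-08-24",
             "2019-05-01"),                  # last entry: hypothetical id 5
  status = c("Former", "Current", "Former", "Never",
             "Current", NA, "Current", "Former",
             "Never"),                       # id 5 has no informative status
  stringsAsFactors = FALSE
)

# TRUE where the status carries no information
uninformative <- df$status %in% c(NA, "Never")

# Within each id: informative rows first, then newest date first
ord <- order(df$id, uninformative, -as.numeric(as.Date(df$date)))
sorted <- df[ord, ]

# Keep the first row of each id
res <- sorted[!duplicated(sorted$id), ]
rownames(res) <- NULL
res
```

With this data, ids 1 to 4 give the same rows as in the question, and id 5 is kept with its only row.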