How to Use Tidyverse to Remove Select Duplicate Values Based on the Value of Another Column

library(tidyverse)

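Using the sample data at the bottom, I am trying to remove duplicates in the ID column, but only the duplicates where the "Year" column equals 2017.

I tried the code below, but it does not seem to work.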

DF <- DF %>% 
  group_by(ID) %>% 
  mutate(REMOVE = if_else(duplicated(ID) & Year == 2017, 1, 0))

DF <- DF %>% 
  group_by(ID) %>% 
  mutate(REMOVE = if_else(!unique(ID) & Year == 2017, 1, 0))

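My intent is to group by "ID" and then use an "if_else" statement to flag the 2017 rows within each group of duplicated IDs with a 1. I would then remove all the rows flagged 1 with the filter code below.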

DF <- DF %>%
  filter(REMOVE == 1)

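I'm not sure why this code does not work. I also tried changing the column types of ID and Year (character, numeric, and so on), but that did not help.

Any help would be appreciated!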

ID<-c(18998878,8888888,57485746,18998878,45454536,64536475,64536475,87966666,58675844,58695847,68574443,87966666)
Program<-c("A111","B488","T687","A111","G888","T444","T444","P867","R444","B323","F888","P867")
Code<-c(1222,4534,543,1222,4678,6544,6544,9898,8888,5656,6666,9898)
Year<-c(2016,2016,2017,2017,2017,2017,2016,2016,2016,2017,2017,2017)
DF<-data_frame(ID,Program,Code,Year)

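filter(DF, !(duplicated(ID) & Year == 2017))

This removes any second or later occurrence of an ID when that row's year is 2017. Note that the sample data has no ID that occurs more than once within 2017, so I may have misunderstood your question.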
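Because duplicated() only flags the second and later occurrences, the result above depends on row order. A grouped filter may be closer to what the question describes. The following is a minimal sketch, assuming the goal is to drop the 2017 row whenever an ID occurs more than once; `toy` and `result` are hypothetical names and the data is a small stand-in, not the data from the question.

```r
library(dplyr)

# Hypothetical toy data: IDs 1 and 2 are duplicated, ID 3 is not.
toy <- tibble(
  ID   = c(1, 2, 2, 3, 1),
  Year = c(2016, 2017, 2016, 2017, 2017)
)

# Within each ID, drop a row only if the ID occurs more than once
# AND that row is from 2017. n() is the number of rows in the group.
result <- toy %>%
  group_by(ID) %>%
  filter(!(n() > 1 & Year == 2017)) %>%
  ungroup()

result
```

In this sketch the 2017 row for ID 2 is dropped even though it comes before the 2016 row, while the single 2017 row for ID 3 is kept.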

Split the data into two data frames, one where Year equals 2017 and one where it does not.

 DF1 <- DF %>% filter(Year==2017) 
 DF2 <- DF %>% filter(Year!=2017)

Then use distinct() to de-duplicate DF1 by its ID column. The .keep_all argument keeps the remaining columns.

 DF3 <- DF1 %>% distinct(ID,.keep_all = T)

Now you can get the final result by combining DF2 and DF3 with rbind().

 df_all <- rbind(DF2,DF3)
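The split, de-duplicate, and recombine steps can also be written with fewer intermediate objects using dplyr's bind_rows(). This is only a sketch on a hypothetical toy tibble (`toy`, `dedup_2017`, and `combined` are illustrative names). It also shows a property of this approach worth keeping in mind: de-duplication happens only within 2017, so an ID that appears once in 2016 and once in 2017 keeps both rows.

```r
library(dplyr)

# Hypothetical toy data: ID 1 is repeated within 2017,
# ID 2 appears once in 2016 and once in 2017.
toy <- tibble(
  ID   = c(1, 1, 2, 2, 3),
  Year = c(2017, 2017, 2016, 2017, 2016)
)

# De-duplicate only the 2017 rows by ID.
dedup_2017 <- toy %>%
  filter(Year == 2017) %>%
  distinct(ID, .keep_all = TRUE)

# Recombine with the untouched non-2017 rows.
combined <- bind_rows(filter(toy, Year != 2017), dedup_2017)

combined
```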

Sort DF by ID and Year, then use distinct() so that only the Year = 2016 row is kept for each duplicated ID.

library(dplyr)

ID <- c(18998878,8888888,57485746,18998878,45454536,64536475,64536475,87966666,
        58675844,58695847,68574443,87966666)
Program <- c("A111","B488","T687","A111","G888","T444","T444","P867","R444","B323","F888","P867")
Code <- c(1222,4534,543,1222,4678,6544,6544,9898,8888,5656,6666,9898)
Year <- c(2016,2016,2017,2017,2017,2017,2016,2016,2016,2017,2017,2017)
DF <- data_frame(ID,Program,Code,Year)
DF
#> # A tibble: 12 x 4
#>           ID Program  Code  Year
#>        <dbl> <chr>   <dbl> <dbl>
#>  1 18998878. A111    1222. 2016.
#>  2  8888888. B488    4534. 2016.
#>  3 57485746. T687     543. 2017.
#>  4 18998878. A111    1222. 2017.
#>  5 45454536. G888    4678. 2017.
#>  6 64536475. T444    6544. 2017.
#>  7 64536475. T444    6544. 2016.
#>  8 87966666. P867    9898. 2016.
#>  9 58675844. R444    8888. 2016.
#> 10 58695847. B323    5656. 2017.
#> 11 68574443. F888    6666. 2017.
#> 12 87966666. P867    9898. 2017.


DF %>% 
  arrange(ID, Year) %>% 
  distinct(ID, .keep_all = TRUE)
#> # A tibble: 9 x 4
#>          ID Program  Code  Year
#>       <dbl> <chr>   <dbl> <dbl>
#> 1  8888888. B488    4534. 2016.
#> 2 18998878. A111    1222. 2016.
#> 3 45454536. G888    4678. 2017.
#> 4 57485746. T687     543. 2017.
#> 5 58675844. R444    8888. 2016.
#> 6 58695847. B323    5656. 2017.
#> 7 64536475. T444    6544. 2016.
#> 8 68574443. F888    6666. 2017.
#> 9 87966666. P867    9898. 2016.

Created on 2018-03-07 by the reprex package (v0.2.0).
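The arrange-then-distinct approach keeps the earliest year per ID, which is the 2016 row here only because the data contains just 2016 and 2017. A grouped slice_min() is one way to state that intent directly. This is a sketch assuming dplyr 1.0.0 or later (where slice_min() is available), on a hypothetical toy tibble with illustrative names `toy` and `earliest`.

```r
library(dplyr)

# Hypothetical toy data: ID 2 appears in both 2017 and 2016.
toy <- tibble(
  ID   = c(2, 1, 2, 3),
  Year = c(2017, 2016, 2016, 2017)
)

# Keep exactly one row per ID: the one with the smallest Year.
# with_ties = FALSE guarantees a single row even if years repeat.
earliest <- toy %>%
  group_by(ID) %>%
  slice_min(Year, n = 1, with_ties = FALSE) %>%
  ungroup()

earliest
```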