When a row is duplicated, label whether it came from one group, the other, or both?
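I have a data frame that contains duplicated IDs and a group label for each ID. How can I use tidyr or related tools to get from the first data frame to the second? I need to remove the duplicated rows and label whether each ID is present in group "a", group "b", or "both".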
library(tidyverse)
df <- tibble(id=c(1,2,2,3,4,4),
label=c("a","a","b","b","a","b"))
# A tibble: 6 x 2
id label
<dbl> <chr>
1 1 a
2 2 a
3 2 b
4 3 b
5 4 a
6 4 b
df_desired <- tibble(id=c(1,2,3,4),
label=c("a","both","b","both"))
# A tibble: 4 x 2
id label
<dbl> <chr>
1 1 a
2 2 both
3 3 b
4 4 both
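Since the question asks specifically about tidyr, here is a sketch (not taken from the answers) that uses tidyr::pivot_wider() to build one presence indicator per group and then turns the indicators back into a single label. It assumes the only groups are "a" and "b", so the widened columns are named a and b; the helper column present is a hypothetical name introduced for illustration.

```r
library(dplyr)
library(tidyr)
library(tibble)

df <- tibble(id = c(1, 2, 2, 3, 4, 4),
             label = c("a", "a", "b", "b", "a", "b"))

# Widen to one row per id with one logical column per group ("a", "b"):
# TRUE if the id occurs in that group, FALSE (values_fill) if it does not.
# values_fn = any collapses repeated (id, label) pairs into a single TRUE.
presence <- df %>%
  mutate(present = TRUE) %>%  # hypothetical helper column
  pivot_wider(names_from = label, values_from = present,
              values_fill = FALSE, values_fn = any)

# Translate the two indicator columns back into one label.
# Assumes the only groups are "a" and "b", so the columns are named a and b.
result <- presence %>%
  mutate(label = case_when(a & b ~ "both",
                           a ~ "a",
                           b ~ "b")) %>%
  select(id, label)

result
```

The intermediate presence table is also handy on its own if you later need to filter, say, IDs that are only in "b".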
Here is a possible solution using summarise:
# One row per id: "both" if the id has two distinct labels, otherwise its single label
df %>%
group_by(id) %>%
summarise(label = if_else(length(unique(label)) == 2, "both", first(label)),
.groups = "drop")
# A tibble: 4 x 2
id label
<dbl> <chr>
1 1 a
2 2 both
3 3 b
4 4 both
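A slightly more compact variant of the same summarise approach, assuming only the groups "a" and "b" exist: n_distinct() is dplyr's shorthand for length(unique()), and because the condition is a single TRUE/FALSE per id, a plain if/else is enough in place of the vectorised if_else().

```r
library(dplyr)
library(tibble)

df <- tibble(id = c(1, 2, 2, 3, 4, 4),
             label = c("a", "a", "b", "b", "a", "b"))

# n_distinct(label) counts the distinct labels for the current id;
# more than one means the id occurs in both groups.
result <- df %>%
  group_by(id) %>%
  summarise(label = if (n_distinct(label) > 1) "both" else first(label),
            .groups = "drop")

result
```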
Another way with dplyr could be:
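library(tidyverse)
#Data
df <- tibble(id=c(1,2,2,3,4,4),
label=c("a","a","b","b","a","b"))
#Code: collapse each id's distinct labels into one string (e.g. "a, b");
#a one-character result means a single group, anything longer means "both".
#unique() stops an id listed twice under the same group from becoming "both".
#Assumes group labels are single characters such as "a" and "b".
df %>% group_by(id) %>% summarise(across(everything(), ~ toString(unique(.x)))) %>%
mutate(label=ifelse(nchar(label)==1,label,'both'))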
Output:
# A tibble: 4 x 2
id label
<dbl> <chr>
1 1 a
2 2 both
3 3 b
4 4 both
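For comparison, the same classification can be sketched in base R with aggregate(), for readers who do not want the tidyverse dependency. This is a swapped-in technique rather than part of either answer, and it assumes R 4.0 or later so that data.frame() keeps label as character.

```r
# Base R only: no tidyverse packages needed.
df <- data.frame(id = c(1, 2, 2, 3, 4, 4),
                 label = c("a", "a", "b", "b", "a", "b"))

# aggregate() splits label by id (sorted by id) and applies the function
# to each group: "both" if the id has more than one distinct label,
# otherwise its single label.
out <- aggregate(label ~ id, data = df,
                 FUN = function(x) if (length(unique(x)) > 1) "both" else x[1])

out
```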