Add a column to my data frame based on information in two columns

Question

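I want to add a column to my data frame based on information in two of its columns.

In my example data frame, one sample has several row entries (rows 3 to 5, which share tag 3). I want to create a new column "main" and fill it with "1" in every row that has a unique tag number. For rows with a duplicated tag number, I need the row with the highest weight to have "1" in main, and all the other rows in that group to be filled with "0".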

df
       sp    weight   tag
1   green        70     1
2  yellow        63     2
3     red        41     3
4     red        25     3
5     red         9     3

df with "main" column added
       sp    weight   tag  main
1   green        70     1     1
2  yellow        63     2     1
3     red        41     3     1
4     red        25     3     0
5     red         9     3     0

Here's what I have so far:

df$is.uniq <- duplicated(df$tag) | duplicated(df$tag, fromLast = TRUE)
df$main <- ifelse(df$is.uniq == TRUE, "1", ifelse(df$is.uniq == FALSE, "0", NA))

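I know I need to change the second ifelse statement so that it refers to the weight column and fills in 1 for the maximum weight and 0 for all other weights, but I haven't figured out how to do that yet.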

Answer 1

We can do a group-by operation and create the binary column from a logical condition that compares each 'weight' with the group max

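library(dplyr)
df %>% 
  group_by(sp) %>% 
  # weight == max(weight) is TRUE for the heaviest row in each group;
  # the unary + coerces TRUE/FALSE to 1/0
  mutate(main = +(weight == max(weight)))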

-output

# A tibble: 5 x 4
# Groups:   sp [3]
#  sp     weight   tag  main
#  <chr>   <int> <int> <int>
#1 green      70     1     1
#2 yellow     63     2     1
#3 red        41     3     1
#4 red        25     3     0
#5 red         9     3     0
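
The question defines groups by 'tag', while the code above groups by 'sp'; in the example data the two happen to coincide. As a minimal sketch, assuming 'tag' is the intended grouping column, the same comparison can be written in base R with ave(). Note that if two rows in a group tie for the maximum, both get 1.

```r
# Example data from the question (same values as the dput() output under "Data")
df <- data.frame(
  sp     = c("green", "yellow", "red", "red", "red"),
  weight = c(70L, 63L, 41L, 25L, 9L),
  tag    = c(1L, 2L, 3L, 3L, 3L)
)

# ave() returns, for every row, the maximum weight within that row's tag group
group_max <- ave(df$weight, df$tag, FUN = max)

# TRUE where the row carries its group's maximum; as.integer() turns it into 1/0
df$main <- as.integer(df$weight == group_max)
df$main
```

This keeps the rows in their original order and does not need any package.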

Or in base R, an option is to first order the data by 'weight' in descending order and then apply duplicated

dfnew <- df[order(df$sp, -df$weight),]
dfnew$main <- +(!duplicated(dfnew$sp))
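
The ordering approach above returns a reordered copy (dfnew). If the original row order matters, or if exactly one row per group should be flagged even when weights tie, the same idea can be applied through the ordering index. This is only a sketch: flag_heaviest() is a hypothetical helper name, and grouping by 'tag' rather than 'sp' is an assumption taken from the question text.

```r
# flag_heaviest() is a hypothetical helper, not part of base R or any package
flag_heaviest <- function(data, group = "tag", value = "weight") {
  # Row indices sorted by group, then by descending value
  ord <- order(data[[group]], -data[[value]])
  # Within that ordering, the first row of each group is its heaviest
  first_in_group <- !duplicated(data[[group]][ord])
  main <- integer(nrow(data))        # start with all zeros
  main[ord[first_in_group]] <- 1L    # mark exactly one row per group
  main
}

# Example data from the question
df <- data.frame(
  sp     = c("green", "yellow", "red", "red", "red"),
  weight = c(70L, 63L, 41L, 25L, 9L),
  tag    = c(1L, 2L, 3L, 3L, 3L)
)
df$main <- flag_heaviest(df)
df$main
```

Because the result is written back by row index, df does not have to be sorted beforehand.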

Data

df <- structure(list(sp = c("green", "yellow", "red", "red", "red"), 
    weight = c(70L, 63L, 41L, 25L, 9L), tag = c(1L, 2L, 3L, 3L, 
    3L)), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5"))

Answer 2

Does this work:

> library(dplyr)
> library(tidyr)  # replace_na() is exported by tidyr, not dplyr
> dat %>% 
+   left_join(dat %>% group_by(sp) %>% 
+               filter(weight == max(weight)) %>% 
+               mutate(main = 1) %>% select(X1, main), 
+             by = c('X1', 'sp')) %>% 
+   mutate(main = replace_na(main, 0))
Adding missing grouping variables: `sp`
# A tibble: 5 x 5
     X1 sp     weight   tag  main
  <dbl> <chr>   <dbl> <dbl> <dbl>
1     1 green      70     1     1
2     2 yellow     63     2     1
3     3 red        41     3     1
4     4 red        25     3     0
5     5 red         9     3     0
>

