Average only duplicated rows and replace the value in a defined column

Question

I have a data frame D:

surname   name   salary
Red        A      1000
Green      B       900
Green      A      1100
Blue       C      1000
Blue       B      1000
Blue       F       800
Violet     F      1200

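Some surnames appear in only one row, while others are repeated.

I need to aggregate only the rows whose surname is duplicated, computing the mean of salary and changing name to "X".

I tried something with duplicated(), but it treats the first occurrence as the original and leaves it untouched, changing only the later copies.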

D$name<-replace(D$name,duplicated(D$surname),"X")

I also can't work out how to average the salary values.

Thanks!

Answer 1

We can flag every row of a repeated surname by combining duplicated() in both directions:

D$name <- replace(D$name,duplicated(D$surname)|duplicated(D$surname, 
          fromLast = TRUE),"X")
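
To see why both calls are needed, here is a minimal sketch on the surname values from the question. The variable names forward, backward and in_dup_group are chosen here for illustration only. duplicated() on its own skips the first occurrence of each value, and fromLast = TRUE skips the last one, so the logical OR of the two marks every member of a repeated group.

```r
# Surname column from the question's data frame
surname <- c("Red", "Green", "Green", "Blue", "Blue", "Blue", "Violet")

# Scanning top-down: the first occurrence of each value is not flagged
forward <- duplicated(surname)
# FALSE FALSE TRUE FALSE TRUE TRUE FALSE

# Scanning bottom-up: the last occurrence of each value is not flagged
backward <- duplicated(surname, fromLast = TRUE)
# FALSE TRUE FALSE TRUE TRUE FALSE FALSE

# Either direction: every row whose surname occurs more than once
in_dup_group <- forward | backward
# FALSE TRUE TRUE TRUE TRUE TRUE FALSE

# Apply the mask to the name column
name <- c("A", "B", "A", "C", "B", "F", "F")
name <- replace(name, in_dup_group, "X")
# "A" "X" "X" "X" "X" "X" "F"
```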

If we need to add a column holding the average salary per surname:

library(dplyr)
D %>% 
   group_by(surname) %>% 
   mutate(average = mean(salary))
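
Note that mutate() keeps all seven rows and only adds a column. If the goal is one row per surname, as the question describes, one possible sketch (not part of the original answer) is to use summarise() instead. It assumes dplyr 1.0 or later for the .groups argument, and result is a name chosen here for illustration.

```r
library(dplyr)

# Same data as in the question
D <- data.frame(
  surname = c("Red", "Green", "Green", "Blue", "Blue", "Blue", "Violet"),
  name    = c("A", "B", "A", "C", "B", "F", "F"),
  salary  = c(1000L, 900L, 1100L, 1000L, 1000L, 800L, 1200L),
  stringsAsFactors = FALSE
)

result <- D %>%
  group_by(surname) %>%
  summarise(
    # Groups with more than one row get "X"; single rows keep their name
    name   = if (n() > 1) "X" else first(name),
    # The mean of a single value is that value, so unique rows are unchanged
    salary = mean(salary),
    .groups = "drop"
  )

result
# One row per surname, sorted alphabetically by the grouping:
#   Blue   X  933.33 (mean of 1000, 1000, 800)
#   Green  X  1000   (mean of 900, 1100)
#   Red    A  1000
#   Violet F  1200
```

The rows come back ordered by surname rather than in their original order, because group_by() sorts the groups.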

Data

D <- structure(list(surname = c("Red", "Green", "Green", "Blue", "Blue", 
"Blue", "Violet"), name = c("A", "B", "A", "C", "B", "F", "F"
), salary = c(1000L, 900L, 1100L, 1000L, 1000L, 800L, 1200L)), class = "data.frame", row.names = c(NA, 
-7L))
