Select rows in each group based on their values

I have a data frame that looks like this:

Site=c("Site1","Site1","Site1", "Site2","Site2","Site2")
Gender=c("Male","Male","Male","Female","Male","Female")
Tissue=c("Muscle","Muscle","Muscle","Muscle","WB","Muscle")
Element=c("Iron","Iron","Humid","Iron","Humid","Iron")
Result=c(12,22,61,14,52,11)

df=data.frame(Site,Gender,Tissue,Element,Result)

> df

   Site Gender Tissue Element Result
1 Site1   Male Muscle    Iron     12
2 Site1   Male Muscle    Iron     22
3 Site1   Male Muscle   Humid     61
4 Site2 Female Muscle    Iron     14
5 Site2   Male     WB   Humid     52
6 Site2 Female Muscle    Iron     11

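Step 1

I want to group my data by Site, Gender and Tissue. Then, within each group, I want to find the rows where, among rows with the same Element, the Result is higher.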

For example,

Group 1: Site1 Male Muscle

Group 2: Site2 Female Muscle

Group 3: Site2 Male WB

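Now, in Group 1:

Site1 Male Muscle Iron 12

Site1 Male Muscle Iron 22

Iron is the same element in both rows. I want to select the row where Iron's Result is larger, i.e.

Site1 Male Muscle Iron 22

Then I want to add another column to the data frame, say "Col6", and put the largest Result there. So my data would look like this: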

   Site Gender Tissue Element Result Col6
1 Site1   Male Muscle    Iron     12   NA 
2 Site1   Male Muscle    Iron     22   22
3 Site1   Male Muscle   Humid     61   NA
4 Site2 Female Muscle    Iron     14   14
5 Site2   Male     WB   Humid     52   NA
6 Site2 Female Muscle    Iron     11   NA

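Step 2

After this, I want to take the row with the lowest Result for each element and multiply that Result by the Result of "Humid".

For example, in Group 1:

Site1 Male Muscle Iron 12

Site1 Male Muscle Iron 22

Iron is the same element in both rows, and the first row has the lower Result, 12:

Site1 Male Muscle Iron 12

I want to multiply 12 by the Humid Result in that group, 61:

Site1 Male Muscle Humid 61

Then put this amount (12*61=732) into Col6 in the Iron row (not the Humid row), so that my final table looks like this: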

   Site Gender Tissue Element Result Col6
1 Site1   Male Muscle    Iron     12  732
2 Site1   Male Muscle    Iron     22   22
3 Site1   Male Muscle   Humid     61   NA
4 Site2 Female Muscle    Iron     14   14
5 Site2   Male     WB   Humid     52   NA
6 Site2 Female Muscle    Iron     11   NA

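Note: I have dozens of sites and element types, and each group always has two rows of the same element, so there is always a lower and a higher Result value to choose between.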

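Assuming there is at most one Humid row per group (otherwise the code will use the maximum of the Humid rows in that group), follow this code. A separate column, dummy, is added for ease of explanation. I have also added one more row (Site2, Female, Muscle) for a better demonstration.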

#revised sample

Site=c("Site1","Site1","Site1", "Site2","Site2","Site2", "Site2")
Gender=c("Male","Male","Male","Female","Male","Female", "Female")
Tissue=c("Muscle","Muscle","Muscle","Muscle","WB","Muscle", "Muscle")
Element=c("Iron","Iron","Humid","Iron","Humid","Iron", "Humid")
Result=c(12,22,61,14,52,11, 50)

df=data.frame(Site,Gender,Tissue,Element,Result)

> df
   Site Gender Tissue Element Result
1 Site1   Male Muscle    Iron     12
2 Site1   Male Muscle    Iron     22
3 Site1   Male Muscle   Humid     61
4 Site2 Female Muscle    Iron     14
5 Site2   Male     WB   Humid     52
6 Site2 Female Muscle    Iron     11
7 Site2 Female Muscle   Humid     50
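
The "at most one Humid row per group" assumption can be checked before running the pipeline. The following is only a minimal sketch: it rebuilds the revised sample so it runs on its own, and `humid_counts` and `n_humid` are hypothetical names introduced for illustration.

```r
library(dplyr)

# Revised sample from above, rebuilt so the snippet is self-contained
df <- data.frame(
  Site    = c("Site1", "Site1", "Site1", "Site2", "Site2", "Site2", "Site2"),
  Gender  = c("Male", "Male", "Male", "Female", "Male", "Female", "Female"),
  Tissue  = c("Muscle", "Muscle", "Muscle", "Muscle", "WB", "Muscle", "Muscle"),
  Element = c("Iron", "Iron", "Humid", "Iron", "Humid", "Iron", "Humid"),
  Result  = c(12, 22, 61, 14, 52, 11, 50)
)

# Count the Humid rows in every Site/Gender/Tissue group
# (humid_counts and n_humid are hypothetical names)
humid_counts <- df %>%
  group_by(Site, Gender, Tissue) %>%
  summarise(n_humid = sum(Element == "Humid"), .groups = "drop")

# TRUE when no group has more than one Humid row
all(humid_counts$n_humid <= 1)
```

If this returns FALSE for the real data, the groups with `n_humid > 1` are the ones where the maximum Humid value would be used.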

Code

library(dplyr)

df %>% mutate(rowid = row_number()) %>%   # remember the original row order
  group_by(Site, Gender, Tissue, Element) %>%
  # label each row: highest vs. other Result per element, with Humid labelled separately
  mutate(dummy = case_when(Element != "Humid" & Result == max(Result) ~ "Max_E",
                           Element != "Humid" & Result != max(Result) ~ "Other_E",
                           Element == "Humid" & Result == max(Result) ~ "AA_Max_H",
                           TRUE ~ "Other_H")) %>%
  # drop Element from the grouping so each Site/Gender/Tissue group can see its Humid row
  ungroup(Element) %>% arrange(Site, Gender, Tissue, dummy) %>%
  mutate(col6 = case_when(dummy == "Max_E" ~ Result,
                          dummy == "Other_E" ~ Result * first(Result[dummy == "AA_Max_H"]),
                          TRUE ~ NA_real_)) %>%
  ungroup() %>% arrange(rowid) %>%   # restore the original row order
  select(-rowid, -dummy)

# A tibble: 7 x 6
  Site  Gender Tissue Element Result  col6
  <chr> <chr>  <chr>  <chr>    <dbl> <dbl>
1 Site1 Male   Muscle Iron        12   732
2 Site1 Male   Muscle Iron        22    22
3 Site1 Male   Muscle Humid       61    NA
4 Site2 Female Muscle Iron        14    14
5 Site2 Male   WB     Humid       52    NA
6 Site2 Female Muscle Iron        11   550
7 Site2 Female Muscle Humid       50    NA

For the sample data posted by the OP, it gives exactly the expected result:

# A tibble: 6 x 6
  Site  Gender Tissue Element Result  col6
  <chr> <chr>  <chr>  <chr>    <dbl> <dbl>
1 Site1 Male   Muscle Iron        12   732
2 Site1 Male   Muscle Iron        22    22
3 Site1 Male   Muscle Humid       61    NA
4 Site2 Female Muscle Iron        14    14
5 Site2 Male   WB     Humid       52    NA
6 Site2 Female Muscle Iron        11    NA
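
The same logic can probably be written without the helper dummy column and the sorting step. This is a hedged sketch rather than a drop-in replacement: it assumes dplyr 1.0 or later (for `group_by(.add = TRUE)`), and `humid` and `result` are hypothetical names introduced for illustration. It first attaches the group's Humid value to every row, then compares each Result with the maximum for its element.

```r
library(dplyr)

# Revised 7-row sample from the answer, rebuilt so the snippet is self-contained
df <- data.frame(
  Site    = c("Site1", "Site1", "Site1", "Site2", "Site2", "Site2", "Site2"),
  Gender  = c("Male", "Male", "Male", "Female", "Male", "Female", "Female"),
  Tissue  = c("Muscle", "Muscle", "Muscle", "Muscle", "WB", "Muscle", "Muscle"),
  Element = c("Iron", "Iron", "Humid", "Iron", "Humid", "Iron", "Humid"),
  Result  = c(12, 22, 61, 14, 52, 11, 50)
)

result <- df %>%
  group_by(Site, Gender, Tissue) %>%
  # Humid value of the group (the maximum if there are several), NA if there is none
  mutate(humid = if (any(Element == "Humid")) max(Result[Element == "Humid"]) else NA_real_) %>%
  # add Element to the grouping so max(Result) is taken per element
  group_by(Element, .add = TRUE) %>%
  mutate(col6 = case_when(
    Element == "Humid"    ~ NA_real_,       # Humid rows never get a value
    Result == max(Result) ~ Result,         # highest Result of the element is kept as is
    TRUE                  ~ Result * humid  # lower Result is multiplied by the Humid value
  )) %>%
  ungroup() %>%
  select(-humid)

result$col6
# 732 22 NA 14 NA 550 NA
```

Because no rows are reordered, the rowid bookkeeping is not needed here. Ties are treated the same way as in the code above: two rows of an element with equal Result both count as the maximum.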