Select rows in each group based on their values
I have a data frame like the one below:
Site=c("Site1","Site1","Site1", "Site2","Site2","Site2")
Gender=c("Male","Male","Male","Female","Male","Female")
Tissue=c("Muscle","Muscle","Muscle","Muscle","WB","Muscle")
Element=c("Iron","Iron","Humid","Iron","Humid","Iron")
Result=c(12,22,61,14,52,11)
df=data.frame(Site,Gender,Tissue,Element,Result)
> df
   Site Gender Tissue Element Result
1 Site1   Male Muscle    Iron     12
2 Site1   Male Muscle    Iron     22
3 Site1   Male Muscle   Humid     61
4 Site2 Female Muscle    Iron     14
5 Site2   Male     WB   Humid     52
6 Site2 Female Muscle    Iron     11
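Step one
I want to group my data by Site, Gender and Tissue. Then, within each group, I want to find the rows where an Element that appears more than once has the higher Result.
For example,
Group 1: Site1 Male Muscle
Group 2: Site2 Female Muscle
Group 3: Site2 Male WB
Now, in Group 1:
Site1 Male Muscle Iron 12
Site1 Male Muscle Iron 22
Iron is the same element in both rows. I want to select the row where Iron's Result is larger, i.e.
Site1 Male Muscle Iron 22
Then I want to add another column to the data frame, say "Col6", and put the largest Result there. So my data would look like this: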
   Site Gender Tissue Element Result Col6
1 Site1   Male Muscle    Iron     12   NA
2 Site1   Male Muscle    Iron     22   22
3 Site1   Male Muscle   Humid     61   NA
4 Site2 Female Muscle    Iron     14   14
5 Site2   Male     WB   Humid     52   NA
6 Site2 Female Muscle    Iron     11   NA
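Step one on its own could be sketched roughly as follows. This assumes dplyr (the question does not name a package), and it assumes Humid rows should always stay NA, which is inferred from the expected table rather than stated. The name step1 is only illustrative.

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  Site    = c("Site1", "Site1", "Site1", "Site2", "Site2", "Site2"),
  Gender  = c("Male", "Male", "Male", "Female", "Male", "Female"),
  Tissue  = c("Muscle", "Muscle", "Muscle", "Muscle", "WB", "Muscle"),
  Element = c("Iron", "Iron", "Humid", "Iron", "Humid", "Iron"),
  Result  = c(12, 22, 61, 14, 52, 11)
)

step1 <- df %>%
  group_by(Site, Gender, Tissue, Element) %>%
  # Copy Result into Col6 only on the row holding the element's maximum;
  # Humid rows are left as NA (assumption taken from the expected table)
  mutate(Col6 = if_else(Element != "Humid" & Result == max(Result),
                        Result, NA_real_)) %>%
  ungroup()

print(step1)
```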
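Step two
After this, for each element I want to take the row with the lowest Result and multiply its Result by the Result of "Humid".
For example, in Group 1:
Site1 Male Muscle Iron 12
Site1 Male Muscle Iron 22
Iron is the same element in both rows, and the first row has the lower Result, 12:
Site1 Male Muscle Iron 12
I want to multiply 12 by the Humid Result in that group, 61:
Site1 Male Muscle Humid 61
Then put this amount (12*61=732) in Col6 on the Iron row (not the Humid row), so that my final table looks like this: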
   Site Gender Tissue Element Result Col6
1 Site1   Male Muscle    Iron     12  732
2 Site1   Male Muscle    Iron     22   22
3 Site1   Male Muscle   Humid     61   NA
4 Site2 Female Muscle    Iron     14   14
5 Site2   Male     WB   Humid     52   NA
6 Site2 Female Muscle    Iron     11   NA
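Note: I have dozens of sites and element types, and each group always has two rows of the same element, so there is always a lower and a higher Result value to choose between.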
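Assuming each group has at most one Humid row (otherwise the code takes the maximum of the Humid rows in that group), follow this code. For ease of explanation, a separate column, dummy, is added. I have also added one more row (Site2, Female, Muscle) for a better demonstration.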
#revised sample
Site=c("Site1","Site1","Site1", "Site2","Site2","Site2", "Site2")
Gender=c("Male","Male","Male","Female","Male","Female", "Female")
Tissue=c("Muscle","Muscle","Muscle","Muscle","WB","Muscle", "Muscle")
Element=c("Iron","Iron","Humid","Iron","Humid","Iron", "Humid")
Result=c(12,22,61,14,52,11, 50)
df=data.frame(Site,Gender,Tissue,Element,Result)
> df
   Site Gender Tissue Element Result
1 Site1   Male Muscle    Iron     12
2 Site1   Male Muscle    Iron     22
3 Site1   Male Muscle   Humid     61
4 Site2 Female Muscle    Iron     14
5 Site2   Male     WB   Humid     52
6 Site2 Female Muscle    Iron     11
7 Site2 Female Muscle   Humid     50
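Since the answer relies on at most one Humid row per group, it may be worth checking that before running it on the real data. A minimal sketch, assuming the column names from the sample; humid_counts and violations are illustrative names:

```r
library(dplyr)

# Revised sample from the answer (7 rows)
df <- data.frame(
  Site    = c("Site1", "Site1", "Site1", "Site2", "Site2", "Site2", "Site2"),
  Gender  = c("Male", "Male", "Male", "Female", "Male", "Female", "Female"),
  Tissue  = c("Muscle", "Muscle", "Muscle", "Muscle", "WB", "Muscle", "Muscle"),
  Element = c("Iron", "Iron", "Humid", "Iron", "Humid", "Iron", "Humid"),
  Result  = c(12, 22, 61, 14, 52, 11, 50)
)

# Number of Humid rows in each Site/Gender/Tissue group that has any
humid_counts <- df %>%
  filter(Element == "Humid") %>%
  count(Site, Gender, Tissue, name = "n_humid")

# Groups that break the "at most one Humid row" assumption
violations <- filter(humid_counts, n_humid > 1)
print(violations)
```

An empty violations table means the assumption holds for the data at hand.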
Code
library(dplyr)
df %>% mutate(rowid = row_number()) %>%  # remember the original row order
  group_by(Site, Gender, Tissue, Element) %>%
  # label each row: max / non-max of its element, with Humid rows labelled separately
  mutate(dummy = case_when(Element != "Humid" & Result == max(Result) ~ "Max_E",
                           Element != "Humid" & Result != max(Result) ~ "Other_E",
                           Element == "Humid" & Result == max(Result) ~ "AA_Max_H",
                           TRUE ~ "Other_H")) %>%
  # ungroup(Element) keeps the Site/Gender/Tissue grouping (needs dplyr >= 1.0.0)
  ungroup(Element) %>% arrange(Site, Gender, Tissue, dummy) %>%
  # max rows keep Result; the other rows are multiplied by the group's Humid Result
  mutate(col6 = case_when(dummy == "Max_E" ~ Result,
                          dummy == "Other_E" ~ Result * first(Result[dummy == "AA_Max_H"]),
                          TRUE ~ NA_real_)) %>%
  ungroup() %>% arrange(rowid) %>%
  select(-rowid, -dummy)
# A tibble: 7 x 6
  Site  Gender Tissue Element Result  col6
  <chr> <chr>  <chr>  <chr>    <dbl> <dbl>
1 Site1 Male   Muscle Iron        12   732
2 Site1 Male   Muscle Iron        22    22
3 Site1 Male   Muscle Humid       61    NA
4 Site2 Female Muscle Iron        14    14
5 Site2 Male   WB     Humid       52    NA
6 Site2 Female Muscle Iron        11   550
7 Site2 Female Muscle Humid       50    NA
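The same result can probably be reached without the dummy column or the sorting step. The following is an alternative sketch, not the code above: it first attaches each group's Humid Result as a helper column (humid, a hypothetical name) and then decides per element. Like the original, it takes the maximum if a group has several Humid rows and yields NA when a group has none.

```r
library(dplyr)

# Revised sample from the answer (7 rows)
df <- data.frame(
  Site    = c("Site1", "Site1", "Site1", "Site2", "Site2", "Site2", "Site2"),
  Gender  = c("Male", "Male", "Male", "Female", "Male", "Female", "Female"),
  Tissue  = c("Muscle", "Muscle", "Muscle", "Muscle", "WB", "Muscle", "Muscle"),
  Element = c("Iron", "Iron", "Humid", "Iron", "Humid", "Iron", "Humid"),
  Result  = c(12, 22, 61, 14, 52, 11, 50)
)

res <- df %>%
  group_by(Site, Gender, Tissue) %>%
  # Humid Result of the group; NA when the group has no Humid row
  mutate(humid = if (any(Element == "Humid")) {
    max(Result[Element == "Humid"])
  } else {
    NA_real_
  }) %>%
  # Regroup by element to find the maximum within each element
  group_by(Site, Gender, Tissue, Element) %>%
  mutate(col6 = case_when(
    Element == "Humid"    ~ NA_real_,       # Humid rows are never filled
    Result == max(Result) ~ Result,         # highest row keeps its Result
    TRUE                  ~ Result * humid  # lower rows are scaled by Humid
  )) %>%
  ungroup() %>%
  select(-humid)

print(res)
```

Because the rows are never reordered, no rowid column is needed to restore the original order.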
For the sample data posted by the OP, it gives exactly the expected result:
# A tibble: 6 x 6
  Site  Gender Tissue Element Result  col6
  <chr> <chr>  <chr>  <chr>    <dbl> <dbl>
1 Site1 Male   Muscle Iron        12   732
2 Site1 Male   Muscle Iron        22    22
3 Site1 Male   Muscle Humid       61    NA
4 Site2 Female Muscle Iron        14    14
5 Site2 Male   WB     Humid       52    NA
6 Site2 Female Muscle Iron        11    NA