Subset all rows except those meeting a specific condition, using group_by, for a varying number of groups in R
I want to filter this df:
Sample <- c(1:24)
Group <- c("A","A","A","A","A","A","A","A","A","A","A","A", "B","B","B","B","B","B","B","B","B","B","B","B")
T1 <- c(74.4, 74.7, 74.1, 72.2, 72.8, 72.9, 70.8, 71.2, 70.5, 72.4, 72.7, 72.1, 71.2, 71.8, 71.9, 70.8, 70.2, 70.5, 72.2, 72.7, 72.1, 70.8, 71.0, 70.7)
S1 <- c("sample", "sample", "sample", "std", "std","std","std","std", "std", "sample", "sample", "sample","sample", "sample", "sample", "std", "std","std", "std", "std", "sample", "sample", "sample", "sample")
df <- data.frame(Sample, Group, T1, S1)
keeping all rows except those where
S1=="std" & Group == "A" & T1 %!+-1% median(T1[S1 == "std"])
within each Group, to get this output:
   Sample Group   T1     S1
1       1     A 74.4 sample
2       2     A 74.7 sample
3       3     A 74.1 sample
4       4     A 72.2    std
7       7     A 70.8    std
8       8     A 71.2    std
10     10     A 72.4 sample
11     11     A 72.7 sample
12     12     A 72.1 sample
13     13     B 71.2 sample
14     14     B 71.8 sample
15     15     B 71.9 sample
16     16     B 70.8    std
17     17     B 70.2    std
18     18     B 70.5    std
21     21     B 72.1 sample
22     22     B 70.8 sample
23     23     B 71.0 sample
24     24     B 70.7 sample
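I got help with this nice piece of code:
df %>% group_by(Group) %>% filter(T1 %+-1% median(T1[S1 == "std"]))
It filters all rows (not only those with S1 == "std"), and I could not work out how to adapt it to the subset function so that only the rows matching these conditions are removed.
For now I am doing it the way shown next. As far as I know this is not the right approach, and it does not let me handle a varying number of groups (if there are more than 2):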
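library(dplyr)

# Store the median T1 of the "std" rows of each group in T1_A, T1_B, ...
for (Var in unique(df$Group)) {
  assign(paste("T1_", Var, sep = ""), median(filter(df, Group == Var, S1 == "std")$T1))
}
# TRUE when T1 lies within +/- 1 of the reference value T1_A
`%+-1%` <- function(T1, T1_A) (T1 >= T1_A - 1) & (T1 <= T1_A + 1)
# Assumed definition: %!+-1% is the negation of %+-1%
`%!+-1%` <- function(T1, T1_A) !(T1 %+-1% T1_A)
df %>% subset(!(df$S1 == "std" & df$Group == "A" & df$T1 %!+-1% T1_A |
                df$S1 == "std" & df$Group == "B" & df$T1 %!+-1% T1_B))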
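This removes, within each Group, the rows where S1 == "std" and T1 is not within ±1 of the median of T1 over the rows where S1 == "std" (group_by takes the place of the explicit Group == "A" test):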
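library(dplyr)

df %>%
  group_by(Group) %>%
  filter({
    # median of the "std" rows in the current group
    val <- median(T1[S1 == "std"])
    # drop only "std" rows outside +/- 1 of that median (%+-1% as defined in the question)
    !(S1 == "std" & !(T1 %+-1% val))
  }) %>%
  ungroup()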
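For comparison, here is a minimal base R sketch of the same idea that does not need the custom operator. It assumes the tolerance is an absolute ±1, as in the `%+-1%` definition from the question, and it works for any number of groups. The names `tol`, `std_T1`, `med`, `out_of_range` and `result` are made up for illustration only.

```r
# Data from the question
df <- data.frame(
  Sample = 1:24,
  Group  = rep(c("A", "B"), each = 12),
  T1     = c(74.4, 74.7, 74.1, 72.2, 72.8, 72.9, 70.8, 71.2, 70.5, 72.4, 72.7, 72.1,
             71.2, 71.8, 71.9, 70.8, 70.2, 70.5, 72.2, 72.7, 72.1, 70.8, 71.0, 70.7),
  S1     = c(rep("sample", 3), rep("std", 6), rep("sample", 3),
             rep("sample", 3), rep("std", 5), rep("sample", 4))
)

tol <- 1  # hypothetical name: absolute tolerance around the median

# Per-group median of T1 over the "std" rows only, repeated for every row of the group
std_T1 <- ifelse(df$S1 == "std", df$T1, NA)
med <- ave(std_T1, df$Group, FUN = function(x) median(x, na.rm = TRUE))

# Flag a row only when it is a "std" row lying more than `tol` from its group median
out_of_range <- df$S1 == "std" & abs(df$T1 - med) > tol

# Keep everything else, including all "sample" rows
result <- df[!out_of_range, ]
result
```

On the sample data the group medians come out as 71.7 for A and 70.8 for B, and `result` keeps 19 rows: samples 1-4, 7-8, 10-18 and 21-24, the same rows as the expected output in the question.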