Summarize with mathematical conditions in dplyr
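Based on this question: Summarize with conditions in dplyr
I would like to use dplyr to summarize a column based on a mathematical condition (rather than the string matching in the linked post). For each sample, I need to find the maximum measurement when the ratio of measurement/time is highest, while also creating a new column ratio. I would also like to carry the entire row through, and I am not sure how to do that with dplyr's summarize function.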
Example data frame
print(df)
   sample     type time measurement
1       a bacteria   24     0.57561
2       a bacteria   44     1.67236
3       a bacteria   67     4.17100
4       a bacteria   88    11.51661
5       b bacteria   24     0.53269
6       b bacteria   44     1.24942
7       b bacteria   67     5.72147
8       b bacteria   88    11.04017
9       c bacteria    0     0.00000
10      c bacteria   24     0.47418
11      c bacteria   39     1.06286
12      c bacteria   64     3.59649
13      c bacteria   78     7.05190
14      c bacteria  108     7.27060
Desired output
  sample     type time measurement      ratio
1      a bacteria   88    11.51661 0.13087057
2      b bacteria   88    11.04017 0.12545648
3      c bacteria   78     7.05190 0.09040897
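Failed attempt
This only returns the two columns defined by the group_by and summarize functions; I would like to carry the rest of the row along as well: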
library(dplyr)
df %>%
  group_by(sample) %>%
  summarize(ratio = max(measurement/time, na.rm = TRUE))
  sample  ratio
  <fct>   <dbl>
1 a      0.131
2 b      0.125
3 c      0.0904
Reproducible data
df <- structure(list(sample = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("a", "b", "c"), class = "factor"),
type = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L), .Label = "bacteria", class = "factor"),
time = c(24, 44, 67, 88, 24, 44, 67, 88, 0, 24, 39, 64, 78,
108), measurement = c(0.57561, 1.67236, 4.171, 11.51661,
0.53269, 1.24942, 5.72147, 11.04017, 0, 0.47418, 1.06286,
3.59649, 7.0519, 7.2706)), class = "data.frame", row.names = c(NA,
-14L))
df %>%
  mutate(ratio = measurement/time) %>%
  group_by(sample) %>%
  filter(ratio == max(ratio, na.rm = TRUE))
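This should do the trick.
df %>%
  group_by(sample) %>%
  mutate(ratio = measurement/time) %>%
  # na.rm = TRUE is needed here: sample c has a 0/0 row, so ratio contains NaN
  filter(ratio == max(ratio, na.rm = TRUE))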
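The time = 0 row in sample c is why missing-value handling matters when comparing against the group maximum. The following is a minimal base R sketch, using only the sample c values copied from the example data (the variable names are just for illustration), of what happens to the ratio in that group:

```r
# Values for sample "c", copied from the example data
time <- c(0, 24, 39, 64, 78, 108)
measurement <- c(0, 0.47418, 1.06286, 3.59649, 7.0519, 7.2706)

# The first element is 0/0, which R evaluates to NaN
ratio <- measurement / time

# Without na.rm the maximum is itself missing, so `ratio == max(ratio)`
# is NA for every row and filter() would drop the whole group
is.na(max(ratio))

# With na.rm = TRUE the NaN is ignored and the real maximum is returned
max(ratio, na.rm = TRUE)

# which.max() discards NA/NaN and returns the position of the first maximum
which.max(ratio)
```

So for this data, any filter that compares against max(ratio) should pass na.rm = TRUE, or the zero-time row should be removed beforehand.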
One option is to filter 'measurement' based on the position of the max of measurement/time (found with which.max) and compare (==) it with the 'measurement' values after grouping by 'sample'
library(dplyr)
df %>%
  group_by(sample) %>%
  mutate(ratio = measurement/time) %>%
  filter(measurement == measurement[which.max(ratio)])
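As a further variant, not taken from the answers above, the same row selection can be sketched with slice_max(), assuming dplyr 1.0.0 or later is installed. The data frame is rebuilt by hand from the question's values so the snippet runs on its own. Dropping the 0/0 row before ranking is an assumption of this sketch, made so that NaN cannot affect the ordering.

```r
library(dplyr)

# Example data from the question, rebuilt by hand
df <- data.frame(
  sample = factor(rep(c("a", "b", "c"), times = c(4, 4, 6))),
  type = factor("bacteria"),
  time = c(24, 44, 67, 88, 24, 44, 67, 88, 0, 24, 39, 64, 78, 108),
  measurement = c(0.57561, 1.67236, 4.171, 11.51661,
                  0.53269, 1.24942, 5.72147, 11.04017,
                  0, 0.47418, 1.06286, 3.59649, 7.0519, 7.2706)
)

result <- df %>%
  mutate(ratio = measurement / time) %>%
  # Assumption: the 0/0 = NaN row carries no information, so drop it
  filter(!is.na(ratio)) %>%
  group_by(sample) %>%
  # Keep the single row with the highest ratio in each sample
  slice_max(ratio, n = 1, with_ties = FALSE) %>%
  ungroup()

print(result)
```

With with_ties = FALSE exactly one row per sample is returned even if two rows share the maximum ratio, whereas the filter(ratio == max(ratio, na.rm = TRUE)) form keeps every tied row.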