Group and compare numbers in a list
Consider the following data frame, obtained after a cbind operation on two lists:
> fl
x meanlist
1 1 48.5
2 2 32.5
3 3 28.0
4 4 27.0
5 5 25.5
6 6 20.5
7 7 27.0
8 8 24.0
class_median <- list(0, 15, 25, 35, 45)
class_list <- list(0:10, 10:20, 20:30, 30:40, 40:50)
The values in class_median represent the classes -10 to +10, 10 to 20, 20 to 30, and so on.
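First, I am trying to group the values in fl$meanlist according to the classes in class_list. Second, I am trying to return one value per class, namely the one closest to the class median, as shown below: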
> fl_subset
x meanlist cm
1 1 48.5 45
2 2 32.5 35
3 5 25.5 25
I tried doing the comparison with loops, but the code became long and hard to manage, and the results were incorrect.
Here is a dplyr approach:
library(dplyr)
# do a little prep--name classes, extract breaks, put medians in a data frame
names(class_list) = letters[seq_along(class_list)]
breaks = c(min(class_list[[1]]), sapply(class_list, max))
med_data = data.frame(median = unlist(class_median), class = names(class_list))
fl %>%
  # assign classes
  mutate(class = cut(meanlist, breaks = breaks, labels = names(class_list))) %>%
  # get medians
  left_join(med_data) %>%
  # within each class...
  group_by(class) %>%
  # keep the row with the smallest absolute difference to the median
  slice(which.min(abs(meanlist - median))) %>%
  # sort in original order
  arrange(x)
# Joining, by = "class"
# # A tibble: 3 x 4
# # Groups: class [3]
# x meanlist class median
# <int> <dbl> <fct> <dbl>
# 1 1 48.5 e 45
# 2 2 32.5 d 35
# 3 5 25.5 c 25
One approach using purrr and dplyr could be:
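library(dplyr)
library(purrr)

map2(.x = class_list,
     .y = class_median,
     ~ fl %>%
       # flag rows whose meanlist lies inside this class (both ends inclusive)
       mutate(cm = between(meanlist, min(.x), max(.x))) %>%
       # keep only the rows that belong to this class
       filter(cm) %>%
       # replace the flag with the class median
       mutate(cm = cm * .y)) %>%
  bind_rows(.id = "ID") %>%
  group_by(ID) %>%
  # within each class, keep the row closest to the median
  slice(which.min(abs(meanlist - cm)))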
ID x meanlist cm
<chr> <int> <dbl> <dbl>
1 3 5 25.5 25
2 4 2 32.5 35
3 5 1 48.5 45
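For comparison, the same two steps (assign each value to a class, then keep the row closest to that class's median) can be sketched in base R without any packages. This is only a minimal sketch using the data from the question. The helper names idx, rows and keep are made up for illustration, and it assumes the classes are contiguous intervals of the form (lower, upper], which is how cut() treats the break points by default.

```r
# Data from the question
fl <- data.frame(
  x = 1:8,
  meanlist = c(48.5, 32.5, 28.0, 27.0, 25.5, 20.5, 27.0, 24.0)
)
class_median <- list(0, 15, 25, 35, 45)
class_list <- list(0:10, 10:20, 20:30, 30:40, 40:50)

# Break points: lower edge of the first class, then the upper edge of every class
breaks <- c(min(class_list[[1]]), sapply(class_list, max))

# Position of the class each value falls into (NA if outside all classes)
idx <- as.integer(cut(fl$meanlist, breaks = breaks))

# Median of the class each row belongs to
fl$cm <- unlist(class_median)[idx]

# Row numbers grouped by class; rows with NA class are dropped by split()
rows <- split(seq_len(nrow(fl)), idx)

# For each class, the row whose meanlist is closest to the class median
keep <- vapply(
  rows,
  function(i) i[which.min(abs(fl$meanlist[i] - fl$cm[i]))],
  integer(1)
)

# Restore the original row order and renumber the rows
fl_subset <- fl[sort(keep), ]
rownames(fl_subset) <- NULL
fl_subset
```

With the data above this keeps the rows with x equal to 1, 2 and 5, matching the fl_subset shown in the question. If two rows in a class are equally close to the median, which.min() keeps the first one.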