Grouping rows based on fixed groups and highest values
I am trying to group a data table based on predefined groups that overlap.
The groups and the table are given below.
I would like to:
1) check whether df$Type contains all elements of either Group1 or Group2
2) check whether df$Area for all elements of that group is higher than for the other elements
3) add a new column that labels which group each row belongs to
```
# Groups
Group1 <- c("Bb","Ee","Xx")
Group2 <- c("Ra","Xx")
# Data table
df <- data.frame(
ID=c(1,1,1,2,2,2,2),
Type=c("Bb","Ee","Xx","Bb","Ra","Rb","Xx"),
Area=c(19,5,4,1,10,1,20))
```
The expected result table is below:
ID Type Area Group
1 Bb 19 G1
1 Ee 5 G1
1 Xx 4 G1
2 Bb 1 G2
2 Ra 10 G2
2 Rb 1 G2
2 Xx 20 G2
Put the groups in a list.
```
library(dplyr)
ref <- list(Group1 = c("Bb","Ee","Xx"),
            Group2 = c("Ra","Xx"))
```
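Write a function that checks the conditions you are looking for and returns the appropriate group name.
```
get_group <- function(y, z) {
  # y: Type values for one ID; z: the matching Area values
  res <- Filter(function(x) {
    # 1. check if Type has all elements of either group1 or group2
    if (all(x %in% y)) {
      inds <- y %in% x
      if (any(!inds)) {
        # 2. check if Area of all elements of the group
        #    is higher than that of the other elements
        all(outer(z[inds], z[!inds], `>`))
      } else TRUE
    } else FALSE
  }, ref)
  # 3. return the name of the first matching group, or NA if none matches
  if (length(res) > 0) names(res)[1] else NA_character_
}
```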
Apply the function for each ID:
```
df %>% group_by(ID) %>% mutate(Group = get_group(Type, Area)) %>% ungroup()
#     ID Type   Area Group
#  <dbl> <chr> <dbl> <chr>
#1     1 Bb       19 Group1
#2     1 Ee        5 Group1
#3     1 Xx        4 Group1
#4     2 Bb        1 Group2
#5     2 Ra       10 Group2
#6     2 Rb        1 Group2
#7     2 Xx       20 Group2
```
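For anyone who would rather avoid the dplyr dependency, here is a rough base R sketch of the same per-ID logic. The helper names `matches_group` and `first_group` are made up for illustration, and the list uses the short names `G1`/`G2` only so the labels match the expected table in the question. It assumes the first matching group should win, as in the function above.

```r
# Groups, named with the short labels from the expected output (assumed naming)
ref <- list(G1 = c("Bb", "Ee", "Xx"),
            G2 = c("Ra", "Xx"))

df <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 2),
  Type = c("Bb", "Ee", "Xx", "Bb", "Ra", "Rb", "Xx"),
  Area = c(19, 5, 4, 1, 10, 1, 20)
)

# Hypothetical helper: TRUE when every member of `grp` is present in `type`
# and every member's area exceeds every non-member's area.
matches_group <- function(grp, type, area) {
  if (!all(grp %in% type)) return(FALSE)
  member <- type %in% grp
  if (all(member)) return(TRUE)
  min(area[member]) > max(area[!member])
}

# Hypothetical helper: name of the first matching group, or NA if none match.
first_group <- function(type, area) {
  hit <- vapply(ref, matches_group, logical(1), type = type, area = area)
  if (any(hit)) names(ref)[which(hit)[1]] else NA_character_
}

# One label per ID, then look up each row's label by its ID.
labels <- vapply(split(df, df$ID),
                 function(d) first_group(d$Type, d$Area),
                 character(1))
df$Group <- unname(labels[as.character(df$ID)])
df
```

Comparing the smallest member area against the largest non-member area is equivalent to the `outer(..., `>`)` check, just without building the full comparison matrix.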
You can use %in% for (1) and (3). Question 2 needs clarification. Do you mean finding the maximum?
Question 1
```
> df$Type %in% union(Group1, Group2)
[1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
```
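Question 3
```
# Start with an empty label column, then fill it group by group
df$Group <- NA_character_
df$Group[df$Type %in% Group1] <- "G1"
df$Group[df$Type %in% Group2] <- "G2"
> df
  ID Type Area Group
1  1   Bb   19    G1
2  1   Ee    5    G1
3  1   Xx    4    G2
4  2   Bb    1    G1
5  2   Ra   10    G2
6  2   Rb    1  <NA>
7  2   Xx   20    G2
```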
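Note that this row-wise labelling does not reproduce the expected table from the question (row 3 comes out as G2, not G1). A quick way to see why, as a small sketch using only the two group vectors from the question: the groups share an element, so whichever assignment runs last wins for that element.

```r
Group1 <- c("Bb", "Ee", "Xx")
Group2 <- c("Ra", "Xx")

# Elements that belong to both groups; a plain %in% assignment
# will label these with whichever group is assigned last.
overlap <- intersect(Group1, Group2)
overlap
```

Resolving the overlap needs the per-ID context, which this approach does not use.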