Grouping rows based on fixed groups and high values

I am trying to label the rows of a data table according to predefined groups whose elements overlap. The groups and the table are given below.

For each ID, I would like to:

1) check whether df$Type contains all elements of either Group1 or Group2;
2) check whether df$Area for all elements of that group is higher than for the other elements;
3) add a new column labelling which group the ID belongs to.

```
# Groups
Group1 <- c("Bb", "Ee", "Xx")
Group2 <- c("Ra", "Xx")

# Data table
df <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 2),
  Type = c("Bb", "Ee", "Xx", "Bb", "Ra", "Rb", "Xx"),
  Area = c(19, 5, 4, 1, 10, 1, 20)
)
```

The expected result table is below:

          ID Type Area Group
          1   Bb   19    G1
          1   Ee    5    G1
          1   Xx    4    G1
          2   Bb    1    G2
          2   Ra   10    G2
          2   Rb    1    G2
          2   Xx   20    G2

Put the groups in a list.

```
library(dplyr)

ref <- list(Group1 = c("Bb", "Ee", "Xx"),
            Group2 = c("Ra", "Xx"))
```

Write a function that checks the conditions you are looking for and returns the name of the matching group.

```
get_group <- function(y, z) {
  # y: Type values for one ID; z: the matching Area values
  res <- Filter(function(x) {
    # 1. check if Type has all elements of this group
    if (all(x %in% y)) {
      inds <- y %in% x
      if (any(!inds)) {
        # 2. check if Area of all elements of the group
        #    is higher than that of the other elements
        all(outer(z[inds], z[!inds], `>`))
      } else TRUE
    } else FALSE
  }, ref)
  # return the name of the first matching group, or NA if none matches
  if (length(res) > 0) names(res)[1] else NA_character_
}
```
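The `outer` call in step 2 compares every in-group area against every out-of-group area. As a minimal sketch of what that check computes, here are the ID 2 values from the question tested against Group2 (the variable names `in_group`, `out_group`, `pairwise_ok` and `minmax_ok` are illustrative and not part of the answer):

```r
# Areas for ID 2, split by membership in Group2 (Ra, Xx)
in_group  <- c(10, 20)  # Ra, Xx
out_group <- c(1, 1)    # Bb, Rb

# outer() builds a matrix of all pairwise comparisons
cmp <- outer(in_group, out_group, `>`)
pairwise_ok <- all(cmp)

# For a strict "every member beats every non-member" test, this is
# equivalent to comparing the smallest member with the largest non-member
minmax_ok <- min(in_group) > max(out_group)
```

Both forms agree for this data; the `min`/`max` form avoids building the full comparison matrix, which may matter when an ID has many rows.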

Apply the function to each ID:

```
df %>%
  group_by(ID) %>%
  mutate(Group = get_group(Type, Area)) %>%
  ungroup()

#     ID Type   Area Group 
#  <dbl> <chr> <dbl> <chr> 
#1     1 Bb       19 Group1
#2     1 Ee        5 Group1
#3     1 Xx        4 Group1
#4     2 Bb        1 Group2
#5     2 Ra       10 Group2
#6     2 Rb        1 Group2
#7     2 Xx       20 Group2
```

You can use %in% for (1) and (3). Question 2 needs clarification. Do you mean finding the maximum?

Question 1

```
> df$Type %in% union(Group1, Group2)
[1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE
```
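The check above tests each row separately. If the goal is instead to test whether an ID contains every element of a group, as in point 1 of the question, one possible base R sketch splits `Type` by `ID` first (the helper name `has_all` is hypothetical, and the data are copied from the question so the block runs on its own):

```r
Group1 <- c("Bb", "Ee", "Xx")
Group2 <- c("Ra", "Xx")
df <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 2),
  Type = c("Bb", "Ee", "Xx", "Bb", "Ra", "Rb", "Xx"),
  Area = c(19, 5, 4, 1, 10, 1, 20)
)

# Hypothetical helper: TRUE for each ID whose Type values cover the whole group
has_all <- function(group) {
  sapply(split(df$Type, df$ID), function(types) all(group %in% types))
}

g1 <- has_all(Group1)  # ID 1 contains Bb, Ee and Xx; ID 2 lacks Ee
g2 <- has_all(Group2)  # only ID 2 contains both Ra and Xx
```

Each result is a logical vector named by ID, so it can be used to look up which IDs qualify for a given group.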

Question 3

```
> df$Group <- NA_character_
> df$Group[df$Type %in% Group1] <- "G1"
> df$Group[df$Type %in% Group2] <- "G2"  # Xx is in both groups, so G2 overwrites G1
> df
  ID Type Area Group
1  1   Bb   19    G1
2  1   Ee    5    G1
3  1   Xx    4    G2
4  2   Bb    1    G1
5  2   Ra   10    G2
6  2   Rb    1  <NA>
7  2   Xx   20    G2
```