Filter the data by joining 2 columns (strings containing commas) in R
I have a data frame, df:
ID <- c('DX154','DX154','DX155','DX155','DX156','DX157','DX158','DX159')
Country <- c('US','US','US','US')
Level <- c('Level_1A','Level_1A','Level_1B','Level_1B','Level_1A','Level_1B','Level_1B','Level_1A')
Type_A <- c('Iphone','Iphone','Android','Android','aaa','bbb','ccc','ddd')
Type_B <- c("Iphone,Ipad,Ipod,Mac","Gmail,Android,Drive,Maps","Iphone,Ipad,Ipod,Mac","Gmail,Android,Drive,Maps","ALL","ALL","ALL","ALL")
df <- data.frame(ID ,Country ,Level ,Type_A,Type_B)
df
ID Country Level Type_A Type_B
1 DX154 US Level_1A Iphone Iphone,Ipad,Ipod,Mac
2 DX154 US Level_1A Iphone Gmail,Android,Drive,Maps
3 DX155 US Level_1B Android Iphone,Ipad,Ipod,Mac
4 DX155 US Level_1B Android Gmail,Android,Drive,Maps
5 DX156 US Level_1A aaa ALL
6 DX157 US Level_1B bbb ALL
7 DX158 US Level_1B ccc ALL
8 DX159 US Level_1A ddd ALL
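I am trying to filter this data frame by matching the columns Type_A and Type_B, but I don't know how to parse the commas. Can someone help me with this?
My desired output is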
ID Country Level Type_A Type_B
1 DX154 US Level_1A Iphone Iphone,Ipad,Ipod,Mac
2 DX155 US Level_1B Android Gmail,Android,Drive,Maps
3 DX156 US Level_1A aaa ALL
4 DX157 US Level_1B bbb ALL
5 DX158 US Level_1B ccc ALL
6 DX159 US Level_1A ddd ALL
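Here is one solution. It's a bit clunky, but someone will soon give you a super clever and fast version. This one works row by row, whereas akrun's answer shows how to do it by grouping on ID alone.
library(dplyr)
df <- df %>%
  mutate(row_id = 1:n()) %>%  # one group per row, so grepl() receives a single pattern
  group_by(row_id) %>%
  # keep rows where Type_A occurs in Type_B, or where Type_B is the catch-all "ALL"
  filter(grepl(Type_A, Type_B) | Type_B == "ALL")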
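Note that grepl() matches substrings, so a Type_A value such as 'Mac' would also match a Type_B entry like 'Macbook'. If exact matching against the comma-separated items is wanted, a minimal base R sketch could look like the following. It assumes the items are separated by a bare comma with no surrounding spaces, as in the question's data; the names tokens, exact_match and result are only illustrative.

```r
# Rebuild the example data from the question
df <- data.frame(
  ID      = c('DX154','DX154','DX155','DX155','DX156','DX157','DX158','DX159'),
  Country = 'US',
  Level   = c('Level_1A','Level_1A','Level_1B','Level_1B',
              'Level_1A','Level_1B','Level_1B','Level_1A'),
  Type_A  = c('Iphone','Iphone','Android','Android','aaa','bbb','ccc','ddd'),
  Type_B  = c("Iphone,Ipad,Ipod,Mac","Gmail,Android,Drive,Maps",
              "Iphone,Ipad,Ipod,Mac","Gmail,Android,Drive,Maps",
              "ALL","ALL","ALL","ALL"),
  stringsAsFactors = FALSE
)

# Split every Type_B string on commas: one character vector per row
tokens <- strsplit(df$Type_B, ",", fixed = TRUE)

# For each row, TRUE when Type_A equals one of that row's items exactly
exact_match <- mapply(function(a, b) a %in% b,
                      df$Type_A, tokens, USE.NAMES = FALSE)

# Keep exact matches plus the catch-all "ALL" rows
result <- df[exact_match | df$Type_B == "ALL", ]
rownames(result) <- NULL
result
```

On the question's data this keeps the same six rows as the desired output; it only behaves differently from grepl() when one item is a substring of another.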
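We group by 'ID' and use grepl with a pattern built by pasteing the 'Type_A' column together (in this example, Type_A[1L] should also work, because the 'Type_A' elements are repeated within each ID; a better example would help), then use the result to filter the rows. We also use grepl to keep the 'Type_B' elements that contain no , anywhere from the start (^) to the end ($) of the string.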
library(dplyr)
df %>%
group_by(ID) %>%
filter(grepl(paste(Type_A, collapse='|'),
Type_B)|grepl('^[^,]+$', Type_B))
# ID Country Level Type_A Type_B
#1 DX154 US Level_1A Iphone Iphone,Ipad,Ipod,Mac
#2 DX155 US Level_1B Android Gmail,Android,Drive,Maps
#3 DX156 US Level_1A aaa ALL
#4 DX157 US Level_1B bbb ALL
#5 DX158 US Level_1B ccc ALL
#6 DX159 US Level_1A ddd ALL
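To see what the two grepl() calls contribute, here is a small sketch that evaluates them by hand for a single ID group. The vector type_b and the variable names are made up for illustration; only paste() and grepl() from base R are used.

```r
# Pattern built for the ID group 'DX154', whose Type_A values are both 'Iphone'
pattern <- paste(c('Iphone', 'Iphone'), collapse = '|')
pattern
# "Iphone|Iphone"

# Three sample Type_B values (illustrative)
type_b <- c("Iphone,Ipad,Ipod,Mac", "Gmail,Android,Drive,Maps", "ALL")

# First condition: Type_B contains one of the group's Type_A values
has_type_a <- grepl(pattern, type_b)   # TRUE FALSE FALSE

# Second condition: Type_B has no comma from start (^) to end ($)
no_comma <- grepl('^[^,]+$', type_b)   # FALSE FALSE TRUE

# A row is kept when either condition holds
keep <- has_type_a | no_comma          # TRUE FALSE TRUE
keep
```

The second condition keeps any comma-free value, not only "ALL", so it is slightly broader than the Type_B == "ALL" test in the row-wise answer.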