R return first substring match
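I am trying to categorize strings in a column of an R data frame. Specifically, I want to do the following:
- Loop through a list of strings
- For each string, check whether any entry in a data frame column matches it as a substring
- If so, return the category corresponding to the first substring match
For example, suppose I have dataframe1: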
search_string = c('dan likes cake', 'molly likes cupcake', 'flanders likes berries')
I want to search against a data frame that contains the lookups and their categories:
lookup_df =
lookups: cake, cupcake, berr
categories: dessert, small dessert, fruit
I want to loop through search_string (which is a column in a data frame) and return the following:
'dan likes cake' --> dessert
'molly likes cupcake' --> small dessert
'flanders likes berries' --> fruit
Right now I am doing this inefficiently with nested loops.
for (row in 1:nrow(search_string_df)) {
  search_string = # search string row
  for (row_x in 1:nrow(lookup_df)) {
    # find first substring match in lookups
    # create a new column in search_string_df with the associated category
  }
}
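This takes a long time, and I am sure there is a better way. I am not well versed in 'apply' and similar functions; I am most familiar with dplyr / tidyverse syntax.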
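Using tidyverse:
library(tidyverse)
# Combine all lookups into one regex alternation: "cake|cupcake|berr"
pat <- str_c(lookup_df$lookups, collapse = '|')
data.frame(search_string) %>%
  mutate(lookups = str_extract(search_string, pat)) %>%  # first match per string, NA if none
  left_join(lookup_df, by = 'lookups')  # attach the matching category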
           search_string lookups    categories
1         dan likes cake    cake       dessert
2    molly likes cupcake cupcake small dessert
3 flanders likes berries    berr         fruit
Data:
lookup_df <- data.frame(
  lookups = c('cake', 'cupcake', 'berr'),
  categories = c('dessert', 'small dessert', 'fruit'))
search_string <- c("dan likes cake", "molly likes cupcake", "flanders likes berries")
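For comparison, here is a minimal base R sketch that makes "first match" explicit. It assumes "first" means the lookup that starts earliest in the string. It finds where each lookup starts in each string and picks the lookup with the smallest start position. Lookups are treated as literal text (fixed = TRUE), so regex metacharacters in a lookup need no escaping. The helper name first_match_category and its arguments are hypothetical, introduced only for illustration. If two lookups start at the same position, the one listed first wins.

```r
# Same example data as above (stringsAsFactors = FALSE keeps strings as character).
lookup_df <- data.frame(
  lookups = c("cake", "cupcake", "berr"),
  categories = c("dessert", "small dessert", "fruit"),
  stringsAsFactors = FALSE)
search_string <- c("dan likes cake", "molly likes cupcake", "flanders likes berries")

# Hypothetical helper: category of the lookup that occurs earliest in each string.
first_match_category <- function(strings, lookups, categories) {
  # One column per lookup: start position in each string, or -1 if absent.
  pos <- vapply(
    lookups,
    function(l) as.integer(regexpr(l, strings, fixed = TRUE)),
    integer(length(strings)))
  # Force a strings-by-lookups matrix, also when there is only one string.
  pos <- matrix(pos, nrow = length(strings))
  pos[pos < 0] <- NA_integer_  # treat "not found" as missing
  apply(pos, 1, function(p) {
    # which.min() skips NA and returns the first index on ties.
    if (all(is.na(p))) NA_character_ else categories[which.min(p)]
  })
}

first_match_category(search_string, lookup_df$lookups, lookup_df$categories)
# [1] "dessert"       "small dessert" "fruit"
```

The result is a plain character vector, so it can be assigned to a new column directly, for example inside mutate(). Strings with no matching lookup get NA.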