How to check if any of the strings in a column match any string in a column of another data.table

Question

I have two data.tables:

library(data.table)
fruit <- c("apple", "banana", "pear", "pineapple")
no <- sample(4L)
fruitDT <- data.table(fruit,no)

fruit2 <- c("apple is a fruit", "orange is a color", "pear is pear", "pine is also a tree")
takeThisOne <- sample(4L)
fruitDT2 <- data.table(fruit2,takeThisOne)

fruitDT

       fruit no
1:     apple  3
2:    banana  2
3:      pear  1
4: pineapple  4

fruitDT2

                fruit2 takeThisOne
1:    apple is a fruit           3
2:   orange is a color           4
3:        pear is pear           2
4: pine is also a tree           1

If any value in fruit2 matches (partially) any value in the fruit column of fruitDT, I want to extract the corresponding value from the takeThisOne column.

Expected result

apple 3
banana NULL
pear 2
pineapple NULL

I was planning to combine str_detect with lapply and a for loop, but I am wondering whether there is a better way.

Answer 1

Using my own sample data (since the original is random):

set.seed(42)
fruit <- c("apple", "banana", "pear", "pineapple")
# no <- sample(4L)
# fruitDT <- data.table(fruit,no)
# fruit2 <- c("apple is a fruit", "orange is a color", "pear is pear", "pine is also a tree")
# takeThisOne <- sample(4L)
# fruitDT2 <- data.table(fruit2,takeThisOne)
fruitDT
#        fruit no
# 1:     apple  1
# 2:    banana  4
# 3:      pear  3
# 4: pineapple  2
fruitDT2
#                 fruit2 takeThisOne
# 1:    apple is a fruit           2
# 2:   orange is a color           4
# 3:        pear is pear           3
# 4: pine is also a tree           1

I think this is right:

fuzzyjoin::regex_right_join(fruitDT2, fruitDT, by = c("fruit2" = "fruit"))[,c("fruit", "takeThisOne")]
#       fruit takeThisOne
# 1     apple           2
# 2    banana          NA
# 3      pear           3
# 4 pineapple          NA
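
To make it clearer what a regex right join computes here, the following is a rough base R sketch of the same idea, not fuzzyjoin's actual implementation. It assumes the fixed takeThisOne values copied from the printed output above instead of calling sample(), and the names hits, idx, matched and result are made up for illustration.

```r
fruit  <- c("apple", "banana", "pear", "pineapple")
fruit2 <- c("apple is a fruit", "orange is a color",
            "pear is pear", "pine is also a tree")
takeThisOne <- c(2L, 4L, 3L, 1L)  # assumption: values copied from the output above

# hits[i, j] is TRUE when fruit[j], used as a regex, is found in fruit2[i]
hits <- sapply(fruit, grepl, x = fruit2)

# every matching (row, col) pair: this is the inner-join part
idx <- which(hits, arr.ind = TRUE)
matched <- data.frame(fruit = fruit[idx[, "col"]],
                      takeThisOne = takeThisOne[idx[, "row"]])

# keep every fruit, as the right join does; unmatched fruits get NA
result <- merge(data.frame(fruit = fruit), matched, by = "fruit", all.x = TRUE)
result
```

Unlike a "first match only" lookup, this keeps one row per matching pair, so a fruit that appears in several fruit2 strings would show up several times, which is also how a join behaves.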

Answer 2

For each fruit, we can use grep and return the first matching entry in fruitDT2.

This is a base R approach, but written with data.table syntax since you already have a data.table.

library(data.table)

fruitDT[, TakeThisOne := sapply(fruit, function(x) 
                             fruitDT2$takeThisOne[grep(x, fruitDT2$fruit2)[1]])]
fruitDT

#       fruit no TakeThisOne
#1:     apple  3           3
#2:    banana  2          NA
#3:      pear  1           2
#4: pineapple  4          NA
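
One thing to keep in mind is that grep treats each fruit as a regular expression, so a name containing characters such as "." or "(" could match more than intended. Below is a hedged variant, assuming the fruit names should be matched literally. It uses fixed = TRUE and vapply so the result type is checked. The helper first_match_value is a hypothetical name, and the data uses the fixed values printed above rather than sample().

```r
library(data.table)

fruitDT <- data.table(fruit = c("apple", "banana", "pear", "pineapple"),
                      no = c(3L, 2L, 1L, 4L))
fruitDT2 <- data.table(fruit2 = c("apple is a fruit", "orange is a color",
                                  "pear is pear", "pine is also a tree"),
                       takeThisOne = c(3L, 4L, 2L, 1L))

# Hypothetical helper: value belonging to the first string that contains
# `pattern` literally; NA when nothing matches
first_match_value <- function(pattern, strings, values) {
  values[grep(pattern, strings, fixed = TRUE)[1]]
}

# vapply enforces that each lookup returns exactly one integer
fruitDT[, TakeThisOne := vapply(fruit, first_match_value, integer(1),
                                strings = fruitDT2$fruit2,
                                values = fruitDT2$takeThisOne,
                                USE.NAMES = FALSE)]
fruitDT
```

As in the original answer, only the first match per fruit is kept, so later matches are silently ignored.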

r

stringr