How to check if any string in a column matches any string in a column of another data.table
I have two data.tables:
fruit <- c("apple", "banana", "pear", "pineapple")
no <- sample(4L)
fruitDT <- data.table(fruit,no)
fruit2 <- c("apple is a fruit", "orange is a color", "pear is pear", "pine is also a tree")
takeThisOne <- sample(4L)
fruitDT2 <- data.table(fruit2,takeThisOne)
fruitDT
fruit no
1: apple 3
2: banana 2
3: pear 1
4: pineapple 4
fruitDT2
fruit2 takeThisOne
1: apple is a fruit 3
2: orange is a color 4
3: pear is pear 2
4: pine is also a tree 1
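If any value in fruit2 (partially) matches any value in the fruit column of fruitDT, I want to extract the corresponding value from the takeThisOne column.
Expected result: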
apple 3
banana NULL
pear 2
pineapple NULL
I was planning to combine lapply and a for loop around str_detect, but is there a better way?
Using my own sampling of your data (since it is random),
library(data.table)
set.seed(42)  # fix the seed so the sampled columns are reproducible
fruit <- c("apple", "banana", "pear", "pineapple")
no <- sample(4L)
fruitDT <- data.table(fruit, no)
fruit2 <- c("apple is a fruit", "orange is a color", "pear is pear", "pine is also a tree")
takeThisOne <- sample(4L)
fruitDT2 <- data.table(fruit2, takeThisOne)
fruitDT
# fruit no
# 1: apple 1
# 2: banana 4
# 3: pear 3
# 4: pineapple 2
fruitDT2
# fruit2 takeThisOne
# 1: apple is a fruit 2
# 2: orange is a color 4
# 3: pear is pear 3
# 4: pine is also a tree 1
I think this is correct:
fuzzyjoin::regex_right_join(fruitDT2, fruitDT, by = c("fruit2" = "fruit"))[,c("fruit", "takeThisOne")]
# fruit takeThisOne
# 1 apple 2
# 2 banana NA
# 3 pear 3
# 4 pineapple NA
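To make the idea of a pattern-based join more concrete, here is a rough base R sketch of the same logic. It is only an illustration and not how fuzzyjoin is implemented internally. It assumes literal substring matching (fixed = TRUE) and hard-codes the takeThisOne values from the printed table above so that it does not depend on the random seed; the names hits, pairs, matched and result are made up for this example.

```r
fruit  <- c("apple", "banana", "pear", "pineapple")
fruit2 <- c("apple is a fruit", "orange is a color",
            "pear is pear", "pine is also a tree")
takeThisOne <- c(2L, 4L, 3L, 1L)  # copied from the printed fruitDT2 above

# hits[i, j] is TRUE when fruit[j] occurs as a substring of fruit2[i]
hits <- sapply(fruit, grepl, x = fruit2, fixed = TRUE)

# Every (row, col) position that matched: row indexes fruit2, col indexes fruit
pairs <- which(hits, arr.ind = TRUE)

matched <- data.frame(fruit       = fruit[pairs[, "col"]],
                      takeThisOne = takeThisOne[pairs[, "row"]])

# Keep every fruit; the ones without a match get NA
result <- merge(data.frame(fruit = fruit), matched, by = "fruit", all.x = TRUE)
result
```

Unlike a "first match only" lookup, this keeps every matching pair, so a fruit that appears in several fruit2 strings would produce several rows.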
For each fruit, we can use grep and return the first matching entry from fruitDT2.
This is a base R approach, but it uses data.table syntax since you already have a data.table.
library(data.table)
fruitDT[, TakeThisOne := sapply(fruit, function(x)
fruitDT2$takeThisOne[grep(x, fruitDT2$fruit2)[1]])]
fruitDT
# fruit no TakeThisOne
#1: apple 3 3
#2: banana 2 NA
#3: pear 1 2
#4: pineapple 4 NA
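A slightly more defensive variant of the grep idea, as a sketch only: it assumes the fruit names should be treated as literal substrings rather than regular expressions (fixed = TRUE), and it uses vapply so the result is always an integer vector. The helper name first_match is hypothetical, and the values are hard-coded from the question's printed tables so the output does not depend on sample().

```r
fruit  <- c("apple", "banana", "pear", "pineapple")
fruit2 <- c("apple is a fruit", "orange is a color",
            "pear is pear", "pine is also a tree")
takeThisOne <- c(3L, 4L, 2L, 1L)  # copied from the question's fruitDT2

# Return the value paired with the first string containing `pattern`,
# or NA when no string contains it (hypothetical helper).
first_match <- function(pattern, haystack, values) {
  idx <- grep(pattern, haystack, fixed = TRUE)
  if (length(idx) == 0L) NA_integer_ else values[idx[1L]]
}

result <- vapply(fruit, first_match, integer(1),
                 haystack = fruit2, values = takeThisOne,
                 USE.NAMES = FALSE)
result
```

With the data.tables from the question, the same vector could be attached by reference, for example fruitDT[, TakeThisOne := result].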