Get rows where the column has either one or both of the strings in R

Question

I have a column containing lists of items like this:

Fruit
Apple
Apple, Orange
Kiwi, Orange, Apple 
Kiwi

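I want to get the rows that contain Apple, Orange, or both. I'm not sure how to do this. I have tried str_detect and filter, but nothing has worked so far. Any other suggestions would be appreciated.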

Answer 1

Does this work:

library(dplyr)
library(stringr)
df %>% filter(str_detect(Fruit, 'Apple|Orange'))
# A tibble: 3 x 1
  Fruit              
  <chr>              
1 Apple              
2 Apple, Orange      
3 Kiwi, Orange, Apple

Data used:

df
# A tibble: 4 x 1
  Fruit              
  <chr>              
1 Apple              
2 Apple, Orange      
3 Kiwi, Orange, Apple
4 Kiwi
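
One caveat with a plain alternation such as 'Apple|Orange' is that it also matches those strings inside longer words. The sketch below assumes whole-word matching is wanted and adds a hypothetical "Orangeade" row (not in the original data) purely to show the difference; `\\b` is the regex word-boundary anchor.

```r
library(dplyr)
library(stringr)

# Same data as above, plus one hypothetical row ("Orangeade") for illustration.
df <- tibble(Fruit = c("Apple", "Apple, Orange", "Kiwi, Orange, Apple",
                       "Kiwi", "Orangeade"))

# Plain alternation: also keeps "Orangeade", because it contains "Orange".
loose <- df %>% filter(str_detect(Fruit, "Apple|Orange"))

# Word boundaries: keeps only rows where Apple or Orange is a whole word.
strict <- df %>% filter(str_detect(Fruit, "\\b(Apple|Orange)\\b"))
```

If the column only ever holds clean, comma-separated fruit names, the simpler pattern is enough.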

Answer 2

Personally, I like to use grepl() for this kind of problem. You can play with the regular expression to select rows. (See examples here.)

df <- data.frame(list("fruits" = c("Apple", "Apple, Orange", "Kiwi, Apple", "Kiwi")))

What df looks like:

| id | fruits        | 
|----|---------------|
| 1  | Apple         | 
| 2  | Apple, Orange |
| 3  | Kiwi, Apple   |
| 4  | Kiwi          |

Then you can write:

df_only_apples <- df[grepl("[Aa]pple", df$fruits),, drop=FALSE]

This gives you

| id | fruits        | 
|----|---------------|
| 1  | Apple         | 
| 2  | Apple, Orange |
| 3  | Kiwi, Apple   |

But if you want to select the rows that contain "Apple", "Orange", or both, you can just write df[grepl("([Aa]pple|[Oo]range)", df$fruits), , drop = FALSE]
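
Since the question mentions "either one or both", it may help to see the two cases side by side. This is a minimal base R sketch using the same example data; the names has_apple, has_orange, either and both are only illustrative.

```r
df <- data.frame(fruits = c("Apple", "Apple, Orange", "Kiwi, Apple", "Kiwi"),
                 stringsAsFactors = FALSE)

# One logical vector per string we are looking for.
has_apple  <- grepl("[Aa]pple", df$fruits)
has_orange <- grepl("[Oo]range", df$fruits)

# Rows containing at least one of the two strings.
either <- df[has_apple | has_orange, , drop = FALSE]

# Rows containing both strings.
both <- df[has_apple & has_orange, , drop = FALSE]
```

Combining the logical vectors with | or & keeps each pattern simple and avoids writing an order-dependent regex for the "both" case.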

Answer 3

We can also split the column and use %in%

library(dplyr)
library(tidyr)
df %>% 
    mutate(rn = row_number()) %>% 
    separate_rows(fruits) %>%
    group_by(rn) %>% 
    filter(any(c('Apple', 'Orange') %in% fruits)) %>% 
    summarise(fruits = toString(fruits), .groups = 'drop') %>% 
    select(-rn)
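
For comparison, the same split-and-match idea can be sketched in base R without reshaping the data frame. This assumes the items are separated by a comma and optional whitespace; wanted, items and keep are illustrative names.

```r
df <- data.frame(fruits = c("Apple", "Apple, Orange", "Kiwi, Apple", "Kiwi"),
                 stringsAsFactors = FALSE)

# The strings to look for (exact, case-sensitive matches).
wanted <- c("Apple", "Orange")

# Split each cell into its individual items.
items <- strsplit(df$fruits, ",\\s*")

# Keep a row if any wanted string appears among its items.
keep <- vapply(items, function(x) any(wanted %in% trimws(x)), logical(1))

result <- df[keep, , drop = FALSE]
```

Because the comparison is done on whole items rather than on substrings, a value such as "Orangeade" would not be matched by "Orange".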
