How to search for a string in an entire data.frame

Question

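I have the following table containing the item names of car spare parts. For each part I have the ITEM code assigned by the car manufacturer, as well as the corresponding ITEM codes for the same part as made by parts manufacturers.

I regularly receive an input that contains only the item codes that were sold. How can I determine which part was sold?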

> trial
# A tibble: 6 x 5
  Name         `OEM Part` `OES 1 Code`   `OES 2 Code` `OES 3 Code`
  <chr>        <chr>      <chr>          <chr>        <chr>       
1 Brakes       231049A76  1910290/230023 NA           NA          
2 Cables       2410ASD12  NA             219930       3213Q23     
3 Tyres        9412HJ12   231233         NA           NA          
4 Suspension   756634K71  782320/880716  NA           NA          
5 Ball Bearing 2IW2WD23   231224         NA           NA          
6 Clutches     9304JFW3   NA             QQW223       23RQR3

Suppose I have an input with the following values:

> item_code <- c("231049A76", "1910290", "1910290", "23RQR3")

I need the following output:

Name
Brakes
Brakes
Brakes
Clutches

Note: 1910290 and 230023 are separate parts; both are slightly modified brakes.

Answer 1

If you reshape the data to long form, you can use a join:

library(tidyverse)

trial <- tibble(Name = c("Brakes", "Cables", "Tyres", "Suspension", "Ball Bearing", "Clutches"), 
                `OEM Part` = c("231049A76", "2410ASD12", "9412HJ12", "756634K71", "2IW2WD23", "9304JFW3"), 
                `OES 1 Code` = c("1910290/230023", NA, "231233", "782320/880716", "231224", NA), 
                `OES 2 Code` = c(NA, "219930", NA, NA, NA, "QQW223"), 
                `OES 3 Code` = c(NA, "3213Q23", NA, NA, NA, "23RQR3"))

trial_long <- trial %>% 
    gather('code_type', 'code', -Name) %>%    # reshape to long form
    separate_rows(code) %>%    # separate double values
    drop_na(code)    # drop unnecessary NA rows

# join to filter and duplicate
trial_long %>% 
    right_join(tibble(code = c("231049A76", "1910290", "1910290", "23RQR3")))
#> # A tibble: 4 x 3
#>   Name     code_type  code     
#>   <chr>    <chr>      <chr>    
#> 1 Brakes   OEM Part   231049A76
#> 2 Brakes   OES 1 Code 1910290  
#> 3 Brakes   OES 1 Code 1910290  
#> 4 Clutches OES 3 Code 23RQR3
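
In current tidyr releases, gather() is superseded by pivot_longer(). The following is a minimal sketch of the same reshape-and-join idea with pivot_longer(). It assumes that multiple codes in one cell are always separated by "/", and it joins from the input side with left_join() so the result follows the order of item_code. The names lookup and sold are illustrative only.

```r
library(tidyverse)

trial <- tibble(Name = c("Brakes", "Cables", "Tyres", "Suspension", "Ball Bearing", "Clutches"),
                `OEM Part` = c("231049A76", "2410ASD12", "9412HJ12", "756634K71", "2IW2WD23", "9304JFW3"),
                `OES 1 Code` = c("1910290/230023", NA, "231233", "782320/880716", "231224", NA),
                `OES 2 Code` = c(NA, "219930", NA, NA, NA, "QQW223"),
                `OES 3 Code` = c(NA, "3213Q23", NA, NA, NA, "23RQR3"))

item_code <- c("231049A76", "1910290", "1910290", "23RQR3")

# One row per (Name, code): reshape to long form, drop NAs, split "a/b" cells
lookup <- trial %>%
    pivot_longer(-Name, names_to = "code_type", values_to = "code",
                 values_drop_na = TRUE) %>%
    separate_rows(code, sep = "/")    # assumption: "/" is the only separator

# Start from the sold codes so order and duplicates are preserved
sold <- tibble(code = item_code) %>%
    left_join(lookup, by = "code")

sold$Name
#> [1] "Brakes"   "Brakes"   "Brakes"   "Clutches"
```

A code that does not appear anywhere in the table would show up with NA in Name rather than being dropped.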

Answer 2

Here is an example similar to yours, using base R:

## Create a dummy matrix
example <- cbind(matrix(1:4, 4, 1), matrix(letters[1:16], 4, 4))  # 16 letters fill the 4 x 4 block
colnames(example) <- c("names", "W", "X", "Y", "Z")
#     names W   X   Y   Z  
#[1,] "1"   "a" "e" "i" "m"
#[2,] "2"   "b" "f" "j" "n"
#[3,] "3"   "c" "g" "k" "o"
#[4,] "4"   "d" "h" "l" "p"

This table is similar to yours: the names are in the first column and the patterns to match are in the other columns.

## The pattern of interest
pattern <- c("a","e", "f", "p")

For this pattern, we expect the following result: "1","1","2","4".

## Detecting the pattern per row
matching_rows <- row(example[,-1])[example[,-1] %in% pattern]
#[1] 1 1 2 4

## Returning the rows with the pattern
example[matching_rows,1]
#[1] "1" "1" "2" "4"
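
Note that %in% flags each matching cell once, in the column order of the matrix, so a code that is repeated in the input is not repeated in the result. A sketch of a variant using match(), which returns one position per input element and therefore keeps the input order and duplicates. It assumes each code occurs in at most one cell; the names codes, idx and names_out are illustrative only.

```r
## Same dummy matrix as above
example <- cbind(matrix(1:4, 4, 1), matrix(letters[1:16], 4, 4))
colnames(example) <- c("names", "W", "X", "Y", "Z")

## A pattern with a repeated element
pattern <- c("a", "e", "e", "p")

codes <- example[, -1]

## Position of each pattern element in the code block (column-major index)
idx <- match(pattern, codes)

## Translate positions to row numbers, then look up the names
names_out <- example[row(codes)[idx], "names"]
names_out
#[1] "1" "1" "1" "4"
```

Elements of pattern that are not found anywhere give NA in the result.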

Answer 3

A less efficient approach using sapply and apply: for each item_code we find which row of trial contains it, then return the corresponding Name value.

sapply(item_code, function(x)   
            trial$Name[apply(trial[-1], 1,  function(y)  any(grepl(x, y)))])

# 231049A76    1910290    1910290     23RQR3 
#  "Brakes"   "Brakes"   "Brakes" "Clutches"

If you don't need the names, set USE.NAMES = FALSE in sapply.
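
Because grepl() does partial (regular expression) matching, a code that happens to be a substring of another code would also match. A sketch of an exact-match variant of the same idea, assuming "/" is the only separator inside a cell; the helper find_name is hypothetical, and trial is rebuilt here as a plain data.frame so the block runs on its own.

```r
trial <- data.frame(
    Name = c("Brakes", "Cables", "Tyres", "Suspension", "Ball Bearing", "Clutches"),
    `OEM Part` = c("231049A76", "2410ASD12", "9412HJ12", "756634K71", "2IW2WD23", "9304JFW3"),
    `OES 1 Code` = c("1910290/230023", NA, "231233", "782320/880716", "231224", NA),
    `OES 2 Code` = c(NA, "219930", NA, NA, NA, "QQW223"),
    `OES 3 Code` = c(NA, "3213Q23", NA, NA, NA, "23RQR3"),
    check.names = FALSE, stringsAsFactors = FALSE)

item_code <- c("231049A76", "1910290", "1910290", "23RQR3")

# Hypothetical helper: Name of the row whose cells contain `code` exactly
find_name <- function(code) {
    hit <- apply(trial[-1], 1, function(row) {
        # split "a/b" cells into single codes, ignoring NA cells
        code %in% unlist(strsplit(row[!is.na(row)], "/", fixed = TRUE))
    })
    trial$Name[hit]
}

result <- sapply(item_code, find_name, USE.NAMES = FALSE)
result
# [1] "Brakes"   "Brakes"   "Brakes"   "Clutches"
```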
