How to search for a string in an entire data.frame
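I have the following table, which lists car spare parts by name. For each part I have the item code assigned by the car manufacturer (the OEM Part column), and I also have the corresponding item codes for the same part as made by parts manufacturers (the OES columns).
I regularly receive an input that contains only the item codes that were sold. How can I determine which part was sold?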
> trial
# A tibble: 6 x 5
Name `OEM Part` `OES 1 Code` `OES 2 Code` `OES 3 Code`
<chr> <chr> <chr> <chr> <chr>
1 Brakes 231049A76 1910290/230023 NA NA
2 Cables 2410ASD12 NA 219930 3213Q23
3 Tyres 9412HJ12 231233 NA NA
4 Suspension 756634K71 782320/880716 NA NA
5 Ball Bearing 2IW2WD23 231224 NA NA
6 Clutches 9304JFW3 NA QQW223 23RQR3
Suppose I have an input with the following values:
> item_code <- c("231049A76", "1910290", "1910290", "23RQR3")
I need the following output:
Name
Brakes
Brakes
Brakes
Clutches
Note: 1910290 and 230023 are separate parts; both are slightly modified brakes.
If you reshape the data to long form, you can use a join:
library(tidyverse)
trial <- tibble(
  Name         = c("Brakes", "Cables", "Tyres", "Suspension", "Ball Bearing", "Clutches"),
  `OEM Part`   = c("231049A76", "2410ASD12", "9412HJ12", "756634K71", "2IW2WD23", "9304JFW3"),
  `OES 1 Code` = c("1910290/230023", NA, "231233", "782320/880716", "231224", NA),
  `OES 2 Code` = c(NA, "219930", NA, NA, NA, "QQW223"),
  `OES 3 Code` = c(NA, "3213Q23", NA, NA, NA, "23RQR3")
)

trial_long <- trial %>%
  gather('code_type', 'code', -Name) %>%   # reshape to long form
  separate_rows(code) %>%                  # separate double values
  drop_na(code)                            # drop unnecessary NA rows

# join to filter and duplicate; `by` is spelled out to avoid the join message
trial_long %>%
  right_join(tibble(code = c("231049A76", "1910290", "1910290", "23RQR3")), by = "code")
#> # A tibble: 4 x 3
#> Name code_type code
#> <chr> <chr> <chr>
#> 1 Brakes OEM Part 231049A76
#> 2 Brakes OES 1 Code 1910290
#> 3 Brakes OES 1 Code 1910290
#> 4 Clutches OES 3 Code 23RQR3
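One thing to watch with a join is row order and unmatched codes. As a minimal sketch, assuming you want exactly one result per input code in the order the codes arrived, you could start from the input and use `left_join()` instead. The `sold` tibble and the code `"XXXXXX"` are made up for illustration; `pivot_longer()` is the current tidyr replacement for `gather()`.

```r
library(tidyverse)

trial <- tibble(
  Name         = c("Brakes", "Cables", "Tyres", "Suspension", "Ball Bearing", "Clutches"),
  `OEM Part`   = c("231049A76", "2410ASD12", "9412HJ12", "756634K71", "2IW2WD23", "9304JFW3"),
  `OES 1 Code` = c("1910290/230023", NA, "231233", "782320/880716", "231224", NA),
  `OES 2 Code` = c(NA, "219930", NA, NA, NA, "QQW223"),
  `OES 3 Code` = c(NA, "3213Q23", NA, NA, NA, "23RQR3")
)

# Long form: one row per (Name, code) pair, NA cells dropped
trial_long <- trial %>%
  pivot_longer(-Name, names_to = "code_type", values_to = "code",
               values_drop_na = TRUE) %>%
  separate_rows(code, sep = "/")

# Hypothetical input: unsorted, and "XXXXXX" matches no part
sold <- tibble(code = c("23RQR3", "1910290", "XXXXXX", "231049A76"))

# Starting from the input keeps its row order; unknown codes get NA
result <- sold %>%
  left_join(trial_long, by = "code")

result$Name
#> [1] "Clutches" "Brakes"   NA         "Brakes"
```

This relies on every code appearing only once in `trial_long`; if the same code were listed under two parts, the join would return one row per match.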
Here is an example similar to yours using base R:
## Create a dummy matrix
example <- cbind(matrix(1:4, 4, 1), matrix(letters[1:16], 4, 4))
colnames(example) <- c("names", "W", "X", "Y", "Z")
# names W X Y Z
#[1,] "1" "a" "e" "i" "m"
#[2,] "2" "b" "f" "j" "n"
#[3,] "3" "c" "g" "k" "o"
#[4,] "4" "d" "h" "l" "p"
This table is similar to yours, with the names in the first column and the patterns to match in the other columns.
## The pattern of interest
pattern <- c("a","e", "f", "p")
For this pattern, we expect the following result: "1", "1", "2", "4".
## Detecting the pattern per row
matching_rows <- row(example[,-1])[example[,-1] %in% pattern]
#[1] 1 1 2 4
## Returning the rows with the pattern
example[matching_rows,1]
#[1] "1" "1" "2" "4"
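Note that `%in%` scans the matrix column by column, so the rows come back in matrix order rather than in the order of `pattern`, and a repeated pattern is reported only once. A minimal sketch of an order-preserving variant, assuming each code occurs at most once in the table, uses `match()` instead. The unsorted `pattern` below is made up for illustration.

```r
## Same dummy matrix as above
example <- cbind(matrix(1:4, 4, 1), matrix(letters[1:16], 4, 4))
colnames(example) <- c("names", "W", "X", "Y", "Z")

## Hypothetical input: unsorted, with a repeated value
pattern <- c("p", "a", "f", "a")

codes <- example[, -1]

## match() returns, for each pattern, the position of its first occurrence
## in the column-major flattening of `codes` (NA if it is absent)
pos <- match(pattern, codes)

## row() maps each position back to its row index
found <- example[row(codes)[pos], "names"]
found
#[1] "4" "1" "2" "1"
```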
A less efficient approach uses sapply and apply: for each item_code, we find which row of trial contains it and then get the corresponding Name value.
sapply(item_code, function(x)
trial$Name[apply(trial[-1], 1, function(y) any(grepl(x, y)))])
# 231049A76 1910290 1910290 23RQR3
# "Brakes" "Brakes" "Brakes" "Clutches"
If you don't need the names, set USE.NAMES = FALSE in sapply.
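Because `grepl()` does regular-expression substring matching, a short or partial code such as "2312" would match both 231233 (Tyres) and 231224 (Ball Bearing). If you need whole-code matches only, one possible base R sketch is to split the cells on "/" and build a named lookup vector. The names `cells`, `pieces` and `lookup` are illustrative, and the table is rebuilt as a plain `data.frame` so the block runs without extra packages.

```r
trial <- data.frame(
  Name         = c("Brakes", "Cables", "Tyres", "Suspension", "Ball Bearing", "Clutches"),
  `OEM Part`   = c("231049A76", "2410ASD12", "9412HJ12", "756634K71", "2IW2WD23", "9304JFW3"),
  `OES 1 Code` = c("1910290/230023", NA, "231233", "782320/880716", "231224", NA),
  `OES 2 Code` = c(NA, "219930", NA, NA, NA, "QQW223"),
  `OES 3 Code` = c(NA, "3213Q23", NA, NA, NA, "23RQR3"),
  check.names = FALSE, stringsAsFactors = FALSE
)
item_code <- c("231049A76", "1910290", "1910290", "23RQR3")

# All code cells as a character matrix (every column except Name)
cells <- as.matrix(trial[-1])

# Split each cell on "/" so "1910290/230023" becomes two separate codes
pieces <- strsplit(as.vector(cells), "/", fixed = TRUE)
codes  <- unlist(pieces)

# Repeat each cell's part name once per code that the cell contained
part   <- rep(trial$Name[row(cells)], lengths(pieces))

# Named vector: code -> part name, with the NA cells dropped
keep   <- !is.na(codes)
lookup <- setNames(part[keep], codes[keep])

# Character indexing with `[` matches names exactly; unknown codes give NA
unname(lookup[item_code])
# [1] "Brakes"   "Brakes"   "Brakes"   "Clutches"
```

The lookup is built once, so repeated inputs only cost a vector index instead of a scan over the whole table.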