R: How to vectorize a filter table lookup
I am looking for a way to vectorize a table lookup. Here is an example of my expected input/output, without using a loop.
library(dplyr)
library(tidyr)
input <- tibble(
  value = c(2, 1.5, 3),
  color = c('blue', 'green', NA)
)
table_lookup <- tibble(
  min = c(0.5, 1.5, 2.5),
  max = c(1.5, 2.5, 99999),
  color = c('green', 'blue', NA),
  output = 8:10
)
## Desired output: for each row of "input", take the "output" of the "table_lookup" row where min < value <= max (min excluded) and the color matches.
c(9, 8, 10)
Thanks,
John
We can use sapply together with the which function. Note that this matches on the interval only and does not check color:
sapply(input$value,
       FUN = function(x) table_lookup$output[which(x > table_lookup$min & x <= table_lookup$max)])
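Since sapply still loops over the values internally, a fully vectorized base R alternative is sketched below using findInterval. It is only a sketch under stated assumptions: the rows of table_lookup are sorted by min and the intervals are contiguous (each max equals the next min), which holds for the example data but is not guaranteed in general. The names idx, tbl_color, same_color and result are introduced here for illustration.

```r
# Example data from the question, as plain data frames
input <- data.frame(
  value = c(2, 1.5, 3),
  color = c("blue", "green", NA),
  stringsAsFactors = FALSE
)
table_lookup <- data.frame(
  min = c(0.5, 1.5, 2.5),
  max = c(1.5, 2.5, 99999),
  color = c("green", "blue", NA),
  output = 8:10,
  stringsAsFactors = FALSE
)

# Assumption: rows are sorted by min and each max equals the next min,
# so the min column alone defines the interval breaks.
# left.open = TRUE makes each interval (min, max], as in the question.
idx <- findInterval(input$value, table_lookup$min, left.open = TRUE)

# Values at or below the smallest min (idx == 0) or above the largest max
# fall outside every interval and get no match
idx[idx == 0 | input$value > max(table_lookup$max)] <- NA

# Colors match when they are equal, or when both are NA
tbl_color <- table_lookup$color[idx]
same_color <- (is.na(input$color) & is.na(tbl_color)) |
  (!is.na(input$color) & !is.na(tbl_color) & input$color == tbl_color)

# Keep the output only where the color also matches
result <- ifelse(same_color, table_lookup$output[idx], NA_integer_)
result
```

If the intervals overlap or leave gaps, this shortcut does not apply and one of the join-based approaches is the safer choice.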
Another option is to use merge (with by = NULL, which produces a cross join) and then filter:
merge(input, table_lookup, by = NULL) %>%
  filter(value > min, value <= max)
# value color.x min max color.y output
# 1 1.5 green 0.5 1.5 green 8
# 2 2.0 blue 1.5 2.5 blue 9
# 3 3.0 <NA> 2.5 99999.0 <NA> 10
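The result above comes back in cross-join order rather than input order, and the color columns are not compared. One possible extension, offered as a sketch rather than the answerer's method, adds a helper column (row_id, a name introduced here) to restore the input order and a filter condition that treats two NA colors as a match:

```r
library(dplyr)

# Example data from the question
input <- tibble(
  value = c(2, 1.5, 3),
  color = c("blue", "green", NA)
)
table_lookup <- tibble(
  min = c(0.5, 1.5, 2.5),
  max = c(1.5, 2.5, 99999),
  color = c("green", "blue", NA),
  output = 8:10
)

matched <- input %>%
  mutate(row_id = row_number()) %>%       # hypothetical helper to remember input order
  merge(table_lookup, by = NULL) %>%      # cross join: every input row with every lookup row
  filter(
    value > min, value <= max,            # interval (min, max]
    (is.na(color.x) & is.na(color.y)) |   # both colors missing counts as a match
      color.x == color.y                  # otherwise the colors must be equal
  ) %>%
  arrange(row_id)                         # back to the original input order

matched$output
```

The cross join builds nrow(input) * nrow(table_lookup) rows before filtering, so this is convenient for small tables but may use a lot of memory for large ones.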
You can use fuzzy_left_join from the fuzzyjoin package:
library(fuzzyjoin)
fuzzy_left_join(x = input,
                y = table_lookup,
                by = c("value" = "min", "value" = "max"),
                match_fun = list(`>`, `<=`))
# # A tibble: 3 x 6
#   value color.x   min   max color.y output
#   <dbl> <chr>   <dbl> <dbl> <chr>    <int>
# 1   2   blue      1.5   2.5 blue         9
# 2   1.5 green     0.5   1.5 green        8
# 3   3   NA        2.5 99999 NA          10
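For readers who prefer to stay within dplyr, a similar non-equi join can be sketched with join_by. This assumes dplyr 1.1.0 or later, where join_by accepts inequality conditions; it is an alternative to the fuzzyjoin answer, not part of it. Unlike the call above, it also joins on color, and with the default na_matches = "na" an NA color in input matches an NA color in table_lookup. The name joined is introduced here for illustration.

```r
library(dplyr)

# Example data from the question
input <- tibble(
  value = c(2, 1.5, 3),
  color = c("blue", "green", NA)
)
table_lookup <- tibble(
  min = c(0.5, 1.5, 2.5),
  max = c(1.5, 2.5, 99999),
  color = c("green", "blue", NA),
  output = 8:10
)

# Assumes dplyr >= 1.1.0: join_by() combines an equality condition on color
# with the two inequality conditions that define the interval (min, max]
joined <- left_join(
  input, table_lookup,
  by = join_by(color, value > min, value <= max)
)

joined$output
```

Because this is a left join, the rows keep the order of input, and any input row without a matching interval and color would get NA in output.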