R, find pattern and create index, then filter another column based on the created index

I have a data frame df1 = data.frame(ID = c("A1; A2; A3; A4", "B1; B2; B3", "C1; C2", "D1"), Value = c("1; 2; 3; 4", "5; 6; 7", "8; 9", "10")).

I also have a data frame df2 = data.frame(ID = c("A2", "B3", "C2", "D1")).

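Now I want to write a function that maps each 'ID' in df2 to the matching 'ID' in df1 and then extracts the Value at the same (i-th) position that the ID occupies in the ID column. The expected output is filtered_df1 = data.frame(ID = c("A2", "B3", "C2", "D1"), value = c("2", "7", "9", "10")).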

Can you suggest how to write an R function for this?

Thanks a lot!
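Following the question's framing literally (find the index i of the target ID inside its row, then read position i of Value) can be sketched in base R as below. The helper name `pick_value` and its `sep` argument are hypothetical, not taken from any answer, and the sketch assumes IDs and values are separated by exactly "; ".

```r
# Hypothetical helper: for each ID in y, find the row of x whose ID list
# contains it, take its position i, and return the i-th entry of Value.
pick_value <- function(x, y, sep = "; ") {
  ids  <- strsplit(as.character(x$ID), sep, fixed = TRUE)
  vals <- strsplit(as.character(x$Value), sep, fixed = TRUE)
  targets <- as.character(y$ID)
  value <- vapply(targets, function(target) {
    for (row in seq_along(ids)) {
      i <- match(target, ids[[row]])           # the "index" of target in this row
      if (!is.na(i)) return(vals[[row]][i])    # same position in Value
    }
    NA_character_                              # target not found in any row
  }, character(1), USE.NAMES = FALSE)
  data.frame(ID = targets, value = value)
}
```

With the data above, `pick_value(df1, df2)` should return IDs A2, B3, C2, D1 with the character values "2", "7", "9", "10", the same shape as filtered_df1. The row loop is fine for small inputs; for larger tables, reshaping to one pair per row first (as the answers do) is usually simpler.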

We can use separate_rows() from the tidyr package combined with right_join(), then convert the values to numbers with type.convert(as.is = TRUE):
library(dplyr)
library(tidyr)

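df1 %>% 
  # split both columns; the default sep splits on runs of characters
  # that are neither alphanumeric nor "."
  separate_rows(c(ID, Value)) %>% 
  # keep only the IDs listed in df2
  right_join(df2, by = "ID") %>%
  # convert "2" to 2L, etc.
  type.convert(as.is = TRUE)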
  ID    Value
  <chr> <int>
1 A2        2
2 B3        7
3 C2        9
4 D1       10

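This is an odd structure to meet in the wild, and storing key-value pairs this way is not recommended. The first step is to fix df1 so it is usable by humans, i.e. one ID/Value pair per row: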

library(tidyverse)
# Tidy x into one row per ID/Value pair, then keep the rows whose ID appears in y
f <- function(x, y) {
  x %>%
    separate_rows(ID, Value, sep = "; ") %>%
    right_join(y, by = "ID") %>%
    mutate(Value = as.numeric(Value))
}

f(df1, df2)
  ID    Value
  <chr> <dbl>
1 A2        2
2 B3        7
3 C2        9
4 D1       10
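The same "fix df1 first" idea can be sketched in base R without tidyverse dependencies: build the long, one-pair-per-row table explicitly, then join it with merge(). The helper name `tidy_pairs` is hypothetical; like separate_rows(ID, Value, sep = "; "), it assumes every row holds as many values as IDs, and here that assumption is checked explicitly with stopifnot().

```r
# Hypothetical helper: reshape x into one row per ID/Value pair.
tidy_pairs <- function(x, sep = "; ") {
  ids  <- strsplit(as.character(x$ID), sep, fixed = TRUE)
  vals <- strsplit(as.character(x$Value), sep, fixed = TRUE)
  # A row with more IDs than values (or vice versa) cannot be paired safely.
  stopifnot(identical(lengths(ids), lengths(vals)))
  data.frame(ID = unlist(ids), Value = as.numeric(unlist(vals)))
}
```

`merge(tidy_pairs(df1), df2, by = "ID", all.y = TRUE)` should give the same four rows as f(df1, df2), sorted by ID, with Value stored as numeric. Keeping the tidy table around also makes later lookups a plain join instead of string parsing.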

A data.table alternative:

df1 = data.frame(
  ID = c("A1; A2; A3; A4", "B1; B2; B3", "C1; C2", "D1"),
  Value = c("1; 2; 3; 4", "5; 6; 7", "8; 9", "10")
)

df2 = data.frame(ID = c("A2", "B3", "C2", "D1"))

filtered_df1 = data.frame(ID = c("A2", "B3", "C2", "D1"),
                          value = c("2", "7", "9", "10"))


library(data.table)
library(magrittr)

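COLS <- c("ID", "Value")
# setDT() converts df1 to a data.table in place (by reference).
# tstrsplit() pads shorter rows with NA, so unlisting each column keeps
# the k-th ID aligned with the k-th Value.
setDT(df1)[, lapply(.SD, function(x) unlist(tstrsplit(x, split = "; "))), .SDcols = COLS] %>%
  # right join on ID; the NA padding rows have no match in df2 and drop out
  merge(y = df2, by = "ID", all.y = TRUE) %>%
  .[, lapply(.SD, type.convert, as.is = TRUE)]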

#>    ID Value
#> 1: A2     2
#> 2: B3     7
#> 3: C2     9
#> 4: D1    10

Created on 2022-03-17 by the reprex package (v2.0.1)
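The data.table answer works because tstrsplit() pads shorter rows with NA (its default fill = NA) before transposing, so unlisting ID and Value column by column keeps the k-th ID next to the k-th Value. A small base-R sketch of that padding, using the hypothetical helper `pad_split`, makes the alignment visible without needing data.table installed:

```r
# Hypothetical helper mimicking unlist(tstrsplit(x, split = sep)):
# split each string, pad every row with NA to the widest row, then read
# the pieces column-wise (all 1st pieces, then all 2nd pieces, ...).
pad_split <- function(x, sep = "; ") {
  parts  <- strsplit(as.character(x), sep, fixed = TRUE)
  width  <- max(lengths(parts))
  padded <- lapply(parts, function(p) c(p, rep(NA_character_, width - length(p))))
  unlist(lapply(seq_len(width), function(k) vapply(padded, `[`, character(1), k)))
}
```

For df1, `pad_split(df1$ID)` should yield 16 entries (4 rows padded to width 4), six of them NA, and each NA ID sits at the same position as an NA value in `pad_split(df1$Value)`, which is why those rows simply fall out of the join.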