Create a new column based on the values and headings of another dataset

Question

Suppose I have an original dataset, df1, whose first column holds the letters a to e:

a x1
b x2
c x3
d x4
e x5

Then I have another dataset, df2, with several columns whose entries refer to the values in the first column of df1:

---------
A | B | C
---------
a   b   c
    d   e

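I would like an R function that uses the unique values in df2 (a, b, c, d and e above) to add a new column to df1 holding the heading of the df2 column in which each value appears, giving df3: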

a x1 A
b x2 B
c x3 C
d x4 B
e x5 C

Working example:

> # data frame with numbers and characters
> df1 = data.frame(unique_values=letters[1:5], other_col=paste(rep("x",5), 1:5, sep=""))
> print(df1)
  unique_values other_col
1             a        x1
2             b        x2
3             c        x3
4             d        x4
5             e        x5
> #  Create dataset that is then used to create new column
> df2 = data.frame(A = c("a",NA), B=c("b","d"), C=c("c","e") )
> df2
     A B C
1    a b c
2 <NA> d e

# Using df1 and the df2 columns that reference df1's values, create df3
library(dplyr)
#df3?
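For reference, the lookup itself can be sketched in base R without any joins: flatten df2 column by column, remember which heading each value came from, and use match() to find each df1 value. This is a minimal sketch, not one of the answers below; it assumes each value appears in at most one df2 column, and the flattening variables (vals, cols, keep) are illustrative names.

```r
# Minimal base-R sketch; assumes each value appears in at most one df2 column.
# stringsAsFactors = FALSE keeps the columns character so unlist() returns letters, not factor codes.
df1 <- data.frame(unique_values = letters[1:5],
                  other_col = paste0("x", 1:5),
                  stringsAsFactors = FALSE)
df2 <- data.frame(A = c("a", NA), B = c("b", "d"), C = c("c", "e"),
                  stringsAsFactors = FALSE)

# Flatten df2 column by column and record the heading each value came from
vals <- unlist(df2, use.names = FALSE)           # "a" NA "b" "d" "c" "e"
cols <- rep(names(df2), each = nrow(df2))        # "A" "A" "B" "B" "C" "C"
keep <- !is.na(vals)                             # drop the NA padding in column A

# For every df1 value, take the heading of its first match in df2
df3 <- df1
df3$names <- cols[keep][match(df1$unique_values, vals[keep])]
df3
```

Values of df1 that do not occur in df2 get NA in the new column, and if a value occurred under two headings, match() would return only the first one.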

Answer 1

Reshape the second dataset to 'long' format and then join:

library(dplyr)
library(tidyr)
pivot_longer(df2, everything(), values_to = 'unique_values', 
    values_drop_na = TRUE) %>%
  left_join(df1)

-output

# A tibble: 5 x 3
#  name  unique_values other_col
#  <chr> <chr>         <chr>    
#1 A     a             x1       
#2 B     b             x2       
#3 C     c             x3       
#4 B     d             x4       
#5 C     e             x5
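A small variation on this answer, sketched under the same dplyr/tidyr setup: starting the join from df1 and naming the key with `by =` keeps df1's row order and column order (so the result has the df3 layout from the question) and avoids the "Joining with `by`" message. The names `long` and `df3` are just illustrative.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(unique_values = letters[1:5],
                  other_col = paste0("x", 1:5),
                  stringsAsFactors = FALSE)
df2 <- data.frame(A = c("a", NA), B = c("b", "d"), C = c("c", "e"),
                  stringsAsFactors = FALSE)

# Long format: one row per (heading, value) pair, NA padding dropped
long <- pivot_longer(df2, everything(), names_to = "name",
                     values_to = "unique_values", values_drop_na = TRUE)

# Put df1 on the left so all of its rows are kept, with the heading appended last
df3 <- df1 %>% left_join(long, by = "unique_values")
df3
```

Because df1 is the left table, a df1 value missing from df2 keeps its row with `name` set to NA; a value listed under two headings would duplicate that df1 row.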

Answer 2

A base R option using merge + stack:

merge(df1, setNames(na.omit(stack(df2)), c("unique_values", "names")))

gives

  unique_values other_col names
1             a        x1     A
2             b        x2     B
3             c        x3     C
4             d        x4     B
5             e        x5     C
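To see what this one-liner does, it can be split into steps. The sketch below (same data, intermediate name `long` chosen for illustration) prints the stacked table, where stack() puts the values in a `values` column and the original headings in an `ind` factor, and adds `all.x = TRUE` so that df1 rows without a match in df2 would be kept with NA instead of dropped. The columns are created explicitly as character because stack() is meant for plain vector columns.

```r
df1 <- data.frame(unique_values = letters[1:5],
                  other_col = paste0("x", 1:5),
                  stringsAsFactors = FALSE)
# Character (not factor) columns, which is what stack() expects
df2 <- data.frame(A = c("a", NA), B = c("b", "d"), C = c("c", "e"),
                  stringsAsFactors = FALSE)

# 'values' holds the entries, 'ind' the heading of the column they came from
long <- na.omit(stack(df2))
long

# all.x = TRUE keeps df1 rows that have no match in df2 (their 'names' would be NA)
df3 <- merge(df1, setNames(long, c("unique_values", "names")),
             by = "unique_values", all.x = TRUE)
df3
```

Note that merge() sorts the result by the key column, which here happens to coincide with df1's original order.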

Answer 3

A data.table version:

library(data.table)

merge(setDT(df1), melt(setDT(df2), measure.vars = names(df2)), 
      by.x = 'unique_values', by.y = 'value')

#   unique_values other_col variable
#1:             a        x1        A
#2:             b        x2        B
#3:             c        x3        C
#4:             d        x4        B
#5:             e        x5        C
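An alternative data.table idiom, sketched here as an assumption rather than the answerer's method, is an update join: melt df2 with explicit column names, drop the NA padding with `na.rm = TRUE`, then add the heading to df1 by reference with `:=`. This keeps every row of df1 in its original order instead of building a new merged table. The names `dt1`, `dt2` and `long` are illustrative.

```r
library(data.table)

dt1 <- data.table(unique_values = letters[1:5], other_col = paste0("x", 1:5))
dt2 <- data.table(A = c("a", NA), B = c("b", "d"), C = c("c", "e"))

# Long format: 'names' holds the heading, 'unique_values' the entry; NA rows dropped
long <- melt(dt2, measure.vars = names(dt2),
             variable.name = "names", value.name = "unique_values",
             na.rm = TRUE)

# Update join: add the matching heading to dt1 in place (unmatched rows get NA)
dt1[long, on = "unique_values", names := i.names]
dt1
```

If a value appeared under more than one heading, the update join would keep only one of them, so this suits the one-value-one-column case in the question.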


Tags: r, dataframe, dplyr, tidyr, data-wrangling