Filling in columns with certain columns from two dataframes in R

Question

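I have two data frames (e.g. df1 and df2). I want to fill in one column (testing) of df2 with values from df1, on the condition that the rows have the same ID and Name in both data frames. I tried using a for loop but found it hard to implement. The result should look like "output" below. I would appreciate your help. Thanks!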

Code:
var_of_interest <- c("testing")
df2[var_of_interest] <- lapply(var_of_interest, function(x), df1[[x]][match(df2$ID, df1$ID) & match(df2$Name, df1$Name)])
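The lapply attempt above has two problems: there is a stray comma after function(x), and combining two separate match() calls with & produces a logical vector rather than row indices, so ID and Name are never matched jointly. A minimal sketch of a working version, assuming the sample data shown in the question, builds one composite key per row with paste() and only fills values that are NA in df2. The "_" separator is an arbitrary choice and assumes it never occurs inside ID or Name.

```r
# Sample data from the question
df1 <- data.frame(ID = c(1, 2, 3, 4, 5), Name = "a",
                  testing = c(100, 90, 80, 70, 60))
df2 <- data.frame(ID = c(1, 2, 2, 3, 3, 4, 4, 5, 5),
                  Name = c("a", "a", "b", "a", "c", "a", "d", "a", "e"),
                  testing = c(NA, NA, 400, NA, 300, NA, 200, NA, 150))

var_of_interest <- c("testing")

# One composite key per row, so ID and Name are matched together
key1 <- paste(df1$ID, df1$Name, sep = "_")
key2 <- paste(df2$ID, df2$Name, sep = "_")
idx <- match(key2, key1)  # row of df1 for each row of df2, NA if no match

df2[var_of_interest] <- lapply(var_of_interest, function(x) {
  filled <- df1[[x]][idx]
  # keep df2's own value where present, otherwise take df1's
  ifelse(is.na(df2[[x]]), filled, df2[[x]])
})
df2
```

Because var_of_interest is a vector, the same code fills several columns at once, as long as each named column exists in both data frames.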

df1: df1 is a subset of df2 (it contains the testing values)
ID Name testing
1  a    100
2  a    90
3  a    80
4  a    70
5  a    60

df2:
ID Name testing
1  a    NA
2  a    NA
2  b    400
3  a    NA
3  c    300
4  a    NA
4  d    200
5  a    NA
5  e    150


output:
ID Name testing
1  a    100
2  a    90
2  b    400
3  a    80
3  c    300
4  a    70
4  d    200
5  a    60
5  e    150

Answer 1

Two options that might work for you:

Base R approach

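# Full outer join on ID and Name; testing becomes testing.x (df1) and testing.y (df2)
output <- merge(df1, df2, by = c('ID', 'Name'), all = TRUE)
# Take df1's value where it exists, otherwise df2's
output$testing <- ifelse(is.na(output$testing.x), output$testing.y, output$testing.x)
# Keep only the key columns and the combined testing column
output <- subset(output, select = c(ID, Name, testing))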
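The merge() approach above handles a single column. One way to extend it to every column the two data frames share is sketched below, assuming ID and Name are the only key columns and the value columns are numeric or character (ifelse() drops attributes such as factor levels or Date class). It left-joins with suffixes = c("", ".df1") so df2's columns keep their names, then loops over the shared columns; the names keys, vars and m are illustrative. Unlike the snippet above, it keeps df2's value where it is not NA, matching the dplyr version.

```r
# Sample data from the question
df1 <- data.frame(ID = c(1, 2, 3, 4, 5), Name = "a",
                  testing = c(100, 90, 80, 70, 60))
df2 <- data.frame(ID = c(1, 2, 2, 3, 3, 4, 4, 5, 5),
                  Name = c("a", "a", "b", "a", "c", "a", "d", "a", "e"),
                  testing = c(NA, NA, 400, NA, 300, NA, 200, NA, 150))

keys <- c("ID", "Name")
# Columns present in both data frames, other than the join keys
vars <- setdiff(intersect(names(df1), names(df2)), keys)

# Left join keeps every row of df2; df1's copies get the suffix ".df1"
m <- merge(df2, df1, by = keys, all.x = TRUE, suffixes = c("", ".df1"))
for (v in vars) {
  from_df1 <- m[[paste0(v, ".df1")]]
  m[[v]] <- ifelse(is.na(m[[v]]), from_df1, m[[v]])
}
output <- m[, names(df2)]  # drop helper columns, restore df2's column order
output
```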

Or a dplyr approach

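library(dplyr)

df2 %>% 
  # testing.x comes from df2, testing.y from df1
  left_join(df1, by = c('ID', 'Name')) %>%
  # use df2's value unless it is NA, then fall back to df1's
  mutate(testing = coalesce(testing.x, testing.y)) %>%
  select(ID, Name, testing)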

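Edit: Based on the comments, here is a different approach that handles more variables. It uses dplyr and tidyr. The basic idea is to reshape the data from "wide" (29 columns) to "long" (ID, Name, var, and one value column per data frame, called x for df2 and y for df1 in the code below), which makes it easier to replace the values.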

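I haven't tested this on a larger fake dataset, but I think it should work for any number of variables. It does assume that every value column appears in both data frames, that the value columns share one type, and that their names contain no other dots, because names_sep splits on '.'.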

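library(dplyr)
library(tidyr)

df2 %>% 
  # shared columns get the suffix .x (from df2) and .y (from df1)
  left_join(df1, by = c('ID', 'Name')) %>%
  # split e.g. "testing.x" into var = "testing", source = "x"
  pivot_longer(-c('ID', 'Name'),
               names_sep = '\\.',
               names_to = c('var', 'source')) %>%
  # one row per ID/Name/var, with df2's value in x and df1's in y
  pivot_wider(names_from = source, values_from = value) %>%
  # keep df2's value where present, otherwise take df1's
  mutate(value = coalesce(x, y)) %>%
  select(-x, -y) %>%
  # back to one column per variable
  pivot_wider(names_from = var, values_from = value)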
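As an alternative to reshaping (not part of the answer above), dplyr also provides rows_patch() (added in dplyr 1.0.0, initially flagged experimental), which overwrites only the NA cells of its first argument with values from the second, matched on the by columns, for every column the second data frame carries. This sketch assumes every column of df1 also exists in df2, and that each ID/Name pair occurs at most once in df1 and also occurs in df2; otherwise rows_patch() raises an error by default.

```r
library(dplyr)

# Sample data from the question
df1 <- data.frame(ID = c(1, 2, 3, 4, 5), Name = "a",
                  testing = c(100, 90, 80, 70, 60))
df2 <- data.frame(ID = c(1, 2, 2, 3, 3, 4, 4, 5, 5),
                  Name = c("a", "a", "b", "a", "c", "a", "d", "a", "e"),
                  testing = c(NA, NA, 400, NA, 300, NA, 200, NA, 150))

# Fill only the NA cells of df2 with df1's values for the same ID and Name;
# row order of df2 is preserved
output <- rows_patch(df2, df1, by = c("ID", "Name"))
output
```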
