Compare several tables and create a new one that shows which variables match using R

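I'm new to R. As practice, I'm trying to create a table that shows which variables match after comparing about 50 tables. If a column matches, I'd like to see "Yes" in the cell, and "No" otherwise. I'd appreciate any hints on how to approach this.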

My input data looks like this:

Tables Variables
tabla_1 A
tabla_1 Z
tabla_1 Y
tabla_1 V
tabla_1 B
tabla_2 H
tabla_2 B
tabla_2 A
tabla_2 U
tabla_3 U
tabla_3 S
tabla_3 M
tabla_4 U
tabla_4 A
tabla_4 B
tabla_4 V
tabla_4 Q
tabla_4 O
tabla_4 F

I'd like to get this:

Variables tabla_1 tabla_2 tabla_3 tabla_4
A Yes Yes No Yes
Z Yes No No No
Y Yes No No No
V Yes No No Yes
B No Yes No Yes
H No Yes No No
U No Yes Yes Yes
S No Yes Yes No
M No No Yes No
Q No No No Yes
O No No No Yes
F No No No Yes

Thanks for your help.

Using distinct() and pivot_wider():

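library(dplyr)
library(tidyr)

# df: the question's data, with columns Tables and Variables
df %>%
  distinct(Variables, Tables) %>%   # drop repeated table/variable pairs
  mutate(n = "Yes") %>%
  pivot_wider(names_from = Tables, values_from = n, values_fill = list(n = "No"))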

   Variables tabla_1 tabla_2 tabla_3 tabla_4
   <chr>     <chr>   <chr>   <chr>   <chr>  
 1 A         Yes     Yes     No      Yes    
 2 Z         Yes     No      No      No     
 3 Y         Yes     No      No      No     
 4 V         Yes     No      No      Yes    
 5 B         Yes     Yes     No      Yes    
 6 H         No      Yes     No      No     
 7 U         No      Yes     Yes     Yes    
 8 S         No      No      Yes     No     
 9 M         No      No      Yes     No     
10 Q         No      No      No      Yes    
11 O         No      No      No      Yes    
12 F         No      No      No      Yes    
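The distinct() step only matters when the same table/variable pair occurs more than once in the input. A minimal base R sketch for checking that, using a hypothetical toy data frame `dup` (not the question's data) in which the pair (tabla_1, A) is repeated:

```r
# Hypothetical toy input: the pair (tabla_1, A) appears twice.
dup <- data.frame(
  Tables    = c("tabla_1", "tabla_1", "tabla_2"),
  Variables = c("A", "A", "B"),
  stringsAsFactors = FALSE
)

# Number of rows that repeat an earlier (Tables, Variables) pair.
n_dup <- sum(duplicated(dup[c("Tables", "Variables")]))

# Base R counterpart of distinct(Variables, Tables): keep each pair once.
deduped <- unique(dup[c("Tables", "Variables")])

n_dup
deduped
```

If `n_dup` is 0 for your data, the result is the same with or without the de-duplication step.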

We can create a column of 'Yes' and use pivot_wider. Then specify 'No' in values_fill (by default, missing combinations are filled with NA).

library(dplyr)
library(tidyr)
df1 %>%
    mutate(new = 'Yes') %>%
    pivot_wider(names_from = Tables, values_from = new, values_fill = 'No')

Output:

# A tibble: 12 x 5
   Variables tabla_1 tabla_2 tabla_3 tabla_4
   <chr>     <chr>   <chr>   <chr>   <chr>  
 1 A         Yes     Yes     No      Yes    
 2 Z         Yes     No      No      No     
 3 Y         Yes     No      No      No     
 4 V         Yes     No      No      Yes    
 5 B         Yes     Yes     No      Yes    
 6 H         No      Yes     No      No     
 7 U         No      Yes     Yes     Yes    
 8 S         No      No      Yes     No     
 9 M         No      No      Yes     No     
10 Q         No      No      No      Yes    
11 O         No      No      No      Yes    
12 F         No      No      No      Yes    

Data

df1 <- structure(list(Tables = c("tabla_1", "tabla_1", "tabla_1", "tabla_1", 
"tabla_1", "tabla_2", "tabla_2", "tabla_2", "tabla_2", "tabla_3", 
"tabla_3", "tabla_3", "tabla_4", "tabla_4", "tabla_4", "tabla_4", 
"tabla_4", "tabla_4", "tabla_4"), Variables = c("A", "Z", "Y", 
"V", "B", "H", "B", "A", "U", "U", "S", "M", "U", "A", "B", "V", 
"Q", "O", "F")), class = "data.frame", row.names = c(NA, -19L
))
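The data above is already in long form (one row per table/variable pair). If the roughly 50 tables from the question exist as data frames in R, one way to get to that form is sketched below. It assumes the tables are collected in a named list; the list `tables` and its contents are hypothetical stand-ins, and only the column names are used.

```r
# Hypothetical stand-ins for the real tables; only the column names matter.
tables <- list(
  tabla_1 = data.frame(A = 1, Z = 2, Y = 3),
  tabla_2 = data.frame(H = 1, A = 2)
)

# One row per (table name, column name) pair.
long <- data.frame(
  Tables    = rep(names(tables), times = sapply(tables, ncol)),
  Variables = unlist(lapply(tables, names), use.names = FALSE),
  stringsAsFactors = FALSE
)

long
```

The resulting `long` has the same two columns as `df1`, so it could be passed to the pivot_wider call in place of `df1`.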

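You can use table(), which returns 1/0 counts instead of 'Yes'/'No' (a count above 1 would mean a table/variable pair is repeated in the data).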

table(rev(df))

#   Tables
#Variables tabla_1 tabla_2 tabla_3 tabla_4
#        A       1       1       0       1
#        B       1       1       0       1
#        F       0       0       0       1
#        H       0       1       0       0
#        M       0       0       1       0
#        O       0       0       0       1
#        Q       0       0       0       1
#        S       0       0       1       0
#        U       0       1       1       1
#        V       1       0       0       1
#        Y       1       0       0       0
#        Z       1       0       0       0

To get 'Yes'/'No' values, you can do:

tab <- table(rev(df))
# > 0 rather than == 1, so a repeated table/variable pair still counts as 'Yes'
tab <- ifelse(tab > 0, 'Yes', 'No')
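The result of the table() approach is a matrix-like table object rather than a data frame. If the layout from the question (a Variables column followed by one column per table) is wanted, a sketch along these lines should work in base R. The names `yn` and `out` are illustrative, and the data is the thread's example rebuilt inline so the snippet runs on its own.

```r
# The thread's example data, rebuilt so this snippet is self-contained.
df <- data.frame(
  Tables = rep(c("tabla_1", "tabla_2", "tabla_3", "tabla_4"),
               times = c(5, 4, 3, 7)),
  Variables = c("A", "Z", "Y", "V", "B", "H", "B", "A", "U", "U",
                "S", "M", "U", "A", "B", "V", "Q", "O", "F"),
  stringsAsFactors = FALSE
)

# Cross-tabulate Variables (rows) against Tables (columns).
tab <- table(rev(df))

# Any positive count means the variable is present in that table.
yn <- ifelse(tab > 0, "Yes", "No")

# Turn the character table into a data frame with a Variables column.
out <- data.frame(
  Variables = rownames(yn),
  as.data.frame.matrix(yn, stringsAsFactors = FALSE),
  row.names = NULL,
  stringsAsFactors = FALSE
)

out
```

Note that table() sorts the variables alphabetically, so the row order differs from the order of first appearance that pivot_wider keeps.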