Compare several tables and create a new one that shows which variables match using R
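I am new to R. As practice, I am trying to create a table that shows which variables match after comparing about 50 tables. If a variable is present in a table, I would like to see "Yes" in the cell, otherwise "No". I would appreciate any hints on how to approach this.
My input data looks like this: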
Tables | Variables |
---|---|
tabla_1 | A |
tabla_1 | Z |
tabla_1 | Y |
tabla_1 | V |
tabla_1 | B |
tabla_2 | H |
tabla_2 | B |
tabla_2 | A |
tabla_2 | U |
tabla_3 | U |
tabla_3 | S |
tabla_3 | M |
tabla_4 | U |
tabla_4 | A |
tabla_4 | B |
tabla_4 | V |
tabla_4 | Q |
tabla_4 | O |
tabla_4 | F |
I would like to get this:
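Variables | tabla_1 | tabla_2 | tabla_3 | tabla_4 |
---|---|---|---|---|
A | Yes | Yes | No | Yes |
Z | Yes | No | No | No |
Y | Yes | No | No | No |
V | Yes | No | No | Yes |
B | No | Yes | No | Yes |
H | No | Yes | No | No |
U | No | Yes | Yes | Yes |
S | No | Yes | Yes | No |
M | No | No | Yes | No |
Q | No | No | No | Yes |
O | No | No | No | Yes |
F | No | No | No | Yes |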
Thanks for your help.
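Using distinct() and pivot_wider():
library(dplyr)
library(tidyr)
df1 %>%
  distinct(Variables, Tables) %>%   # keep each (Variables, Tables) pair once
  mutate(n = "Yes") %>%             # mark every pair that exists
  pivot_wider(names_from = Tables, values_from = n, values_fill = list(n = "No"))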
   Variables tabla_1 tabla_2 tabla_3 tabla_4
   <chr>     <chr>   <chr>   <chr>   <chr>
 1 A         Yes     Yes     No      Yes
 2 Z         Yes     No      No      No
 3 Y         Yes     No      No      No
 4 V         Yes     No      No      Yes
 5 B         Yes     Yes     No      Yes
 6 H         No      Yes     No      No
 7 U         No      Yes     Yes     Yes
 8 S         No      No      Yes     No
 9 M         No      No      Yes     No
10 Q         No      No      No      Yes
11 O         No      No      No      Yes
12 F         No      No      No      Yes
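The distinct() step matters when the same variable is listed more than once for a table: without it, pivot_wider() warns that the values are not uniquely identified. A minimal sketch, assuming a made-up data frame `dup` (not part of the question) that contains one repeated pair:

```r
library(dplyr)
library(tidyr)

# Hypothetical input: the pair (tabla_1, A) appears twice
dup <- data.frame(
  Tables    = c("tabla_1", "tabla_1", "tabla_2"),
  Variables = c("A", "A", "B")
)

res <- dup %>%
  distinct(Variables, Tables) %>%   # collapse the repeated pair to one row
  mutate(n = "Yes") %>%
  pivot_wider(names_from = Tables, values_from = n, values_fill = list(n = "No"))

# res has one row per variable:
#   A: tabla_1 = "Yes", tabla_2 = "No"
#   B: tabla_1 = "No",  tabla_2 = "Yes"
```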
We can create a column of 'Yes' and use pivot_wider. Then specify the 'No' value in values_fill (by default, missing cells would be NA).
library(dplyr)
library(tidyr)
df1 %>%
  mutate(new = 'Yes') %>%
  pivot_wider(names_from = Tables, values_from = new, values_fill = 'No')
Output:
# A tibble: 12 x 5
Variables tabla_1 tabla_2 tabla_3 tabla_4
<chr> <chr> <chr> <chr> <chr>
1 A Yes Yes No Yes
2 Z Yes No No No
3 Y Yes No No No
4 V Yes No No Yes
5 B Yes Yes No Yes
6 H No Yes No No
7 U No Yes Yes Yes
8 S No No Yes No
9 M No No Yes No
10 Q No No No Yes
11 O No No No Yes
12 F No No No Yes
Data
df1 <- structure(list(Tables = c("tabla_1", "tabla_1", "tabla_1", "tabla_1",
"tabla_1", "tabla_2", "tabla_2", "tabla_2", "tabla_2", "tabla_3",
"tabla_3", "tabla_3", "tabla_4", "tabla_4", "tabla_4", "tabla_4",
"tabla_4", "tabla_4", "tabla_4"), Variables = c("A", "Z", "Y",
"V", "B", "H", "B", "A", "U", "U", "S", "M", "U", "A", "B", "V",
"Q", "O", "F")), class = "data.frame", row.names = c(NA, -19L
))
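The question mentions about 50 tables, while df1 here is typed by hand. If the tables already exist as data frames, the long Tables/Variables layout could be derived from their column names. A rough sketch, assuming the tables are held in a named list called `tables` (a hypothetical name with made-up contents, not from the question):

```r
# Hypothetical stand-ins for the real tables; only the column names matter here
tables <- list(
  tabla_1 = data.frame(A = 1, Z = 2),
  tabla_2 = data.frame(A = 1, H = 3)
)

# One row per (table, column name) pair
long <- data.frame(
  Tables    = rep(names(tables), times = vapply(tables, ncol, integer(1))),
  Variables = unlist(lapply(tables, names), use.names = FALSE)
)
```

The resulting `long` has the same two columns as df1, so it could be passed to any of the approaches in this thread.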
You can use table(), which returns 1/0 values instead of 'Yes'/'No'.
table(rev(df1))
# Tables
#Variables tabla_1 tabla_2 tabla_3 tabla_4
# A 1 1 0 1
# B 1 1 0 1
# F 0 0 0 1
# H 0 1 0 0
# M 0 0 1 0
# O 0 0 0 1
# Q 0 0 0 1
# S 0 0 1 0
# U 0 1 1 1
# V 1 0 0 1
# Y 1 0 0 0
# Z 1 0 0 0
To get 'Yes'/'No' values, you can do:
tab <- table(rev(df1))
# > 0 rather than == 1, so a pair listed more than once still gives 'Yes'
tab <- ifelse(tab > 0, 'Yes', 'No')
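Note that table() sorts the variables alphabetically and returns a table object rather than a data frame. If the order of first appearance and a plain data frame are wanted, one possible base R sketch is shown below; the helper names `vars`, `yn` and `out` are only illustrative, and the data is re-created so the block runs on its own:

```r
# Same data as in the question, rebuilt compactly
df1 <- data.frame(
  Tables    = rep(c("tabla_1", "tabla_2", "tabla_3", "tabla_4"), times = c(5, 4, 3, 7)),
  Variables = c("A", "Z", "Y", "V", "B", "H", "B", "A", "U", "U",
                "S", "M", "U", "A", "B", "V", "Q", "O", "F")
)

# Factor levels in order of first appearance stop table() from sorting A, B, F, ...
vars <- factor(df1$Variables, levels = unique(df1$Variables))
tab  <- table(vars, df1$Tables)

# Turn the counts into a character matrix of "Yes"/"No"
yn <- matrix(ifelse(as.vector(tab) > 0, "Yes", "No"),
             nrow = nrow(tab),
             dimnames = list(NULL, colnames(tab)))

# Put the variable names in a regular column next to the four table columns
out <- data.frame(Variables = rownames(tab), yn)
```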