Fill in missing variables of a family relationship matrix
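I have a data frame of family relationships (parent, child, spouse, etc.), partially filled in as in the example below. I am trying to use R to fill in the missing values (<NA>), but I am not sure where to start. I tried using ifelse(), but the code became so unwieldy that I am sure there must be a more efficient way.
Example data frame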
family person R01 R02 R03 R04 R05 R06
1 A 1 X Spouse Child Parent Parent Parent
2 A 2 <NA> X Child-in-law Parent Parent Parent
3 A 3 <NA> <NA> X GrandParent GrandParent GrandParent
4 A 4 <NA> <NA> <NA> X Sibling Sibling
5 A 5 <NA> <NA> <NA> <NA> X Sibling
6 A 6 <NA> <NA> <NA> <NA> <NA> X
7 B 1 X Spouse Parent Parent <NA> <NA>
8 B 2 <NA> X Parent Parent <NA> <NA>
9 B 3 <NA> <NA> X Sibling <NA> <NA>
10 B 4 <NA> <NA> <NA> X <NA> <NA>
11 C 1 X Parent <NA> <NA> <NA> <NA>
12 C 2 <NA> X <NA> <NA> <NA> <NA>
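Here R01 is the relationship of person x to person 1. For the second row of the data frame above, I need R01 to be Spouse, because it matches R02 in the first row. The relationships are matched according to the df below.
Relationship matching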
[,1] [,2]
[1,] "Spouse" "Spouse"
[2,] "Parent" "Child"
[3,] "Child" "Parent"
[4,] "GrandParent" "GrandChild"
[5,] "GrandChild" "GrandParent"
[6,] "Parent-in-law" "Child-in-law"
[7,] "Child-in-law" "Parent-in-law"
Code to reproduce the example
df1 <- data.frame(family = c(rep("A", 6), rep("B", 4), rep("C", 2)),
                  person = c(1:6, 1:4, 1:2),
                  R01 = c("X", rep(NA, 5), "X", rep(NA, 3), "X", NA),
                  R02 = c("Spouse", "X", rep(NA, 4), "Spouse", "X", NA, NA, "Parent", "X"),
                  R03 = c("Child", "Child-in-law", "X", NA, NA, NA, "Parent", "Parent", "X", rep(NA, 3)),
                  R04 = c(rep("Parent", 2), "GrandParent", "X", NA, NA, rep("Parent", 2), "Sibling", "X", NA, NA),
                  R05 = c(rep("Parent", 2), "GrandParent", "Sibling", "X", rep(NA, 7)),
                  R06 = c(rep("Parent", 2), "GrandParent", rep("Sibling", 2), "X", rep(NA, 6)))
relationshipmatch <- matrix(c("Spouse", "Parent", "Child", "GrandParent", "GrandChild",
                              "Parent-in-law", "Child-in-law",
                              "Spouse", "Child", "Parent", "GrandChild", "GrandParent",
                              "Child-in-law", "Parent-in-law"), ncol = 2)
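This solution works for character columns only. Since your actual data appears to be numeric (integer?), you may need to adjust the [-indexing inside the function.
I assume the frame is always ordered row-wise by person and column-wise by increasing R01:R06.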
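If that ordering is in doubt, it can be checked before running anything else. The following is only a minimal base R sketch: is_ordered is a hypothetical helper name, and the two-row toy frame stands in for df1.

```r
# Hypothetical helper: TRUE when, within each family, persons run 1..n in row
# order and the relationship columns are named R01, R02, ... in increasing order.
is_ordered <- function(df) {
  rel_cols <- grep("^R[0-9]+$", names(df), value = TRUE)
  cols_ok <- identical(rel_cols, sprintf("R%02d", seq_along(rel_cols)))
  rows_ok <- all(vapply(split(df$person, df$family),
                        function(p) identical(as.integer(p), seq_along(p)),
                        logical(1)))
  cols_ok && rows_ok
}

# Toy stand-in for df1 (family C from the example)
toy <- data.frame(family = c("C", "C"), person = 1:2,
                  R01 = c("X", NA), R02 = c("Parent", "X"))
is_ordered(toy)         # TRUE
is_ordered(toy[2:1, ])  # FALSE: the rows are out of order
```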
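library(dplyr)

invert_relationships <- function(mat) {
  # Lookup: each relationship (name) maps to its inverse (value)
  rel <- c(Spouse = "Spouse", Child = "Parent", Parent = "Child", GrandChild = "GrandParent",
           GrandParent = "GrandChild", "Child-in-law" = "Parent-in-law",
           "Parent-in-law" = "Child-in-law", Sibling = "Sibling", X = "X")
  # Square block: one relationship column per person in the family
  mat0 <- as.matrix(mat)[, seq_len(nrow(mat))]
  # Replace every known relationship with its inverse (NA stays NA)
  mat0[] <- rel[match(as.matrix(mat0), names(rel))]
  mat1 <- as.data.frame(mat)[, seq_len(nrow(mat0))]
  # Fill the lower triangle with the transposed, inverted upper triangle
  mat1[lower.tri(mat1)] <- t(mat0)[lower.tri(mat0)]
  # Re-attach the unused relationship columns of smaller families
  cbind(mat1, mat[, -seq_len(nrow(mat0))])
}

df1 %>%
  group_by(family) %>%
  # cur_data() is deprecated as of dplyr 1.1.0 in favor of pick()
  mutate(invert_relationships(select(cur_data(), -person))) %>%
  ungroup()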
# # A tibble: 12 x 8
# family person R01 R02 R03 R04 R05 R06
# <chr> <int> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 A 1 X Spouse Child Parent Parent Parent
# 2 A 2 Spouse X Child-in-law Parent Parent Parent
# 3 A 3 Parent Parent-in-law X GrandParent GrandParent GrandParent
# 4 A 4 Child Child GrandChild X Sibling Sibling
# 5 A 5 Child Child GrandChild Sibling X Sibling
# 6 A 6 Child Child GrandChild Sibling Sibling X
# 7 B 1 X Spouse Parent Parent NA NA
# 8 B 2 Spouse X Parent Parent NA NA
# 9 B 3 Child Child X Sibling NA NA
# 10 B 4 Child Child Sibling X NA NA
# 11 C 1 X Parent NA NA NA NA
# 12 C 2 Child X NA NA NA NA
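The core of the function above is a named-vector lookup followed by a transpose. A stripped-down base R sketch on family C alone may make that easier to follow. The lookup here is built from the first three pairs of the question's relationshipmatch, and the Sibling and X entries are an assumption added for illustration, since that matrix does not list them.

```r
# Inverse pairs as in the question's relationshipmatch (first three rows only),
# plus the self-inverse entries Sibling and X (an assumption; the question's
# matrix does not include them).
pairs <- matrix(c("Spouse", "Parent", "Child",
                  "Spouse", "Child",  "Parent"), ncol = 2)
inverse <- c(setNames(pairs[, 2], pairs[, 1]), Sibling = "Sibling", X = "X")

# Family C from the example, filled column by column:
# row 1 is "X", "Parent"; row 2 is NA, "X".
m <- matrix(c("X", NA, "Parent", "X"), nrow = 2,
            dimnames = list(NULL, c("R01", "R02")))

# Invert every known cell (NA stays NA), then copy the transposed values
# into the cells below the diagonal.
inv <- m
inv[] <- unname(inverse[as.vector(m)])
m[lower.tri(m)] <- t(inv)[lower.tri(inv)]
m  # row 2 now reads "Child", "X"
```

Building the lookup from relationshipmatch rather than typing it out keeps the pairs in one place, at the cost of having to add the self-inverse labels by hand.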
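You can make the relationship matrix symmetric within each family while swapping Child and Parent in every relationship that contains them. Here stringr::str_replace_all is used to do the swap.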
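library(dplyr)

df1 %>%
  group_by(family) %>%
  group_modify(~ {
    # Keep the relationship columns that are not entirely NA for this family
    mat <- as.matrix(select(.x, starts_with("R") & !where(~ all(is.na(.x)))))
    # Mirror the upper triangle into the lower one, swapping Parent and Child
    # via a temporary placeholder (replacements are applied in order)
    mat[lower.tri(mat)] <- stringr::str_replace_all(
      t(mat)[lower.tri(mat)],
      c("Parent" = "Temp", "Child" = "Parent", "Temp" = "Child")
    )
    cbind(select(.x, !starts_with("R")), mat)
  }) %>%
  ungroup()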
# A tibble: 12 × 8
family person R01 R02 R03 R04 R05 R06
<chr> <int> <chr> <chr> <chr> <chr> <chr> <chr>
1 A 1 X Spouse Child Parent Parent Parent
2 A 2 Spouse X Child-in-law Parent Parent Parent
3 A 3 Parent Parent-in-law X GrandParent GrandParent GrandParent
4 A 4 Child Child GrandChild X Sibling Sibling
5 A 5 Child Child GrandChild Sibling X Sibling
6 A 6 Child Child GrandChild Sibling Sibling X
7 B 1 X Spouse Parent Parent NA NA
8 B 2 Spouse X Parent Parent NA NA
9 B 3 Child Child X Sibling NA NA
10 B 4 Child Child Sibling X NA NA
11 C 1 X Parent NA NA NA NA
12 C 2 Child X NA NA NA NA
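The Temp placeholder is what keeps the swap from collapsing: replacing Parent with Child directly and then Child with Parent would leave every label as Parent. Below is a small base R sketch of the same three-step, in-order replacement. It uses gsub in place of stringr::str_replace_all, and swap_roles is a hypothetical name.

```r
# Hypothetical helper: swap "Parent" and "Child" inside each label by applying
# three fixed-string replacements in order, with "Temp" as a placeholder.
swap_roles <- function(x) {
  steps <- c(Parent = "Temp", Child = "Parent", Temp = "Child")
  for (from in names(steps)) {
    x <- gsub(from, steps[[from]], x, fixed = TRUE)
  }
  x
}

swap_roles(c("Parent", "GrandChild", "Child-in-law", "Spouse", NA))
# "Child" "GrandParent" "Parent-in-law" "Spouse" NA
```

Labels that contain neither word, such as Spouse and Sibling, pass through unchanged, so no separate lookup entry is needed for them.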