How to check whether all elements of a nested list are a subset of another list in R
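I have tried multiple different ways to do this, including this stack, but nothing works quite right.
My data frame "SiteVisit" (a small dput sample is at the bottom) consists of the columns Date (class = Date), TagID (class = numeric), SiteVisits (a list of character vectors), and NumSites (class = numeric). Each row lists all of the sites at which a single organism (TagID) was detected on a given date.
I want to classify each tag as "inside", "outside", or "transiting" for the whole day, based on the sites it visited. It can only be "inside" if it never visits an outside site, and it can only be "outside" if it never visits an inside site.
First, I want to determine whether all sites for a TagID on a given date are contained in this list: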
inside <- list(c("Release","IC1", "IC2", "IC3","RGD1"))
If TRUE, SiteVisit$Location = "INSIDE"
ELSE test whether all sites for a TagID on a given date are contained in this list:
outside <- list(c("ORS1","WC1","WC2","WC3","RGU1","ORN1","ORN2","ORS3","GL1","CVP1","CLRS"))
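If TRUE, SiteVisit$Location = "OUTSIDE"
ELSE SiteVisit$Location = "TRANSITING"
I have tried many different dplyr and base versions of this, but none seem to be right. I think it is because I am not checking each element of SiteVisit$SiteVisits correctly.
My current attempts are: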
SiteVisit <- SiteVisit %>%
mutate(Location = ifelse(all(SiteVisits[[]] %in% inside), "INSIDE",
ifelse(all(SiteVisits[[]] %in% outside),"OUTSIDE","TRANSITING")))
This yields all "INSIDE"
and
SiteVisit <- SiteVisit %>%
mutate(Location = ifelse(all(SiteVisits[] %in% inside), "INSIDE",
ifelse(all(SiteVisits[] %in% outside),"OUTSIDE","TRANSITING")))
This yields all "TRANSITING"
Additionally, trying to do this in a for loop doesn't quite work either:
for (i in 1: nrow(SiteVisit)) {SiteVisit$Inside <-
all(SiteVisit$SiteVisits[[i]] %in% inside)}
yields all FALSE, while
all(SiteVisit$SiteVisits[[2]] %in% inside)
is TRUE.
Here is the dput of a small portion of my data frame "SiteVisit":
structure(list(Date = structure(c(15828, 15828, 15847, 15847,
15847, 15847, 15847, 15847, 15848, 15848, 15848, 15848, 15848,
15848, 15848, 15848, 15849, 15849, 15849, 15849, 15849, 15849,
15849, 15850, 15850, 15850, 15850, 15850, 15850, 15850, 15851,
15851, 15851, 15851, 15851, 15851, 15851, 15851, 15852, 15852,
15852, 15852, 15852, 15852, 15852, 15853, 15853, 15853, 15853,
15853, 15853, 15853, 15853, 15853, 15854, 15854, 15854, 15854,
15854, 15854, 15854, 15854, 15855, 15855, 15855, 15855, 15855,
15855, 15855, 15855, 15855, 15855, 15855, 15855, 15855, 15855,
15856, 15856, 15856, 15856, 15856, 15856, 15856, 15856, 15856,
15856, 15856, 15856, 15856, 15857, 15857, 15857, 15857, 15857,
15857, 15857, 15857, 15857, 15857, 15857), class = "Date"), TagID = c(5717.06,
6277.06, 5073.06, 5717.06, 11121.1, 11191.1, 11387.1, 11415.1,
5717.06, 6277.06, 11121.1, 11191.1, 11219.1, 11289.1, 11387.1,
11415.1, 5717.06, 11121.1, 11191.1, 11219.1, 11289.1, 11387.1,
11415.1, 5717.06, 11121.1, 11191.1, 11219.1, 11289.1, 11387.1,
11415.1, 5717.06, 11121.1, 11191.1, 11219.1, 11289.1, 11317.1,
11387.1, 11415.1, 5717.06, 6277.06, 11191.1, 11219.1, 11289.1,
11387.1, 11415.1, 5717.06, 6277.06, 9015.01, 9833.06, 11191.1,
11219.1, 11289.1, 11387.1, 11415.1, 5717.06, 6277.06, 9015.01,
11191.1, 11219.1, 11289.1, 11387.1, 11415.1, 5641.22, 5717.06,
6221.06, 6277.06, 7909.22, 9015.01, 9833.06, 11121.1, 11191.1,
11219.1, 11289.1, 11317.1, 11387.1, 11415.1, 5717.06, 6277.06,
6529.06, 8119.01, 8545.06, 9015.01, 9497.06, 9833.06, 11191.1,
11219.1, 11289.1, 11387.1, 11415.1, 5717.06, 6277.06, 6529.06,
9015.01, 9497.06, 9833.06, 11191.1, 11219.1, 11289.1, 11387.1,
11415.1), SiteVisits = list("Release", "Release", c("IC2", "IC1",
"Release"), "IC3", "WC2", "RGD1", c("WC1", "WC3"), "WC3", "IC3",
"IC3", "WC2", "RGD1", "IC2", "IC1", "WC1", "WC3", "IC3",
"WC2", "RGD1", c("IC2", "IC1"), "IC1", "WC1", "WC3", "IC3",
"WC2", "RGD1", "IC2", "IC1", "WC1", "WC3", "IC3", "WC2",
"RGD1", "IC2", "IC1", "WC1", "WC1", "WC3", "IC3", "IC3",
"RGD1", "IC2", "IC1", "WC1", "WC3", "IC3", "IC3", c("IC3",
"Release"), c("IC3", "IC2", "IC1", "Release"), "RGD1", "IC2",
"IC1", "WC1", "WC3", "IC3", "IC3", c("IC3", "IC2"), "RGD1",
"IC2", "IC1", "WC1", "WC3", "Release", "IC3", "Release",
"IC3", c("RGD1", "Release"), c("IC3", "IC2"), c("IC3", "IC1"
), "WC2", "RGD1", "IC2", "IC1", "WC1", "WC1", "WC3", "IC3",
"IC3", c("RGD1", "Release"), c("RGD1", "Release"), "Release",
c("IC3", "IC2", "IC1"), "Release", c("IC3", "IC2", "IC1",
"RGD1"), "RGD1", "IC2", "IC1", "WC1", "WC3", "IC3", "IC3",
"RGD1", c("IC3", "IC2", "IC1"), "RGD1", c("IC3", "IC1", "RGD1"
), "RGD1", "IC2", c("IC2", "IC1"), "WC1", "WC3"), NumSites = c(1L,
1L, 3L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 4L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L,
3L, 1L, 4L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 3L, 1L, 3L, 1L, 1L,
2L, 1L, 1L)), row.names = c(NA, -100L), groups = structure(list(
Date = structure(c(15828, 15847, 15848, 15849, 15850, 15851,
15852, 15853, 15854, 15855, 15856, 15857), class = "Date"),
.rows = list(1:2, 3:8, 9:16, 17:23, 24:30, 31:38, 39:45,
46:54, 55:62, 63:76, 77:89, 90:100)), row.names = c(NA,
-12L), class = c("tbl_df", "tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
Once inside and outside are stored as character vectors rather than lists, the following works:
inside <- c("Release", "IC1", "IC2", "IC3", "RGD1")
outside <- c("ORS1", "WC1", "WC2", "WC3", "RGU1", "ORN1", "ORN2", "ORS3", "GL1", "CVP1", "CLRS")
df1$Location <- lapply(df1$SiteVisits, function(x) ifelse(all(x %in% inside), "INSIDE", ifelse(all(x %in% outside), "OUTSIDE", "TRANSIT")))
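The list version most likely fails because %in% (through match()) coerces a list table with as.character(), so list(c("Release", "IC1", ...)) turns into one long string instead of separate site names. Here is a minimal sketch of that, plus the same classification written with vapply so that Location comes back as a plain character vector rather than a list. The helper name classify_sites and the small visits list are made up for illustration, and the label "TRANSITING" follows the question rather than the "TRANSIT" used above.

```r
inside_list <- list(c("Release", "IC1", "IC2", "IC3", "RGD1"))
inside  <- unlist(inside_list)  # flatten to a plain character vector
outside <- c("ORS1", "WC1", "WC2", "WC3", "RGU1", "ORN1", "ORN2",
             "ORS3", "GL1", "CVP1", "CLRS")

# match() coerces a list table with as.character(), so the whole inner
# vector becomes a single string and individual site names do not match
in_list   <- "IC1" %in% inside_list  # FALSE
in_vector <- "IC1" %in% inside       # TRUE

# Hypothetical helper: classify one row's vector of visited sites
classify_sites <- function(x) {
  if (all(x %in% inside)) {
    "INSIDE"
  } else if (all(x %in% outside)) {
    "OUTSIDE"
  } else {
    "TRANSITING"
  }
}

# Made-up stand-in for the SiteVisits list column
visits <- list("Release", c("IC2", "IC1", "Release"),
               c("WC1", "WC3"), c("IC3", "WC2"))

# vapply returns a character vector and checks each result is length 1
location <- vapply(visits, classify_sites, character(1))
print(location)
```

With the real data this would be something like df1$Location <- vapply(df1$SiteVisits, classify_sites, character(1)). Plain if/else is used instead of ifelse because each test is a single TRUE/FALSE.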
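Want an answer that is about 1/100 as fast? (Not a typo*: this is worse than manotheshark's answer, but it works with your data structure as it is.)
*That was a typo! It is 1/100, not 1/10.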
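SiteVisit_test <- SiteVisit  # assumption: SiteVisit_test is a working copy of SiteVisit
for (i in 1:nrow(SiteVisit)) {
  # unlist() flattens both the row's sites and the list-wrapped reference set
  SiteVisit_test$Location[i] <- if (all(unlist(SiteVisit[i, ]$SiteVisits) %in% unlist(inside))) {
    "INSIDE"
  } else if (all(unlist(SiteVisit[i, ]$SiteVisits) %in% unlist(outside))) {
    "OUTSIDE"
  } else {"TRANSITIONING"}
}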
Benchmark of the two approaches (inside2 and outside2 are presumably the character-vector versions of inside and outside):
microbenchmark(
  for_statement = for (i in 1:nrow(SiteVisit)) {
    SiteVisit_test$Location[i] <- if (all(unlist(SiteVisit[i, ]$SiteVisits) %in% unlist(inside))) {
      "INSIDE"
    } else if (all(unlist(SiteVisit[i, ]$SiteVisits) %in% unlist(outside))) {
      "OUTSIDE"
    } else {"TRANSITIONING"}
  },
  lapply_statemnt = lapply(SiteVisit$SiteVisits, function(x) ifelse(all(x %in% inside2), "INSIDE", ifelse(all(x %in% outside2), "OUTSIDE", "TRANSIT")))
)
Unit: microseconds
expr min lq mean median uq max neval
for_statement 28874.4 30082.0 32411.968 31008.3 33108.90 48878.1 100
lapply_statemnt 268.4 284.2 346.201 295.5 310.85 4114.9 100
I don't really understand why the lapply approach is so much faster here... possibly because I am unlisting for every i in the loop.
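One plausible contributor to the gap (a guess, not something measured here) is that SiteVisit[i, ] builds a one-row grouped tibble twice per iteration and the result column is modified inside the data frame each time, which tends to cost more than the unlist calls themselves. A sketch of a loop that avoids both, by indexing the list column directly with [[i]] and filling a preallocated vector, might look like the following. The four-row SiteVisit here is a made-up stand-in for the real data, and the timings above do not cover this variant.

```r
inside  <- c("Release", "IC1", "IC2", "IC3", "RGD1")
outside <- c("ORS1", "WC1", "WC2", "WC3", "RGU1", "ORN1", "ORN2",
             "ORS3", "GL1", "CVP1", "CLRS")

# Made-up stand-in for the SiteVisit data frame (list column added after creation)
SiteVisit <- data.frame(TagID = c(5717.06, 5073.06, 11387.1, 11121.1))
SiteVisit$SiteVisits <- list("Release", c("IC2", "IC1", "Release"),
                             c("WC1", "WC3"), c("IC3", "WC2"))

# Preallocate the result instead of growing or modifying a column in place
Location <- character(nrow(SiteVisit))
for (i in seq_len(nrow(SiteVisit))) {
  # [[i]] pulls the character vector straight out of the list column,
  # with no row subsetting of the data frame and no unlist()
  sites <- SiteVisit$SiteVisits[[i]]
  Location[i] <- if (all(sites %in% inside)) {
    "INSIDE"
  } else if (all(sites %in% outside)) {
    "OUTSIDE"
  } else {
    "TRANSITING"
  }
}
SiteVisit$Location <- Location  # assign the whole column once
print(SiteVisit$Location)
```

seq_len(nrow(SiteVisit)) is used instead of 1:nrow(SiteVisit) so the loop body is skipped cleanly when the data frame has zero rows.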