Check if an item exists in a nested list in R

Question

I have a data.frame that looks like this (data available here: https://github.com/JMcrocs/MEPVote/blob/master/MEP_ID_EPG.rds)

head(MEP_ID_EPG)
   mepid     EPG
1 197701 GUE.NGL
2 197533 GUE.NGL
3 197521 GUE.NGL

and a large list containing 2336 lists (data: https://github.com/JMcrocs/MEPVote/blob/master/AllVotes.rds)

str(AllVotes, max.level = 7, list.len = 5)
List of 2336
 $ :List of 7
  ..$ votes  :List of 3
  .. ..$ +:List of 2
  .. .. ..$ total : num 83
  .. .. ..$ groups:List of 6
  .. .. .. ..$ GUE/NGL  :List of 23
  .. .. .. .. ..$ : Named num 197701
  .. .. .. .. .. ..- attr(*, "names")= chr "mepid"
  .. .. .. .. ..$ : Named num 197533
  .. .. .. .. .. ..- attr(*, "names")= chr "mepid"
  .. ..$ -:List of 2
  .. .. ..$ total : num 142
  .. .. ..$ groups:List of 8
  .. .. .. ..$ ECR      :List of 27
  .. .. .. .. ..$ : Named num 198096
  .. .. .. .. .. ..- attr(*, "names")= chr "mepid"
  .. .. .. .. ..$ : Named num 197467 
  .. ..$ 0:List of 2
  .. .. ..$ total : num 72
  .. .. ..$ groups:List of 4
  .. .. .. ..$ ID       :List of 3
  .. .. .. .. ..$ : Named num 197480
  .. .. .. .. .. ..- attr(*, "names")= chr "mepid"
  .. .. .. .. ..$ : Named num 197482

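My goal is to add, for each MEP (mepid) row of MEP_ID_EPG, one column per vote containing "+" if the MEP voted yes, "-" if they voted no, and "0" otherwise (abstention or NA). In pseudocode, it would look like this: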

if (MEP_ID_EPG$mepid is in a sublist of AllVotes[[x]]$votes$'+')
  then MEP_ID_EPG[[x]] <- '+'
else if (MEP_ID_EPG$mepid is in a sublist of AllVotes[[x]]$votes$'-')
  then MEP_ID_EPG[[x]] <- '-'
else
  MEP_ID_EPG[[x]] <- '0'

The result should look like this:

head(MEP_ID_EPG)
   mepid     EPG 1 2 ... 2336
1 197701 GUE.NGL + + ...    +
2 197533 GUE.NGL 0 + ...    0
3 197521 GUE.NGL - 0 ...    -

For now, this is all I can do:

MEP_ID_EPG$mepid %in% AllVotes[[1]]$votes$'+'$groups$`GUE/NGL`[[1]]

Can someone help me?

Thanks in advance!

Answer 1

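AllVotes is an unnamed list of 2336 elements, each of which is one voting session. So we need to loop over its elements, for example with map(), lapply(), or a for loop.

Moreover, for a given session i, the IDs of the MEPs who voted + can be obtained with: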

unlist(AllVotes[[i]]$votes$`+`$groups)

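The same goes for - and 0.

Since you want one column per vote in the output, let's create an empty table and fill it. I will use a matrix, which I find more practical here.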
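To make the extraction step concrete, here is a minimal sketch on a toy object. `session` is a hypothetical stand-in for one element of AllVotes: it copies the nesting shown in the question's str() output, but the totals and the selection of IDs are only illustrative.

```r
# Hypothetical stand-in for one element of AllVotes (structure as in the question)
session <- list(votes = list(
  `+` = list(total = 2,
             groups = list(`GUE/NGL` = list(c(mepid = 197701), c(mepid = 197533)))),
  `-` = list(total = 1,
             groups = list(ECR = list(c(mepid = 198096)))),
  `0` = list(total = 0, groups = list())
))

# unlist() flattens all groups of one choice into a single named numeric vector
voters_plus <- unlist(session$votes$`+`$groups)
unname(voters_plus)

# Membership test for a single MEP
197701 %in% voters_plus

# An empty branch unlists to NULL, so %in% simply returns FALSE
197701 %in% unlist(session$votes$`0`$groups)
```

Because unlist() recurses through every level, the group names do not need to be known in advance.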

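# Assumption: `meps` is the data frame from the question (MEP_ID_EPG)
meps <- MEP_ID_EPG

# One row per MEP, one column per voting session, NA until filled
meps2 <- matrix(NA_character_,
                nrow = nrow(meps),
                ncol = length(AllVotes),
                dimnames = list(meps$mepid,
                                as.character(seq_along(AllVotes))))

for(i in seq_along(AllVotes)){
  # Row names are the MEP IDs, so rows are selected by ID (as character)
  meps2[as.character(unlist(AllVotes[[i]]$votes$`+`$groups)), i] <- "+"
  meps2[as.character(unlist(AllVotes[[i]]$votes$`-`$groups)), i] <- "-"
  meps2[as.character(unlist(AllVotes[[i]]$votes$`0`$groups)), i] <- "0"
}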

For loops are often discouraged in R, but here one works perfectly well, and I am not sure the apply-family functions would bring any benefit.

Take a look:
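For comparison, an apply-family version might look like the sketch below. It is not part of the original answer: `toy_meps`, `toy_votes`, `mk()` and `session_column()` are hypothetical names, and the data is a tiny made-up example with the same nesting as AllVotes.

```r
# Hypothetical toy data mirroring the structure of the real objects
toy_meps <- data.frame(mepid = c(197701, 197533, 197521), EPG = "GUE.NGL")

# Helper (hypothetical): wrap a vector of IDs in the groups/sublist structure
mk <- function(ids) list(groups = list(G = lapply(ids, function(x) c(mepid = x))))

toy_votes <- list(
  list(votes = list(`+` = mk(197701), `-` = mk(197521), `0` = mk(197533))),
  list(votes = list(`+` = mk(c(197701, 197533)), `-` = mk(numeric(0)), `0` = mk(numeric(0))))
)

# One column of the result: the vote of every MEP in a single session
session_column <- function(session, ids) {
  out <- rep(NA_character_, length(ids))
  for (choice in c("+", "-", "0")) {
    out[ids %in% unlist(session$votes[[choice]]$groups)] <- choice
  }
  out
}

# vapply() returns one column per session, giving a character matrix
toy_result <- vapply(toy_votes, session_column,
                     character(nrow(toy_meps)), ids = toy_meps$mepid)
dimnames(toy_result) <- list(as.character(toy_meps$mepid),
                             as.character(seq_along(toy_votes)))
toy_result
```

This produces the same kind of matrix as the for loop, and it is mostly a matter of taste which form reads better.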

meps2[1:10,1:10]
table(is.na(meps2))
# -> note there still are lots of NA.
# Possibly MEPs that were not present?
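
If, as the question suggests, a missing vote should also be coded as "0", the remaining NA values can be replaced in one step. The sketch below uses a hypothetical toy matrix `toy` in place of meps2. Note that doing this makes absences indistinguishable from abstentions.

```r
# Toy matrix standing in for meps2 (hypothetical values)
toy <- matrix(c("+", NA, "-", NA), nrow = 2,
              dimnames = list(c("197701", "197533"), c("1", "2")))

# Number of MEP/session pairs without a recorded vote
n_missing <- sum(is.na(toy))

# Replace every NA with "0", keeping the original matrix untouched
toy_filled <- toy
toy_filled[is.na(toy_filled)] <- "0"
toy_filled
```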

Finally, we just need to assemble the final table. The row order is the same, so we don't even need merge or match.

meps <- cbind(meps, meps2)
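
Should the row order ever differ between the two objects (it does not here), a match() on the row names would realign the matrix before binding. A minimal sketch with hypothetical toy objects `toy_meps` and `toy_mat`:

```r
# Toy data frame and a vote matrix whose rows are in a different order
toy_meps <- data.frame(mepid = c(197701, 197533, 197521), EPG = "GUE.NGL")
toy_mat <- matrix(c("-", "+", "0"), ncol = 1,
                  dimnames = list(c("197521", "197701", "197533"), "1"))

# Position of each MEP of the data frame among the matrix row names
idx <- match(as.character(toy_meps$mepid), rownames(toy_mat))

# Reorder the matrix rows, then bind; drop = FALSE keeps the matrix shape
toy_out <- cbind(toy_meps, toy_mat[idx, , drop = FALSE])
toy_out
```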

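Edit: your idea of using %in% works, but it is less efficient. You would need to loop over each voting session, extract the 3 lists of voters, and then apply %in% (which is itself a loop) to each MEP against each list. That makes 3 loops. Here, by inverting the problem, we loop explicitly over the voting sessions and implicitly over the MEPs in each list (-, +, 0). That is only 2 loops, and filling specific rows of a matrix is very efficient. The %in% version could look like this (after proper initialization):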

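for(i in seq_along(AllVotes)){
  # `+` and `-` are not valid in variable names, so spell them out
  voters_plus  <- unlist(AllVotes[[i]]$votes$`+`$groups)
  voters_minus <- unlist(AllVotes[[i]]$votes$`-`$groups)
  voters_zero  <- unlist(AllVotes[[i]]$votes$`0`$groups)

  # One new column per session, named after the session index
  meps[, as.character(i)] <- ifelse(meps$mepid %in% voters_plus, "+",
                             ifelse(meps$mepid %in% voters_minus, "-",
                             ifelse(meps$mepid %in% voters_zero, "0", NA)))
}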

r

function

list

nested-lists