Manipulating nested lists in R

Question

My data are genes stored in a list-of-lists structure, like this:

>listoflists <- list(samp1 = c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419", "ENSG00000000457"),
              samp2 = c("ENSG00000002834", "ENSG00000002919", "ENSG00000002933"),
              samp3 = c("ENSG00000000971", "ENSG00000001036", "ENSG00000001084", "ENSG00000001167"))

I am trying to convert the gene identifiers. When working with similar data in a data frame, I have successfully used code like this:

>library(org.Hs.eg.db)
>gene_df$symbol <- mapIds(org.Hs.eg.db,keys=rownames(gene_df),column="SYMBOL",keytype="ENSEMBL",multiVals="first")

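But now I am working with a list of lists. I want to keep the same structure, and I thought existing answers would point me in the right direction, but when I try nested apply calls like these: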

>convertedLoL <- lapply(listoflists, function(x) lapply(listoflists[x], function(i)mapIds(org.Hs.eg.db,keys=listoflists[i],column="SYMBOL",keytype="ENSEMBL",multiVals="first")))
 Error in listoflists[[i]] : 
  attempt to select less than one element in get1index 

>convertedLoL <- lapply(listoflists, function(x) lapply(listoflists[x], function(i)mapIds(org.Hs.eg.db,keys=listoflists[[x]][[i]],column="SYMBOL",keytype="ENSEMBL",multiVals="first")))
 Error in listoflists[[x]] : no such index at level 1

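I keep getting errors. I think my problem stems from the fact that I do not fully understand how the apply functions work or how to reference list elements. Can someone help me?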

Edit

I thought I had figured it out, but it is still not quite right.

>convertedLoL <- lapply(listoflists, function(x) sapply(x, function(i)mapIds(org.Hs.eg.db,keys=i,column="SYMBOL",keytype="ENSEMBL",multiVals="first")))

This gives me what might be a list of lists of lists. It is also really slow. So I still need help...

Answer 1

What your example shows is a list of vectors. You can simply do:

lapply(listoflists, function(x) mapIds(org.Hs.eg.db, keys=x, column="SYMBOL", keytype="ENSEMBL", multiVals="first"))
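The errors in the question come from treating `x` as an index. `lapply` passes each element itself to the function, so here `x` is already the character vector of IDs for one sample, and it can go straight into `keys`. The toy list below is made up for illustration and uses only base R, with `toupper` standing in for the ID conversion:

```r
# Hypothetical toy data standing in for listoflists
toy <- list(samp1 = c("a", "b"), samp2 = c("c"))

# lapply hands each element (a character vector) to the function,
# so a vectorised function can be applied to it directly
by_value <- lapply(toy, function(x) toupper(x))

# Index-based equivalent: iterate over positions and extract with [[ ]].
# Note that this version loses the list names.
by_index <- lapply(seq_along(toy), function(i) toupper(toy[[i]]))

print(by_value)
```

The same reasoning probably explains why the `sapply` attempt in the edit is slow: it calls `mapIds` once per ID, whereas `mapIds` accepts a whole vector of keys in one call.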

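Regarding speed: if you have many lists (or vectors, possibly with overlapping elements), you are better off mapping all of the IDs you use to SYMBOL once and then doing lookups against that data.frame/data.table/named vector.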

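# Collect every ID used anywhere in the lists, without duplicates
geneids <- unique(unlist(listoflists, use.names = FALSE))
# Map all IDs in a single call; the AnnotationDbi:: prefix avoids clashes with dplyr::select
key.table <- AnnotationDbi::select(org.Hs.eg.db, keys = geneids, columns = c("SYMBOL","ENSEMBL"),
    keytype = "ENSEMBL")
# Named vector: names are ENSEMBL IDs, values are symbols
keys <- setNames(key.table$SYMBOL, key.table$ENSEMBL)

# Look up each sample's IDs by name; the list structure is preserved
convertedLoL <- lapply(listoflists, function(x) keys[x])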
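To see how the named-vector lookup behaves without loading the annotation package, here is a minimal base R sketch. The IDs and symbols (`ID1`, `SYM_A`, and so on) are placeholders invented for illustration, not real annotations. It also shows one detail worth knowing: an ID missing from the lookup comes back as `NA` with an `NA` name, so re-attaching the original IDs as names keeps the result readable.

```r
# Hypothetical lookup table: names are IDs, values are symbols
lookup <- c(ID1 = "SYM_A", ID2 = "SYM_B", ID3 = "SYM_C")

# Hypothetical nested data; "ID4" is deliberately absent from the lookup
toy <- list(samp1 = c("ID1", "ID2"),
            samp2 = c("ID3", "ID4"))

# Index the lookup by name for each sample, then restore the
# original IDs as names so unmapped entries stay identifiable
converted <- lapply(toy, function(x) setNames(unname(lookup[x]), x))

print(converted)
```

Each sample keeps its position and name in the outer list, and unmapped IDs can be found afterwards with `is.na()`.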

