Dealing with data frames in a list: subsetting rows by a condition and by another data frame in R

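I have a list containing several data frames, 'mylist', and a data frame, 'mydf'. With these two, I need to solve two problems using R.

The actual list contains many data frames, and the actual data frame has 10,000 rows. Only sample data is shown here.

First problem: I have a list containing several data frames. The list below is an example.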

mylist1 <- list(a = data.frame(ID = c("a_1", "b_1", "c_1", "d_1", "e_1", "f_1"), colb = c(3.67, 4.94, 8.11, 2.85, 9.53, 7.5), colc = c(3.45, 6.19, 4.96, 6.73, 9.26, 8.62)), 
       b = data.frame(cola = c("a_1", "b_1", "c_1", "d_1", "e_1", "f_1"), colb = c(5.24, 3.62, 0.29, 6.65, 7.86, 8.7), colc = c(7.03, 7.51, 0.842, 3.56, 8.68, 5.844)))

I want to subset the rows of each data frame in the list based on the values in column 'colc': only rows where 'colc' is >= 6 should be kept.

The expected output for mylist1 is as follows:

mylistoutput <- list(a = data.frame(ID = c("b_1", "d_1", "e_1", "f_1"), colb = c(4.94, 2.85, 9.53, 7.5), colc = c(6.19, 6.73, 9.26, 8.62)), 
       b = data.frame(cola = c("a_1", "b_1", "e_1"), colb = c(5.24, 3.62, 7.86), colc = c(7.03, 7.51, 8.68)))

I tried to subset the rows with a filter/subset condition, as shown below:

mylistoutput <- lapply(mylist, function(x) filter(x$colc >= 6))

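But it failed.

Second problem: starting from 'mylistoutput', I want to do two things.

First, for the first data frame of 'mylistoutput', I want to match the IDs in its "ID" column against the IDs in the data frame 'mydf'.

A sample of the data frame 'mydf' is as follows: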

mydf <- data.frame(ID = c("a_1","a_1","a_1","a_1","a_1", "b_1","b_1","b_1","b_1", "c_1","c_1","c_1", "d_1","d_1","d_1", "e_1","e_1","e_1","e_1","e_1", "f_1","f_1","f_1","g_1","g_1","g_1","g_1","g_1"), colb = c(3.67,1,2.3,2.5,5, 1.1,2.2,3.7,4.94, 8.11,1.23,2, 2.85,1,2, 5,4,9.53,4,5, 8,7,7.5, 1,2,3,4,5), colc = c(3.45,1,2,3,4, 6.19,1,2,3, 4.96,1,2, 6.73,1,2, 9.26,1,2,3,4, 8.62,1,2, 1,2,3,4,5))

Now, I want to extract all rows of 'mydf' whose IDs match those in the first data frame of 'mylistoutput'.

The expected output from 'mydf' is as follows:

 mydfoutput1 <- data.frame(ID = c("b_1","b_1","b_1","b_1", "d_1","d_1","d_1", "e_1","e_1","e_1","e_1","e_1", "f_1","f_1","f_1"), colb = c(1.1,2.2,3.7,4.94, 2.85,1,2, 5,4,9.53,4,5, 8,7,7.5), colc = c(6.19,1,2,3, 6.73,1,2, 9.26,1,2,3,4, 8.62,1,2))

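Second, I want to select the IDs that match across the data frames in the list 'mylistoutput'. For example, "b_1" and "e_1" are the IDs common to both data frames of 'mylistoutput'. Then I want to extract the rows with those same IDs, i.e. "b_1" and "e_1", from the data frame 'mydf'.

The expected output is as follows: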

mydfoutput2 <- data.frame(ID = c("b_1","b_1","b_1","b_1", "e_1","e_1","e_1","e_1","e_1"), colb = c(1.1,2.2,3.7,4.94, 5,4,9.53,4,5), colc = c(6.19,1,2,3, 9.26,1,2,3,4))

I am looking for code that solves the problems above.

We can use lapply with subset:

out <- lapply(mylist1, subset, subset = colc >=6)
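As a self-contained check of this step, here is a minimal sketch that rebuilds the sample list from the question and filters it with plain bracket indexing instead of subset(). The helper name keep_rows and its threshold argument are illustrative and not part of the original code. which() is used on the assumption that rows with a missing colc should be dropped rather than returned as NA rows.

```r
# Sample list from the question
mylist1 <- list(
  a = data.frame(ID = c("a_1", "b_1", "c_1", "d_1", "e_1", "f_1"),
                 colb = c(3.67, 4.94, 8.11, 2.85, 9.53, 7.5),
                 colc = c(3.45, 6.19, 4.96, 6.73, 9.26, 8.62)),
  b = data.frame(cola = c("a_1", "b_1", "c_1", "d_1", "e_1", "f_1"),
                 colb = c(5.24, 3.62, 0.29, 6.65, 7.86, 8.7),
                 colc = c(7.03, 7.51, 0.842, 3.56, 8.68, 5.844))
)

# Hypothetical helper: keep the rows whose colc is at least `threshold`
keep_rows <- function(df, threshold = 6) {
  df[which(df$colc >= threshold), , drop = FALSE]
}

# Apply the helper to every data frame; list names (a, b) are preserved
out <- lapply(mylist1, keep_rows)

as.character(out$a$ID)    # IDs kept in the first data frame
as.character(out$b$cola)  # IDs kept in the second data frame
```

Writing the condition inside a named function avoids relying on how subset() evaluates its argument when it is passed through lapply, which may be easier to reuse inside other functions.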

For the second case (matching 'mydf' against the IDs of the first data frame), we can do:

subset(mydf, ID %in% out[[1]]$ID)

For the third case (IDs common to all data frames in the list), use Reduce with intersect:

subset(mydf, ID %in% Reduce(intersect, lapply(out, `[[`, 1)))
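To make the Reduce/intersect step easier to follow, here is a small sketch that splits it into named pieces. The data are a cut-down stand-in for the question's filtered list and for mydf, and the names id_list and common_ids are illustrative. It assumes, as the one-liner does, that the ID column is the first column of every data frame in the list.

```r
# Filtered list, shaped like the question's expected output (ID column first)
out <- list(
  a = data.frame(ID = c("b_1", "d_1", "e_1", "f_1"),
                 colc = c(6.19, 6.73, 9.26, 8.62)),
  b = data.frame(cola = c("a_1", "b_1", "e_1"),
                 colc = c(7.03, 7.51, 8.68))
)

# Small stand-in for mydf (illustrative subset of the question's rows)
mydf <- data.frame(ID = c("a_1", "b_1", "b_1", "d_1", "e_1", "e_1", "g_1"),
                   colb = c(3.67, 1.1, 4.94, 2.85, 5, 9.53, 1))

# Take the first column of each data frame as a character vector
id_list <- lapply(out, function(df) as.character(df[[1]]))

# Reduce applies intersect pairwise, leaving the IDs present in every element
common_ids <- Reduce(intersect, id_list)

# Keep the rows of mydf whose ID is one of the common IDs
res <- mydf[mydf$ID %in% common_ids, , drop = FALSE]
res
```

Because Reduce folds over the whole list, the same code should work unchanged when the list holds more than two data frames.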

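filter comes from dplyr, and it expects a data.frame as its first argument rather than a vector, so the data frame and the condition must be passed separately: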

lapply(mylist, function(x) filter(x, colc >= 6))