How to calculate means from a column in a list of lists?

Question

I have a list of 203 individual data tables. I created it with sapply, and it looks like this:

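In each table, I want to select the slope_CC value (it is replicated, so the same value appears several times). I compute the mean of these values rather than selecting one of them, because that is easier. The goal is to have all the means in one new data table. The tables have different numbers of rows.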

First_table <- slope_list$selected0.csv

get_means_fun <- function(First_table){
  N <- First_table[["slope_CC"]]
  mean <- mean(N)
  return(data.frame(mean))
}

list_select <- lst(pattern=".csv", slope_list)
get_means <- lapply(list_select, get_means_fun)

Is lapply() the best approach? I get this error: Error in First_table[["slope_CC"]] : subscript out of bounds, even though the same line runs when I execute it on its own.

Answer 1

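I've made a dummy example to illustrate what I think you are looking for. My main adjustment is that, instead of taking the mean, which requires R to sum every element and then divide by the length, I just grab the first value.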

# example data
list_select <- list(table1 = head(mtcars),
                    table2 = FALSE,
                    table3 = tail(mtcars))

# grab the first value of column "wt". switch this for your column
# this checks to see if it is a data.frame, if not, return NA
# you can change this to return a NULL, or whatever you would like.
get_value_function <- function(the_table) {
  if (is.data.frame(the_table)) the_table[["wt"]][1] else NA
}

# returns a list
lapply(list_select, get_value_function)
# $table1
# [1] 2.62
# 
# $table2
# [1] NA
# 
# $table3
# [1] 2.14

# returns a vector
sapply(list_select, get_value_function)
# table1 table2 table3 
# 2.62     NA   2.14
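
To tie this back to the original goal (one table holding a mean of slope_CC per file), a minimal base R sketch might look like the following. The toy slope_list and the names get_mean, means and means_table are invented for illustration. One plausible reading of the error, not a confirmed diagnosis, is that lst(pattern=".csv", slope_list) builds a two-element list whose first element is the string ".csv", and indexing a character vector with [["slope_CC"]] raises "subscript out of bounds". The guard below is written under that assumption.

```r
# Toy stand-in for the asker's slope_list (hypothetical data, different row counts)
slope_list <- list(
  selected0.csv = data.frame(slope_CC = rep(0.5, 3)),
  selected1.csv = data.frame(slope_CC = rep(1.2, 5)),
  not_a_table   = ".csv"  # atomic element; [["slope_CC"]] on it would error
)

# Mean of slope_CC for one element, or NA when the element is not a
# data.frame or does not have the column
get_mean <- function(the_table) {
  if (is.data.frame(the_table) && "slope_CC" %in% names(the_table)) {
    mean(the_table[["slope_CC"]])
  } else {
    NA_real_
  }
}

# vapply() insists on exactly one numeric value per list element
means <- vapply(slope_list, get_mean, numeric(1))

# Collect everything in a single data.frame, one row per table
means_table <- data.frame(table = names(means), mean_slope_CC = unname(means))
means_table
```

vapply() is used here instead of sapply() only because it fails loudly if the function ever returns something other than a single number; sapply() would work as well.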

I also like using purrr for this sort of thing. It saves you from having to create a function ahead of time.

library(purrr)

# returns a list like lapply
map(list_select, ~ if (is.data.frame(.x)) .x[["wt"]][1] else NA)

# returns a vector like sapply
map_dbl(list_select, ~ if (is.data.frame(.x)) .x[["wt"]][1] else NA)

Another approach is to drop the list elements that are not data.frames up front. A very simple way to do that, again with purrr, is to use keep().

list_select %>% 
  keep(is.data.frame) %>% 
  map_dbl(~ .x[["wt"]][1])

# table1 table3 
# 2.62   2.14
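
If the mean is still wanted rather than the first value, the same keep() pipeline can be adapted and its result gathered into one table. This is only a sketch on the mtcars dummy data: the column "wt" stands in for slope_CC, and the names means and result are made up for the example.

```r
library(purrr)

# Same dummy list as above: two data.frames and one non-table element
list_select <- list(table1 = head(mtcars),
                    table2 = FALSE,
                    table3 = tail(mtcars))

# Drop non-data.frames, then take the mean of "wt" in each remaining table
means <- list_select %>%
  keep(is.data.frame) %>%
  map_dbl(~ mean(.x[["wt"]]))

# One row per table that survived the keep() step
result <- data.frame(table = names(means), mean_wt = unname(means))
result
```

Because keep() removes table2 before the means are computed, the result has two rows rather than a row holding NA.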

r

function

lapply