Elements of one list as arguments to a function acting on another list

Question

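I have a list of data frames that are all alike (the same columns with the same names), but each one holds information about a different, related "thing" (say, a species of flower). I need an elegant way to use cut() to recode one column in all of these data frames from continuous to categorical. The catch is that each "thing" (flower) has its own cutpoints and its own labels.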

I keep the cutpoints and labels in a separate list. Following my fake example, it basically looks like this:

iris <- iris 
peony <- iris  #pretending that this is actually different data!
flowers <- list(iris = iris, peony = peony)

params <- list(iris_param = list(cutpoints = c(1, 4.5),
                             labels = c("low", "medium", "high")),

           peony_param = list(cutpoints = c(1.5, 2.5, 5),
                              labels = c("too_low", "kinda_low", "okay", "just_right")))

#And we want to cut 'Sepal.Width' on both peony and iris

I'm really stuck at this point. I've tried some combinations of lapply() and do.call(), but I'm just guessing (and guessing wrong).

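More generally, I'd like to know: how can I apply a function to the different data frames in a list, using a set of parameters that changes from one data frame to the next?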

Answer 1

I think this is a great time for a for loop. It's straightforward to write and clear to read:

for (petal in seq_along(flowers)) {
    flowers[[petal]]$Sepal.Width.Cut = cut(
        x = flowers[[petal]]$Sepal.Width,
        breaks = c(-Inf, params[[petal]]$cutpoints, Inf),
        labels = params[[petal]]$labels
    )
}

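Note that (a) I had to augment your breaks with -Inf and Inf so that cut is happy with the number of labels (it needs one more break than there are labels), and (b) I'm really just iterating over 1, 2. A more robust version would probably iterate over the names of the list and, as a safety check, require the params list to have the same names. Since your two lists have different names, I just used the index.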
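The name-based variant mentioned above could look roughly like this. It is a minimal sketch, not part of the original answer: it assumes params is renamed so that its names match names(flowers), and the loop variable flower is just an illustrative name.

```r
# Same toy data as in the question; peony is just a copy of iris.
flowers <- list(iris = iris, peony = iris)

# Assumption: params uses the same names as flowers (the question used
# iris_param / peony_param instead).
params <- list(
  iris  = list(cutpoints = c(1, 4.5),
               labels = c("low", "medium", "high")),
  peony = list(cutpoints = c(1.5, 2.5, 5),
               labels = c("too_low", "kinda_low", "okay", "just_right"))
)

# Safety check: every data frame needs a matching parameter set.
stopifnot(setequal(names(flowers), names(params)))

for (flower in names(flowers)) {
  p <- params[[flower]]
  flowers[[flower]]$Sepal.Width.Cut <- cut(
    x = flowers[[flower]]$Sepal.Width,
    breaks = c(-Inf, p$cutpoints, Inf),  # one more break than labels
    labels = p$labels
  )
}
```

Looking parameters up by name means the order of the two lists no longer matters, and a missing or misspelled entry fails at the check instead of silently using the wrong cutpoints.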

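This could probably be done with mapply. I don't see any advantage to that: unless you're already comfortable with mapply, the only real difference is that the mapply version will take you 10 times as long to write.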
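For comparison, here is a sketch of what the mapply-style version might look like, using Map (which is mapply with SIMPLIFY = FALSE). The helper add_cut and the result name flowers_cut are hypothetical names introduced only for this illustration.

```r
# Same toy data and parameter list as in the question.
flowers <- list(iris = iris, peony = iris)
params <- list(
  iris_param  = list(cutpoints = c(1, 4.5),
                     labels = c("low", "medium", "high")),
  peony_param = list(cutpoints = c(1.5, 2.5, 5),
                     labels = c("too_low", "kinda_low", "okay", "just_right"))
)

# Hypothetical helper: add the cut column to a single data frame.
add_cut <- function(df, p) {
  df$Sepal.Width.Cut <- cut(df$Sepal.Width,
                            breaks = c(-Inf, p$cutpoints, Inf),
                            labels = p$labels)
  df
}

# Map pairs the two lists element by element, by position (not by name),
# and returns a new list named after the first list (flowers).
flowers_cut <- Map(add_cut, flowers, params)
```

Unlike the loop, this returns a modified copy of the list rather than updating flowers in place, and it relies on both lists being in the same order.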

Answer 2

I like Gregor's solution, but I would probably stack the data instead:

library(data.table)

# rearrange parameters
params0 = setNames(params, c("iris", "peony"))
my_params = c(list(.id = names(params0)), do.call(Map, c(list, params0)))

# stack
DT = rbindlist(flowers, idcol = TRUE)  # idcol = TRUE adds an ".id" column holding the list names

# merge and make cuts
DT[my_params, Sepal.Width.Cut := 
  cut(Sepal.Width, breaks = c(-Inf,cutpoints[[1]],Inf), labels = labels[[1]])
, on=".id", by=.EACHI]

(I borrowed Gregor's handling of the cutpoints.) The result is:

       .id Sepal.Length Sepal.Width Petal.Length Petal.Width   Species Sepal.Width.Cut
  1:  iris          5.1         3.5          1.4         0.2    setosa       kinda_low
  2:  iris          4.9         3.0          1.4         0.2    setosa       kinda_low
  3:  iris          4.7         3.2          1.3         0.2    setosa       kinda_low
  4:  iris          4.6         3.1          1.5         0.2    setosa       kinda_low
  5:  iris          5.0         3.6          1.4         0.2    setosa       kinda_low
 ---                                                                                  
296: peony          6.7         3.0          5.2         2.3 virginica            okay
297: peony          6.3         2.5          5.0         1.9 virginica       kinda_low
298: peony          6.5         3.0          5.2         2.0 virginica            okay
299: peony          6.2         3.4          5.4         2.3 virginica            okay
300: peony          5.9         3.0          5.1         1.8 virginica            okay

I think stacked data usually makes more sense than a list of data.frames. You don't need data.table to stack or cut, but it is well designed for those tasks.
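As a rough illustration of stacking without data.table, here is a base R sketch. It is not from the original answer: the names tagged and stacked are made up for the example, params is assumed to be named like flowers, and the labels are stored as character rather than factor so that each group's labels stay separate.

```r
flowers <- list(iris = iris, peony = iris)

# Assumption: params is named to match names(flowers).
params <- list(
  iris  = list(cutpoints = c(1, 4.5),
               labels = c("low", "medium", "high")),
  peony = list(cutpoints = c(1.5, 2.5, 5),
               labels = c("too_low", "kinda_low", "okay", "just_right"))
)

# Stack: tag each data frame with its list name, then bind the rows.
tagged <- Map(function(df, id) { df$.id <- id; df }, flowers, names(flowers))
stacked <- do.call(rbind, tagged)

# Cut group by group, writing plain character labels into one column.
stacked$Sepal.Width.Cut <- NA_character_
for (id in names(params)) {
  rows <- stacked$.id == id
  stacked$Sepal.Width.Cut[rows] <- as.character(cut(
    stacked$Sepal.Width[rows],
    breaks = c(-Inf, params[[id]]$cutpoints, Inf),
    labels = params[[id]]$labels
  ))
}
```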

How it works.

I guess rbindlist is clear.

The code

DT[my_params, on = ".id"]

does a merge. To see what that means, take a look at:

as.data.table(my_params)
#      .id   cutpoints                            labels
# 1:  iris     1.0,4.5                   low,medium,high
# 2: peony 1.5,2.5,5.0 too_low,kinda_low,okay,just_right

So we are merging this table with DT on their common .id column.
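The "rearrange parameters" step that builds my_params can be hard to read at first, so here is a base R sketch of what it does. The intermediate name by_field is introduced only for this illustration.

```r
# Parameters keyed by flower, as in params0 from the answer.
params0 <- list(
  iris  = list(cutpoints = c(1, 4.5),
               labels = c("low", "medium", "high")),
  peony = list(cutpoints = c(1.5, 2.5, 5),
               labels = c("too_low", "kinda_low", "okay", "just_right"))
)

# do.call(Map, c(list, params0)) is the same call as
#   Map(list, iris = params0$iris, peony = params0$peony)
# so it turns "one entry per flower" into "one entry per field".
by_field <- do.call(Map, c(list, params0))
names(by_field)  # "cutpoints" "labels"

# Prepending .id gives a list with one element per future column:
# .id is a character vector, cutpoints and labels are lists of vectors.
my_params <- c(list(.id = names(params0)), by_field)
str(my_params)
```

Because cutpoints and labels are lists with one vector per flower, they become list columns once the list is treated as a table, which is why the join code later extracts them with [[1]].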

When we do a merge like
```
DT[my_params, j, on = ".id", by=.EACHI]
```
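it means
- do the merge, matching each row of my_params with the related rows of DT.
- do j for each row of my_params, using columns found in either of the two tables.
j in this case has the form column_for_DT := cut(...), which creates a new column in DT.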
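To make the join-and-assign step concrete, here is a self-contained sketch that builds the parameter table directly with list columns. One deliberate difference from the answer's code: cut() is wrapped in as.character(). That is my own change, made because the printed result shows iris rows carrying the peony label kinda_low, which suggests that factor levels from different groups do not combine cleanly in a single column. Treat it as a possible workaround, not as the author's method.

```r
library(data.table)

flowers <- list(iris = iris, peony = iris)

# Parameter table: one row per flower, with list columns for the
# cutpoints and labels (same content as as.data.table(my_params)).
my_params <- data.table(
  .id = c("iris", "peony"),
  cutpoints = list(c(1, 4.5), c(1.5, 2.5, 5)),
  labels = list(c("low", "medium", "high"),
                c("too_low", "kinda_low", "okay", "just_right"))
)

# Stack the data frames; idcol stores each list name in ".id".
DT <- rbindlist(flowers, idcol = ".id")

# For each row of my_params (by = .EACHI), take the matching rows of DT
# and add the cut labels by reference. [[1]] unwraps the list cell for
# the current row of my_params.
DT[my_params,
   Sepal.Width.Cut := as.character(cut(Sepal.Width,
                                       breaks = c(-Inf, cutpoints[[1]], Inf),
                                       labels = labels[[1]])),
   on = ".id", by = .EACHI]
```

With the question's cutpoints, every iris row falls between 1 and 4.5, so all of them should come out as medium.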
