Passing columns to "with" in lapply

Question

How can I pass the columns of a data frame to the with function inside an lapply call?

I have tried both of these, and neither works:

lapply(data[ , grepl( "Measured." , names( data ) ) ], with, (. <= 5 & . >= 1) | . == 4244)

lapply(data[ , grepl( "Measured." , names( data ) ) ], function(x) with((x <= 5 & x >= 1) | x == 4244))

I am trying to check whether the values in the Measured. columns lie between 1 and 5, with 4244 also being accepted.

Sample dataset:

data <- structure(list(ID = 1:10, Date = c(2018L, 2018L, 2018L, 2015L, 
2018L, 2015L, 2015L, 2014L, 2014L, 2014L), Gender = structure(c(2L, 
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("F", "M"), class = "factor"), 
    Measured.1 = c(1L, 7L, 1L, 6L, 6L, 2L, 5L, 4L, 2L, 6L), Measured.2 = c(9L, 
    2L, 4L, 5L, 2L, 3L, 6L, 3L, 7L, 7L), Measured.3 = c(9L, 4L, 
    35L, 3L, 4L, 2L, 2L, 1L, 3L, 4L), Measured.4 = c(12L, 8L, 
    50L, 7L, 2L, 6L, 2L, 2L, 1L, 2L), Text = structure(c(1L, 
    1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L), .Label = c("N", "Y"), class = "factor"), 
    Test = c(5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L)), .Names = c("ID", 
"Date", "Gender", "Measured.1", "Measured.2", "Measured.3", "Measured.4", 
"Text", "Test"), class = "data.frame", row.names = c(NA, -10L
))

And its output:

   ID Date Gender Measured.1 Measured.2 Measured.3 Measured.4 Text Test
1   1 2018      M          1          9          9         12    N    5
2   2 2018      M          7          2          4          8    N    5
3   3 2018      M          1          4         35         50    N    5
4   4 2015      M          6          5          3          7    N    5
5   5 2018      M          6          2          4          2    N    5
6   6 2015      M          2          3          2          6    Y    6
7   7 2015      F          5          6          2          2    Y    6
8   8 2014      F          4          3          1          2    Y    6
9   9 2014      F          2          7          3          1    N    6
10 10 2014      F          6          7          4          2    N    6

Answer 1

Besides base R, you could use a dplyr solution:

library(dplyr)
data %>%
  filter_at(vars(starts_with("Measured")), 
            any_vars((. >= 1 & . <= 5) | . == 4244))

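This looks for records where at least one of the Measured columns has a value between 1 and 5, or equal to 4244.
If you want to be stricter and require every Measured value to meet that condition, change it to: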

data %>%
  filter_at(vars(starts_with("Measured")), 
            all_vars((. >= 1 & . <= 5) | . == 4244))

The former yields

   ID Date Gender Measured.1 Measured.2 Measured.3 Measured.4 Text Test
1   1 2018      M          1          9          9         12    N    5
2   2 2018      M          7          2          4          8    N    5
3   3 2018      M          1          4         35         50    N    5
4   4 2015      M          6          5          3          7    N    5
5   5 2018      M          6          2          4          2    N    5
6   6 2015      M          2          3          2          6    Y    6
7   7 2015      F          5          6          2          2    Y    6
8   8 2014      F          4          3          1          2    Y    6
9   9 2014      F          2          7          3          1    N    6
10 10 2014      F          6          7          4          2    N    6

while the latter yields

  ID Date Gender Measured.1 Measured.2 Measured.3 Measured.4 Text Test
1  8 2014      F          4          3          1          2    Y    6
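
In dplyr 1.0.4 and later, filter_at(), any_vars() and all_vars() are superseded, and the same two filters can be written with if_any() and if_all(). The following is only a sketch of that variant: the helper name in_range is made up for illustration, and the data frame is a cut-down copy of the question's data (ID and the Measured.* columns only).

```r
library(dplyr)

# Cut-down copy of the question's data: ID plus the four Measured.* columns
data <- data.frame(
  ID = 1:10,
  Measured.1 = c(1L, 7L, 1L, 6L, 6L, 2L, 5L, 4L, 2L, 6L),
  Measured.2 = c(9L, 2L, 4L, 5L, 2L, 3L, 6L, 3L, 7L, 7L),
  Measured.3 = c(9L, 4L, 35L, 3L, 4L, 2L, 2L, 1L, 3L, 4L),
  Measured.4 = c(12L, 8L, 50L, 7L, 2L, 6L, 2L, 2L, 1L, 2L)
)

# Hypothetical helper: TRUE where a value is in [1, 5] or equals 4244
in_range <- function(x) (x >= 1 & x <= 5) | x == 4244

# Rows where at least one Measured.* column passes the check
any_ok <- data %>% filter(if_any(starts_with("Measured"), in_range))

# Rows where every Measured.* column passes the check
all_ok <- data %>% filter(if_all(starts_with("Measured"), in_range))
```

Here any_ok should correspond to the filter_at()/any_vars() call and all_ok to the filter_at()/all_vars() call.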

The same can be done faster (but, imo, less readably) in base R, using a column mask and apply:

# set up the cols of interest
colmask <- grepl("^Measured", names(data))

# apply the function to each row (MARGIN = 1)
rowmask <- apply(data[colmask], 1, function(row) {
  any((row >= 1 & row <= 5) | row == 4244)
})
data[rowmask,]

or

colmask <- grepl("^Measured", names(data))
rowmask <- apply(data[colmask], 1, function(row) {
  all((row >= 1 & row <= 5) | row == 4244)
})
data[rowmask,]

Obviously this yields the same results.
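
As for the lapply attempts in the question: with() evaluates an expression inside a data frame or list, so it has nothing to work with when lapply hands it one column vector at a time, and . is not defined there. A plain anonymous function should be enough. The sketch below assumes a cut-down copy of the question's data and combines the per-column results with Reduce; the names measured, checks, all_ok and any_ok are illustrative only.

```r
# Cut-down copy of the question's data: ID plus the four Measured.* columns
data <- data.frame(
  ID = 1:10,
  Measured.1 = c(1L, 7L, 1L, 6L, 6L, 2L, 5L, 4L, 2L, 6L),
  Measured.2 = c(9L, 2L, 4L, 5L, 2L, 3L, 6L, 3L, 7L, 7L),
  Measured.3 = c(9L, 4L, 35L, 3L, 4L, 2L, 2L, 1L, 3L, 4L),
  Measured.4 = c(12L, 8L, 50L, 7L, 2L, 6L, 2L, 2L, 1L, 2L)
)

# Keep only the Measured.* columns
measured <- data[grepl("^Measured\\.", names(data))]

# lapply passes each column as a vector; the result is one logical vector per column
checks <- lapply(measured, function(x) (x >= 1 & x <= 5) | x == 4244)

# Combine the per-column vectors element-wise into one logical value per row
all_ok <- Reduce(`&`, checks)  # every Measured.* value passes
any_ok <- Reduce(`|`, checks)  # at least one Measured.* value passes

data[all_ok, ]
```

Because the comparisons are vectorised over whole columns, this avoids the row-by-row apply call.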

Answer 2

With base R, you can extract the rows that meet these conditions with:

data[data[,1][data[,4] >= 1 & data[,4] <= 5 & data[,5] >= 1 & data[,5] <= 5 & data[,6] >= 1 & data[,6] <= 5 & data[,7] >= 1 & data[,7] <= 5 | data[,4] == 4244 | data[,5] == 4244 | data[,6] == 4244 | data[,7] == 4244],]

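I am using & to combine conditions that must all hold (you are looking for Measured.1, Measured.2, Measured.3 and Measured.4 all being >= 1 and <= 5) and | to add alternative criteria (any of the measured values being 4244).

This gives: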

  ID Date Gender Measured.1 Measured.2 Measured.3 Measured.4 Text Test
8  8 2014      F          4          3          1          2    Y    6

It is not the prettiest piece of code, but (according to microbenchmark) it runs 43 times faster than the dplyr approach in Answer 1.
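
A timing like this depends on the machine, the data size and the package versions, so it is worth re-running locally. The following is a rough sketch of how such a comparison could be set up, not the benchmark that was actually run: f_index and f_dplyr are hypothetical wrapper names, f_index is a compact stand-in for the long condition above, the data is a cut-down copy of the question's, and the timing step only runs if both microbenchmark and dplyr are installed.

```r
# Cut-down copy of the question's data: ID plus the four Measured.* columns
data <- data.frame(
  ID = 1:10,
  Measured.1 = c(1L, 7L, 1L, 6L, 6L, 2L, 5L, 4L, 2L, 6L),
  Measured.2 = c(9L, 2L, 4L, 5L, 2L, 3L, 6L, 3L, 7L, 7L),
  Measured.3 = c(9L, 4L, 35L, 3L, 4L, 2L, 2L, 1L, 3L, 4L),
  Measured.4 = c(12L, 8L, 50L, 7L, 2L, 6L, 2L, 2L, 1L, 2L)
)

cols <- grepl("^Measured\\.", names(data))

# Hypothetical wrapper: vectorised base R indexing over all Measured.* columns
f_index <- function(d) {
  ok <- (d[cols] >= 1 & d[cols] <= 5) | d[cols] == 4244  # logical matrix
  d[rowSums(ok) == sum(cols), ]                          # rows where every column passes
}

# Hypothetical wrapper around the filter_at()/all_vars() call from Answer 1
f_dplyr <- function(d) {
  dplyr::filter_at(d, dplyr::vars(dplyr::starts_with("Measured")),
                   dplyr::all_vars((. >= 1 & . <= 5) | . == 4244))
}

# Only time the two if both packages are available
if (requireNamespace("microbenchmark", quietly = TRUE) &&
    requireNamespace("dplyr", quietly = TRUE)) {
  stopifnot(identical(f_index(data)$ID, f_dplyr(data)$ID))
  print(microbenchmark::microbenchmark(f_index(data), f_dplyr(data), times = 100L))
}
```

On a ten-row data frame the measured ratio mostly reflects fixed per-call overhead, so it may look quite different on larger data.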
