Passing columns to "with" in lapply
How can I pass the columns of a data frame to the with function inside an lapply call?
I have tried both of these, and neither works!
lapply(data[ , grepl( "Measured." , names( data ) ) ], with, (. <= 5 & . >= 1) | . == 4244)
lapply(data[ , grepl( "Measured." , names( data ) ) ], function(x) with((x <= 5 & x >= 1) | x == 4244))
I am trying to check whether the values in the Measured. columns are between 1 and 5, with 4244 also being accepted.
Sample dataset:
data <- structure(list(ID = 1:10, Date = c(2018L, 2018L, 2018L, 2015L,
2018L, 2015L, 2015L, 2014L, 2014L, 2014L), Gender = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), .Label = c("F", "M"), class = "factor"),
Measured.1 = c(1L, 7L, 1L, 6L, 6L, 2L, 5L, 4L, 2L, 6L), Measured.2 = c(9L,
2L, 4L, 5L, 2L, 3L, 6L, 3L, 7L, 7L), Measured.3 = c(9L, 4L,
35L, 3L, 4L, 2L, 2L, 1L, 3L, 4L), Measured.4 = c(12L, 8L,
50L, 7L, 2L, 6L, 2L, 2L, 1L, 2L), Text = structure(c(1L,
1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L), .Label = c("N", "Y"), class = "factor"),
Test = c(5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L)), .Names = c("ID",
"Date", "Gender", "Measured.1", "Measured.2", "Measured.3", "Measured.4",
"Text", "Test"), class = "data.frame", row.names = c(NA, -10L
))
and its output:
ID Date Gender Measured.1 Measured.2 Measured.3 Measured.4 Text Test
1 1 2018 M 1 9 9 12 N 5
2 2 2018 M 7 2 4 8 N 5
3 3 2018 M 1 4 35 50 N 5
4 4 2015 M 6 5 3 7 N 5
5 5 2018 M 6 2 4 2 N 5
6 6 2015 M 2 3 2 6 Y 6
7 7 2015 F 5 6 2 2 Y 6
8 8 2014 F 4 3 1 2 Y 6
9 9 2014 F 2 7 3 1 N 6
10 10 2014 F 6 7 4 2 N 6
Besides base R, you can also use a dplyr solution:
library(dplyr)
data %>%
filter_at(vars(starts_with("Measured")),
any_vars((. >= 1 & . <= 5) | . == 4244))
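This looks for records where at least one of the Measured columns has a value that is between 1 and 5 or equal to 4244.
If you want to be stricter and require all values to meet the condition, you can change it to: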
data %>%
filter_at(vars(starts_with("Measured")),
all_vars((. >= 1 & . <= 5) | . == 4244))
The former yields
ID Date Gender Measured.1 Measured.2 Measured.3 Measured.4 Text Test
1 1 2018 M 1 9 9 12 N 5
2 2 2018 M 7 2 4 8 N 5
3 3 2018 M 1 4 35 50 N 5
4 4 2015 M 6 5 3 7 N 5
5 5 2018 M 6 2 4 2 N 5
6 6 2015 M 2 3 2 6 Y 6
7 7 2015 F 5 6 2 2 Y 6
8 8 2014 F 4 3 1 2 Y 6
9 9 2014 F 2 7 3 1 N 6
10 10 2014 F 6 7 4 2 N 6
while the latter yields
ID Date Gender Measured.1 Measured.2 Measured.3 Measured.4 Text Test
1 8 2014 F 4 3 1 2 Y 6
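In newer dplyr releases filter_at() is superseded. A minimal sketch of the same two filters using if_any() and if_all(), assuming dplyr 1.0.4 or later is installed; the toy data frame df and the helper in_range are made up for illustration and are not part of the question:

```r
library(dplyr)

# Hypothetical toy data, not the question's dataset
df <- data.frame(
  ID = 1:4,
  Measured.1 = c(1L, 7L, 4L, 9L),
  Measured.2 = c(9L, 2L, 3L, 4244L)
)

# Hypothetical helper: TRUE where a value is in [1, 5] or equals 4244
in_range <- function(x) (x >= 1 & x <= 5) | x == 4244

# Keep rows where at least one Measured column qualifies
any_ok <- df %>% filter(if_any(starts_with("Measured"), in_range))

# Keep rows where every Measured column qualifies
all_ok <- df %>% filter(if_all(starts_with("Measured"), in_range))
```

Keeping the condition in a named function means the any/all variants differ only in the selector helper.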
The same can be done with base R in a faster (but imo less readable) way:
You can use a base R approach with masks and apply:
# set up the cols of interest
colmask <- grepl("^Measured", names(data))
# apply the function rowwise (=1)
rowmask <- apply(data[colmask], 1, function(col) {
any(((col >= 1 & col <= 5) | col == 4244))
})
data[rowmask,]
or
colmask <- grepl("^Measured", names(data))
rowmask <- apply(data[colmask], 1, function(col) {
all(((col >= 1 & col <= 5) | col == 4244))
})
data[rowmask,]
Obviously this yields the same results.
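As for the lapply question itself: with() evaluates an expression inside a data frame, list, or environment, so a bare column vector and the dplyr-style . placeholder do not fit it. A minimal sketch that swaps with for a plain anonymous function and then combines the per-column results per row; the toy data frame df is made up for illustration:

```r
# Hypothetical toy data, not the question's dataset
df <- data.frame(
  ID = 1:4,
  Measured.1 = c(1L, 7L, 4L, 9L),
  Measured.2 = c(9L, 2L, 3L, 4244L)
)

cols <- grepl("^Measured", names(df))

# One logical vector per Measured column
checks <- lapply(df[cols], function(x) (x >= 1 & x <= 5) | x == 4244)

# Combine the column-wise results into one value per row
any_row <- Reduce(`|`, checks)  # at least one column qualifies
all_row <- Reduce(`&`, checks)  # every column qualifies

df[all_row, ]
```

Because the comparisons are vectorized over whole columns, this avoids the row-by-row function calls that apply(..., 1, ...) makes.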
With base R, you can extract the rows that meet these criteria as:
data[data[,1][data[,4] >= 1 & data[,4] <= 5 & data[,5] >= 1 & data[,5] <= 5 & data[,6] >= 1 & data[,6] <= 5 & data[,7] >= 1 & data[,7] <= 5 | data[,4] == 4244 | data[,5] == 4244 | data[,6] == 4244 | data[,7] == 4244],]
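I am using & to create additive conditions (you are looking for Measured.1, Measured.2, Measured.3 and Measured.4 all being >= 1 and <= 5) and | to create alternative criteria (any measured value being 4244),
giving: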
ID Date Gender Measured.1 Measured.2 Measured.3 Measured.4 Text Test
8 8 2014 F 4 3 1 2 Y 6
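The one-liner relies on & binding more tightly than | in R, so the range checks are grouped together before the 4244 alternatives are considered. A small sketch with made-up logical vectors (a, b and c4244 are illustrative names only):

```r
# Hypothetical logical vectors standing in for the individual conditions
a     <- c(TRUE, FALSE, FALSE)
b     <- c(TRUE, TRUE, FALSE)
c4244 <- c(FALSE, FALSE, TRUE)

implicit <- a & b | c4244    # parsed as (a & b) | c4244
explicit <- (a & b) | c4244
other    <- a & (b | c4244)  # a different grouping, different result

identical(implicit, explicit)
identical(implicit, other)
```

Adding the parentheses explicitly does not change the result but makes the intended grouping easier to read.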
This isn't the prettiest snippet of code, but (according to microbenchmark) it runs 43 times faster than Jan's dplyr approach.