Using apply() for a nested for loop in R
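I wrote a nested for loop in R, but it takes too long to run. I have two large datasets. For each row in dfA and each row in dfB, the loop should check whether the date in dfA falls within the date interval in dfB. If it does, the two datasets should be merged on a given column for that row. I'm not sure whether the code I wrote works without errors, because the loop is still running.
Any insights would be appreciated.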
dfA:
       Common a       Date
1 20141331123 1 2005-01-01
2 20141331123 2 2005-01-02
3 20141331123 3 2005-01-03
4 20141331123 4 2005-01-04
5 20141331123 5 2005-01-05
6 20141331123 6 2005-01-06
dfB:
       cDate      bDate      common
1 2005-01-01 2005-06-13 20141331123
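library(lubridate)

dfB$Interval <- interval(ymd(dfB$cDate), ymd(dfB$bDate))

matches <- list()
for (i in seq_len(nrow(dfA))) {
  # the inner loop needs its own index (j), otherwise it overwrites i
  for (j in seq_len(nrow(dfB))) {
    # keep the pair when the keys match and the date falls inside the interval
    if (dfA$Common[i] == dfB$common[j] && dfA$Date[i] %within% dfB$Interval[j]) {
      matches[[length(matches) + 1]] <- cbind(dfA[i, ], dfB[j, c("cDate", "bDate")])
    }
  }
}
# return() is only valid inside a function, so build the result directly
merged <- do.call(rbind, matches)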
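Non-equi joins are natively supported in SQL and, in R, by data.table. When this answer was written, neither base R nor the tidyverse functions supported them natively[1]; dplyr 1.1.0 and later can express them with join_by().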
library(data.table)
setDT(dfA)
setDT(dfB)
dfB[dfA, on = .(common == Common, cDate <= Date, bDate >= Date)]
#         cDate      bDate      common a
# 1: 2005-01-01 2005-01-01 20141331123 1
# 2: 2005-01-02 2005-01-02 20141331123 2
# 3: 2005-01-03 2005-01-03 20141331123 3
# 4: 2005-01-04 2005-01-04 20141331123 4
# 5: 2005-01-05 2005-01-05 20141331123 5
# 6: 2005-01-06 2005-01-06 20141331123 6
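Note that in the output above the cDate and bDate columns carry the values of Date from dfA, which is how data.table reports the columns used in a non-equi join. If you would rather keep the original interval bounds, one option (a sketch, assuming data.table 1.12.0 or later for nomatch = NULL) is to pick the columns explicitly with the x. and i. prefixes:

```r
library(data.table)

# Same sample data as in the question, rebuilt here so the block runs on its own
dfA <- data.table(Common = "20141331123", a = 1:6,
                  Date = as.Date("2005-01-01") + 0:5)
dfB <- data.table(cDate = as.Date("2005-01-01"),
                  bDate = as.Date("2005-06-13"),
                  common = "20141331123")

# x.* refers to columns of dfB (the table being subset), i.* to columns of dfA.
# nomatch = NULL drops dfA rows that fall in no interval (an inner join).
res <- dfB[dfA,
           on = .(common == Common, cDate <= Date, bDate >= Date),
           nomatch = NULL,
           .(Common = i.Common, a = i.a, Date = i.Date,
             cDate = x.cDate, bDate = x.bDate)]
res
```

Here res keeps one row per matching pair, with Date taken from dfA and the interval bounds taken from dfB.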
The sample data is a little uninteresting, since everything fits in a single interval, but this should carry over to your more varied data.
[1]: Since SQL supports it, it is supported in dbplyr through sql_on.
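For readers on a recent tidyverse, the same range condition can probably be written with dplyr's join_by(). This is a sketch that assumes dplyr 1.1.0 or later is installed; it is not part of the original answer:

```r
library(dplyr)

# Same sample data as in the question, rebuilt here so the block runs on its own
dfA <- data.frame(Common = "20141331123", a = 1:6,
                  Date = as.Date("2005-01-01") + 0:5)
dfB <- data.frame(cDate = as.Date("2005-01-01"),
                  bDate = as.Date("2005-06-13"),
                  common = "20141331123")

# Equality on the key, plus cDate <= Date <= bDate (both bounds inclusive by default)
joined <- inner_join(
  dfA, dfB,
  by = join_by(Common == common, between(Date, cDate, bDate))
)
joined
```

inner_join() keeps only the dfA rows whose Date falls inside some interval for the same key; swap in left_join() if unmatched rows should be kept.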
Data:
dfA <- structure(list(Common = c("20141331123", "20141331123", "20141331123", "20141331123", "20141331123", "20141331123"), a = 1:6, Date = structure(c(12784, 12785, 12786, 12787, 12788, 12789), class = "Date")), row.names = c(NA, -6L), class = "data.frame")
dfB <- structure(list(cDate = structure(12784, class = "Date"), bDate = structure(12947, class = "Date"), common = "20141331123"), row.names = c(NA, -1L), class = "data.frame")
If data size permits, consider a straightforward merge and subset.
final_df <- subset(merge(dfA, dfB, by.x="Common", by.y="common"),
                   Date >= cDate & Date <= bDate)
final_df
#        Common a       Date      cDate      bDate
# 1 20141331123 1 2005-01-01 2005-01-01 2005-06-13
# 2 20141331123 2 2005-01-02 2005-01-01 2005-06-13
# 3 20141331123 3 2005-01-03 2005-01-01 2005-06-13
# 4 20141331123 4 2005-01-04 2005-01-01 2005-06-13
# 5 20141331123 5 2005-01-05 2005-01-01 2005-06-13
# 6 20141331123 6 2005-01-06 2005-01-01 2005-06-13
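If the full merge is too large to hold in memory, one possible workaround is to run the same merge and subset one key at a time with lapply(), so that only one key's cross product exists at any moment. This is a sketch rather than part of the answer above, and the helper name merge_in_range is made up for illustration:

```r
# Same sample data as in the question, rebuilt here so the block runs on its own
dfA <- data.frame(Common = "20141331123", a = 1:6,
                  Date = as.Date("2005-01-01") + 0:5)
dfB <- data.frame(cDate = as.Date("2005-01-01"),
                  bDate = as.Date("2005-06-13"),
                  common = "20141331123")

# Hypothetical helper: merge and filter separately for each key value
merge_in_range <- function(a, b) {
  keys <- intersect(unique(a$Common), unique(b$common))
  pieces <- lapply(keys, function(k) {
    m <- merge(a[a$Common == k, ], b[b$common == k, ],
               by.x = "Common", by.y = "common")
    # keep only the pairs whose Date lies inside [cDate, bDate]
    m[m$Date >= m$cDate & m$Date <= m$bDate, ]
  })
  do.call(rbind, pieces)  # NULL when the two tables share no keys
}

chunked_df <- merge_in_range(dfA, dfB)
chunked_df
```

This trades some speed for a smaller peak memory footprint; with many distinct keys, the data.table non-equi join is likely the better choice.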