R: Applying a function between two dates
I have two data frames, as follows:
A data frame containing trades, tradeData (sample):
Login OpenTime CloseTime Decision
859 13/01/2014 13/01/2014 1
859 16/01/2014 16/01/2014 1
859 21/01/2014 21/01/2014 1
859 21/01/2014 21/01/2014 1
859 22/01/2014 22/01/2014 1
859 23/01/2014 23/01/2014 1
859 27/01/2014 27/01/2014 1
859 03/02/2014 03/02/2014 1
859 04/02/2014 05/02/2014 1
859 07/02/2014 07/02/2014 1
859 11/02/2014 13/02/2014 1
939 06/02/2014 28/02/2014 1
939 06/02/2014 28/02/2014 1
939 06/02/2014 28/02/2014 1
1455 03/04/2014 03/04/2014 1
1455 04/04/2014 04/04/2014 1
1455 04/04/2014 07/04/2014 1
1455 08/04/2014 08/04/2014 1
1455 08/04/2014 08/04/2014 1
1455 09/04/2014 30/04/2014 1
1455 30/04/2014 30/04/2014 1
And another data frame with dates, datesData (sample):
Login B_A A_B
859 22/01/2014 23/01/2014
859 03/02/2014 07/02/2014
859 11/02/2014 12/02/2014
939 06/02/2014 01/01/2200
1455 04/04/2014 08/04/2014
1455 09/05/2014 30/06/2014
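Any trade (i.e. a row in the tradeData data frame) that was opened between the two dates in any row of the datesData data frame, and whose Login matches that row, should receive a 0 in the Decision column. The trade must be opened on or after the date in the B_A column and before the date in the A_B column. The Decision column is pre-filled with 1, so all I need to do is insert the 0s.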
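To make the boundary conditions concrete, here is a minimal base R sketch of the rule for a single trade. The helper name in_any_interval is hypothetical (it is not part of the original post), and the intervals are the three datesData rows for Login 859.

```r
# Hypothetical helper (not from the original post): TRUE when `open` lies in
# the half-open interval [start, end) of at least one row.
in_any_interval <- function(open, start, end) {
  any(open >= start & open < end)
}

fmt <- "%d/%m/%Y"
# Intervals for Login 859, copied from datesData
b_a <- as.Date(c("22/01/2014", "03/02/2014", "11/02/2014"), format = fmt)
a_b <- as.Date(c("23/01/2014", "07/02/2014", "12/02/2014"), format = fmt)

# Opened exactly on a B_A date: inside the interval, so Decision would be 0
on_start <- in_any_interval(as.Date("22/01/2014", format = fmt), b_a, a_b)
# Opened exactly on an A_B date: outside the interval, so Decision stays 1
on_end <- in_any_interval(as.Date("23/01/2014", format = fmt), b_a, a_b)
```

The lower bound is inclusive and the upper bound is exclusive, which is why the trade opened on 23/01/2014 keeps its 1.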
The resulting tradeData data frame should look like this:
Login OpenTime CloseTime Decision
859 13/01/2014 13/01/2014 1
859 16/01/2014 16/01/2014 1
859 21/01/2014 21/01/2014 1
859 21/01/2014 21/01/2014 1
859 22/01/2014 22/01/2014 0
859 23/01/2014 23/01/2014 1
859 27/01/2014 27/01/2014 1
859 03/02/2014 03/02/2014 0
859 04/02/2014 05/02/2014 0
859 07/02/2014 07/02/2014 1
859 11/02/2014 13/02/2014 0
939 06/02/2014 28/02/2014 0
939 06/02/2014 28/02/2014 0
939 06/02/2014 28/02/2014 0
1455 03/04/2014 03/04/2014 1
1455 04/04/2014 04/04/2014 0
1455 04/04/2014 07/04/2014 0
1455 08/04/2014 08/04/2014 1
1455 08/04/2014 08/04/2014 1
1455 09/04/2014 30/04/2014 1
1455 30/04/2014 30/04/2014 1
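So, for example, the fifth row in the tradeData data frame was opened on or after 22 January 2014 and before 23 January 2014 (the first row of the datesData data frame) and matches the Login in that row, so it receives a 0.
Any help would be great! If anything is unclear, please let me know.
Thanks!
Mike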
One approach is to use the data.table package:
library(data.table)
# convert tradeData to a data.table and parse the open date
setDT(tradeData)
setkey(tradeData, Login)
tradeData[, OpenTime := as.Date(OpenTime, format = "%d/%m/%Y")]
# parse the interval bounds in a copy of datesData
df1 <- datesData
df1$B_A <- as.Date(df1$B_A, format = "%d/%m/%Y")
df1$A_B <- as.Date(df1$A_B, format = "%d/%m/%Y")
# per Login: Decision is 0 when OpenTime falls in [B_A, A_B) of any matching row, else 1
tradeData[, Decision := sapply(OpenTime, function(d) {
  dt <- df1[df1$Login == Login, ]
  as.integer(!any(d >= dt$B_A & d < dt$A_B))
}), by = Login]
The result looks like this:
> tradeData
Login OpenTime CloseTime Decision
1: 859 2014-01-13 13/01/2014 1
2: 859 2014-01-16 16/01/2014 1
3: 859 2014-01-21 21/01/2014 1
4: 859 2014-01-21 21/01/2014 1
5: 859 2014-01-22 22/01/2014 0
6: 859 2014-01-23 23/01/2014 1
7: 859 2014-01-27 27/01/2014 1
8: 859 2014-02-03 03/02/2014 0
9: 859 2014-02-04 05/02/2014 0
10: 859 2014-02-07 07/02/2014 1
11: 859 2014-02-11 13/02/2014 0
12: 939 2014-02-06 28/02/2014 0
13: 939 2014-02-06 28/02/2014 0
14: 939 2014-02-06 28/02/2014 0
15: 1455 2014-04-03 03/04/2014 1
16: 1455 2014-04-04 04/04/2014 0
17: 1455 2014-04-04 07/04/2014 0
18: 1455 2014-04-08 08/04/2014 1
19: 1455 2014-04-08 08/04/2014 1
20: 1455 2014-04-09 30/04/2014 1
21: 1455 2014-04-30 30/04/2014 1
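As a possible alternative to the per-row sapply, the same rule can be sketched as a data.table non-equi update join. This is a different technique from the answer above, it assumes data.table 1.9.8 or later (where non-equi joins in `on` are available), and it uses a small subset of the question's data with the dates already parsed.

```r
library(data.table)

# Small subset of the question's data, with dates already parsed
tradeData <- data.table(
  Login    = c(859L, 859L, 859L, 939L, 1455L),
  OpenTime = as.Date(c("2014-01-21", "2014-01-22", "2014-01-23",
                       "2014-02-06", "2014-04-08")),
  Decision = 1L
)
datesData <- data.table(
  Login = c(859L, 939L, 1455L),
  B_A   = as.Date(c("2014-01-22", "2014-02-06", "2014-04-04")),
  A_B   = as.Date(c("2014-01-23", "2200-01-01", "2014-04-08"))
)

# Update join: every tradeData row with the same Login and an OpenTime in
# [B_A, A_B) of some datesData row gets Decision set to 0, by reference.
tradeData[datesData, on = .(Login, OpenTime >= B_A, OpenTime < A_B),
          Decision := 0L]

decisions <- tradeData$Decision
```

Rows that match no interval are left untouched, so the pre-filled 1s survive without any extra handling.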
Here is a solution using the sqldf package.
library(sqldf)
# parse the date columns used in the comparison
tradeData$OpenTime <- as.Date(tradeData$OpenTime, format="%d/%m/%Y")
datesData$B_A <- as.Date(datesData$B_A, format="%d/%m/%Y")
datesData$A_B <- as.Date(datesData$A_B, format="%d/%m/%Y")
# the UPDATE runs on sqldf's copy of the table; the SELECT returns the updated copy
sqldf(c("UPDATE tradeData
         SET Decision = 0
         WHERE EXISTS (SELECT * FROM datesData WHERE
                       tradeData.Login = datesData.Login AND
                       tradeData.OpenTime >= datesData.B_A AND
                       tradeData.OpenTime < datesData.A_B)",
        "SELECT * FROM tradeData"))
# Login OpenTime CloseTime Decision
# 1 859 2014-01-13 13/01/2014 1
# 2 859 2014-01-16 16/01/2014 1
# 3 859 2014-01-21 21/01/2014 1
# 4 859 2014-01-21 21/01/2014 1
# 5 859 2014-01-22 22/01/2014 0
# 6 859 2014-01-23 23/01/2014 1
# 7 859 2014-01-27 27/01/2014 1
# 8 859 2014-02-03 03/02/2014 0
# 9 859 2014-02-04 05/02/2014 0
# 10 859 2014-02-07 07/02/2014 1
# 11 859 2014-02-11 13/02/2014 0
# 12 939 2014-02-06 28/02/2014 0
# 13 939 2014-02-06 28/02/2014 0
# 14 939 2014-02-06 28/02/2014 0
# 15 1455 2014-04-03 03/04/2014 1
# 16 1455 2014-04-04 04/04/2014 0
# 17 1455 2014-04-04 07/04/2014 0
# 18 1455 2014-04-08 08/04/2014 1
# 19 1455 2014-04-08 08/04/2014 1
# 20 1455 2014-04-09 30/04/2014 1
# 21 1455 2014-04-30 30/04/2014 1
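For readers who would rather avoid an extra package, the EXISTS logic can also be sketched in base R with merge. This is an illustration rather than part of the answer above: the id column and the pairs and hit variables are hypothetical helpers, and the data is a small subset of the question's with the dates already parsed.

```r
# Small subset of the question's data, with dates already parsed
tradeData <- data.frame(
  Login    = c(859L, 859L, 859L, 939L, 1455L),
  OpenTime = as.Date(c("2014-01-21", "2014-01-22", "2014-02-04",
                       "2014-02-06", "2014-04-08")),
  Decision = 1L
)
datesData <- data.frame(
  Login = c(859L, 859L, 939L, 1455L),
  B_A   = as.Date(c("2014-01-22", "2014-02-03", "2014-02-06", "2014-04-04")),
  A_B   = as.Date(c("2014-01-23", "2014-02-07", "2200-01-01", "2014-04-08"))
)

# Hypothetical helper column: remembers each trade's row position
tradeData$id <- seq_len(nrow(tradeData))
# Pair every trade with every interval that shares its Login
pairs <- merge(tradeData, datesData, by = "Login")
# TRUE where the trade opened inside the half-open interval [B_A, A_B)
hit <- pairs$OpenTime >= pairs$B_A & pairs$OpenTime < pairs$A_B
# A trade gets 0 if at least one of its pairs is a hit
tradeData$Decision[unique(pairs$id[hit])] <- 0L
tradeData$id <- NULL
```

The merge creates one row per trade and interval pair, so for large inputs the intermediate table can grow quickly; the join-based approaches avoid materialising it.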