R data.table: divide values in a column based on another column
I have a main data.table with 364 rows and 3 columns:
Date Weekday Weight
2012-01-01 Monday 100
2012-01-02 Tuesday 200
...
and a help data.table with 7 rows and 2 columns:
Weekday Coefficient
Monday 0.91
Tuesday 0.84
Wednesday 0.99
...
Now I want to create a 4th column in the main data.table holding "Weight/Coefficient", with the coefficient chosen by weekday. My attempt:
Weight_divided <- main[, Weight * help[Weekday==main$Weekday]$Coefficient]
The result looks like this:
Date Weekday Weight Weight_divided
2012-01-01 Monday 100 91
2012-01-02 Tuesday 200 168
2012-01-03 Wednesday 300 297
2012-01-04 Thursday 400 256
2012-01-05 Friday 500 399
2012-01-06 Saturday 600 410
2012-01-07 Sunday 700 680
2012-01-08 Monday 300 NA <--
2012-01-09 Tuesday 600 NA <--
...
I guess the problem is that the two data.tables have different lengths.
Is there a way to reference the shorter data.table inside an operation on the main data.table?
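One plausible reading of the NA rows can be sketched with plain vectors. The names `weekdays7`, `coef` and `main_weekday` are hypothetical stand-ins, and the coefficients are borrowed from the sample data further down. The sketch only shows the base R indexing rule; how data.table itself treats a logical index longer than the table may differ between versions.

```r
# Plain-vector sketch of the failing lookup (no data.table needed).
weekdays7 <- c("Monday", "Tuesday", "Wednesday", "Thursday",
               "Friday", "Saturday", "Sunday")
coef <- c(0.91, 0.84, 0.99, 0.94, 0.82, 0.85, 0.87)

# Two weeks of weekdays, like the first 14 rows of main.
main_weekday <- rep(weekdays7, length.out = 14)

# `==` recycles the 7-element vector against the 14-element one,
# so the comparison has length 14, not 7.
idx <- weekdays7 == main_weekday
length(idx)

# Indexing a 7-element vector with 14 TRUEs returns the 7 values,
# then NA for every position past the end of the vector.
looked_up <- coef[idx]
looked_up

# A per-row lookup with match() returns one coefficient per row instead.
per_row <- coef[match(main_weekday, weekdays7)]
per_row
```

So the comparison is not really matching each row to its weekday; it happens to line up for the first week and runs off the end of the short table afterwards.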
Using data.table:
library(data.table)
setkey(main, Weekday)[help, Weight_Coef := Weight*Coefficient][order(Date)]
# Weekday Date Weight Weight_Coef
# 1: Monday 2012-01-01 59 53.69
# 2: Tuesday 2012-01-02 45 37.80
# 3: Wednesday 2012-01-03 141 139.59
# 4: Thursday 2012-01-04 104 97.76
# 5: Friday 2012-01-05 133 109.06
#---
#360: Wednesday 2012-12-25 192 190.08
#361: Thursday 2012-12-26 79 74.26
#362: Friday 2012-12-27 39 31.98
#363: Saturday 2012-12-28 175 148.75
#364: Sunday 2012-12-29 134 116.58
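Note that `setkey()` sorts `main` by `Weekday` by reference, which is why the call above ends with `[order(Date)]`. As a possible alternative, assuming a data.table version that supports the `on=` argument, an update join leaves the row order of `main` untouched. The small tables and the column name `Weight_divided` below are illustrative; the second column shows the division variant in case that is what is actually wanted.

```r
library(data.table)

# Small stand-ins for the two tables (two weeks of illustrative weights).
main <- data.table(
  Date    = seq(as.Date("2012-01-01"), length.out = 14, by = "day"),
  Weekday = rep(c("Monday", "Tuesday", "Wednesday", "Thursday",
                  "Friday", "Saturday", "Sunday"), length.out = 14),
  Weight  = seq(100, by = 100, length.out = 14)
)
help <- data.table(
  Weekday     = c("Monday", "Tuesday", "Wednesday", "Thursday",
                  "Friday", "Saturday", "Sunday"),
  Coefficient = c(0.91, 0.84, 0.99, 0.94, 0.82, 0.85, 0.87)
)

# Update join: match rows of main to help by Weekday and add the new
# columns by reference. The i. prefix refers to columns of help.
main[help, on = "Weekday", `:=`(
  Weight_Coef    = Weight * i.Coefficient,
  Weight_divided = Weight / i.Coefficient
)]

main[1:3]
```

Because nothing is keyed or reordered, no trailing `[order(Date)]` is needed with this form.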
Data
set.seed(24)
main <- data.table(
  Weekday = rep(c('Monday', 'Tuesday', 'Wednesday', 'Thursday',
                  'Friday', 'Saturday', 'Sunday'), length.out = 364),
  Date    = seq(as.Date('2012-01-01'), length.out = 364, by = 'day'),
  Weight  = sample(200, 364, replace = TRUE)
)
help <- data.table(
  Weekday     = c('Monday', 'Tuesday', 'Wednesday', 'Thursday',
                  'Friday', 'Saturday', 'Sunday'),
  Coefficient = c(0.91, 0.84, 0.99, 0.94, 0.82, 0.85, 0.87)
)