Avoiding a for loop using data.table
I have a simulation over time (dev_quarters) that looks like this. It is a data.table:
library(data.table)
simulation <- data.table(`Scenario ID` = 1, dev_quarter = 1:80, brand = 1, proportion = runif(80))
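The example table holds a single scenario and a single brand. To exercise a replacement for the loop on richer data, a minimal sketch could build every scenario/brand/quarter combination with `data.table::CJ()`. The sizes (2 scenarios, 3 brands, 80 quarters) and the seed are arbitrary assumptions, not values from the question:

```r
library(data.table)

set.seed(42)  # reproducible random proportions

# Hypothetical sizes, chosen only for illustration
n_scenario <- 2
n_brand    <- 3
n_quarter  <- 80

# CJ() builds the cross join of all scenario x brand x quarter combinations
simulation <- CJ(`Scenario ID` = seq_len(n_scenario),
                 brand         = seq_len(n_brand),
                 dev_quarter   = seq_len(n_quarter))
simulation[, proportion := runif(.N)]

# Reorder the columns to match the original example
setcolorder(simulation, c("Scenario ID", "dev_quarter", "brand", "proportion"))
```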
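In general there are n_scenario scenarios and n_brand brands, each with a proportion per dev_quarter.
I am trying to write code that, for each scenario and each brand, computes the difference in proportion between the start and the end of each year.
I did the following to find the dev_quarters that correspond to each year: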
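x <- 2002:2021
# note: lookup_T depends on the current date, so the quarters selected for each year change every calendar year
lookup_T <- as.integer(format(Sys.Date(), "%Y"))
lookup_period <- data.table(years = lookup_T - x + 1, quarters_t = (lookup_T - x + 1) * 4, quarters_t1 = (lookup_T - x + 2) * 4)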
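Because `lookup_T` comes from `Sys.Date()`, it helps to look at the mapping with the year held fixed. A sketch that pins `lookup_T` to 2021 (an assumption, matching the last value of `x`) and adds a hypothetical `calendar_year` column shows that `quarters_t = 4 * (lookup_T - x + 1)` and `quarters_t1 = 4 * (lookup_T - x + 2)`: 2021 maps to quarters 4 and 8, while 2002 maps to quarters 80 and 84, so the most recent year uses the earliest quarters and the 2002 end quarter lies beyond the 80 simulated quarters.

```r
library(data.table)

x <- 2002:2021
lookup_T <- 2021L  # assumption: pinned instead of format(Sys.Date(), "%Y")

lookup_period <- data.table(years       = lookup_T - x + 1,
                            quarters_t  = (lookup_T - x + 1) * 4,
                            quarters_t1 = (lookup_T - x + 2) * 4)

# Keep the calendar year next to its quarters to make the mapping visible
lookup_period[, calendar_year := x]
print(lookup_period[c(1, .N)])
```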
For a small example:
n_scenario <- 1
n_brand <- 10
And here is the ugly code with for loops:
result <- data.table(`Scenario ID` = numeric(), years = numeric(), brand = numeric(), proportion = numeric())
for (i in seq_len(n_scenario)) {
  for (j in seq_len(n_brand)) {
    prop_per_year <- c()
    # for each year
    for (k in seq_along(x)) {
      year <- lookup_period[k, ]
      quarter_start_year <- year[["quarters_t"]]
      quarter_end_year <- year[["quarters_t1"]]
      end_year_prop <- simulation[`Scenario ID` == i & brand == j & dev_quarter == quarter_end_year]
      start_year_prop <- simulation[`Scenario ID` == i & brand == j & dev_quarter == quarter_start_year]
      # change over the year, floored at 0
      prop_this_year <- max(end_year_prop[["proportion"]] - start_year_prop[["proportion"]], 0)
      prop_per_year <- append(prop_per_year, prop_this_year)
    }
    result_temp <- data.table(`Scenario ID` = i, years = x, brand = j, proportion = prop_per_year)
    result <- rbind(result, result_temp)
  }
}
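One detail of the loop matters for any replacement: when a scenario, brand or quarter is missing from `simulation` (for instance brands 2 to 10 in the small example, or any quarter past 80), the subset is empty, the subtraction returns `numeric(0)`, and `max(numeric(0), 0)` returns 0. A short check of that base R behavior, with an arbitrary start value:

```r
# What simulation[...] returns when no row matches, reduced to its proportion column
end_year_prop   <- numeric(0)
start_year_prop <- 0.3  # arbitrary example value

change <- end_year_prop - start_year_prop  # arithmetic with a length-0 vector gives numeric(0)
prop_this_year <- max(change, 0)           # the extra 0 means max() returns 0 instead of -Inf

print(length(change))
print(prop_this_year)
```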
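I considered filtering my data.table to keep only the rows where dev_quarter is a multiple of 4, but the problem with the for loops stays the same.
How can I avoid them using data.table?
Thanks.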
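The absolute change in proportion between the fourth and the first quarter can be computed much more easily:
simulation[, year := 2002 + (dev_quarter - 1) %/% 4]  # easier way to calculate the year
# order(dev_quarter) makes first()/last() pick Q1 and Q4 within each group
simulation[order(dev_quarter), .(change = last(proportion) - first(proportion)), by = c("Scenario ID", "brand", "year")]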
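The grouped version above defines the change as Q4 minus Q1 within each calendar year and keeps negative values, whereas the loop in the question takes the change from quarter `4 * (lookup_T - x + 1)` to the next multiple of 4 and floors it at 0. If the loop's exact output is needed, a sketch along these lines replaces the three loops with a table of all (scenario, brand, year) combinations built by `CJ()` and two update joins. It assumes `lookup_T` is pinned to 2021 and adds `set.seed(1)` to the question's data; `combos`, `q_start`, `q_end`, `start_prop` and `end_prop` are hypothetical names:

```r
library(data.table)

# Example data from the question (seed added for reproducibility)
set.seed(1)
simulation <- data.table(`Scenario ID` = 1, dev_quarter = 1:80, brand = 1, proportion = runif(80))

x          <- 2002:2021
lookup_T   <- 2021L  # assumption: pinned instead of format(Sys.Date(), "%Y")
n_scenario <- 1
n_brand    <- 10

# One row per (scenario, brand, year): the same combinations the nested loops visit
combos <- CJ(`Scenario ID` = seq_len(n_scenario), brand = seq_len(n_brand), years = x)
combos[, `:=`(q_start = (lookup_T - years + 1) * 4,
              q_end   = (lookup_T - years + 2) * 4)]

# Update joins: fetch the proportion at the start and end quarter (NA if absent)
combos[simulation, on = c("Scenario ID" = "Scenario ID", brand = "brand", q_start = "dev_quarter"),
       start_prop := i.proportion]
combos[simulation, on = c("Scenario ID" = "Scenario ID", brand = "brand", q_end = "dev_quarter"),
       end_prop := i.proportion]

# Loop semantics: 0 when either quarter is missing, otherwise the change floored at 0
combos[, proportion := fifelse(is.na(start_prop) | is.na(end_prop), 0,
                               pmax(end_prop - start_prop, 0))]

result <- combos[, .(`Scenario ID`, years, brand, proportion)]
```

Since `CJ()` sorts by scenario, brand and year, the rows come out in the same order as the loop's `rbind()` calls.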