How to distribute a series of volumes across a distribution using R

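I'm trying to forecast the number of events (book returns). I have a data frame (derived from a table) of the potential volume of returns expected on each date, and a density function of past return behaviour. My plan was to use the convolve function, but I'm at my wit's end. Any ideas on the best way forward?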

Calculating the borrow length

data$borrow_length <- data$due_date - data$return_date
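Subtracting two Date columns gives a `difftime`, not a plain number, and `density()` only accepts plain numeric input, so converting with `as.numeric(..., units = "days")` is a safe first step. A minimal sketch on hypothetical toy dates (not the asker's data); with `due_date - return_date`, an early return comes out positive:

```r
# Hypothetical toy data: three loans with due and actual return dates
data <- data.frame(
  due_date    = as.Date(c("2019-04-10", "2019-04-10", "2019-04-12")),
  return_date = as.Date(c("2019-04-08", "2019-04-13", "2019-04-12"))
)

# Date - Date yields a difftime; convert it to a plain number of days
data$borrow_length <- as.numeric(data$due_date - data$return_date, units = "days")

# Positive = returned early, negative = returned late, 0 = on the due date
print(data$borrow_length)
```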

Generating the PDF

borrow_pdf <- density(as.numeric(data$borrow_length))  # density() needs a plain numeric vector, not a difftime
plot(borrow_pdf)
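`density()` returns a list whose `$x` and `$y` components trace a smooth curve on a fixed grid (512 points by default), not a probability per day, so it cannot be combined day by day with daily counts as-is. One option, sketched here under the assumption that delays are whole days, is to evaluate the curve at integer days with `approx()` and renormalise so the weights sum to 1. The delays below are simulated for illustration only.

```r
set.seed(42)
# Hypothetical whole-day delays (positive = returned late)
borrow_delay <- round(rnorm(200, mean = 0, sd = 5))

borrow_pdf <- density(borrow_delay)   # smooth curve on a 512-point grid

# Evaluate the curve at every integer day covered by the grid
days <- seq(floor(min(borrow_pdf$x)), ceiling(max(borrow_pdf$x)))
dens <- approx(borrow_pdf$x, borrow_pdf$y, xout = days, rule = 2)$y
dens <- pmax(dens, 0)                 # guard against tiny negative values

# Renormalise so the daily weights form a probability mass function
p_delay <- dens / sum(dens)
daily_pmf <- data.frame(delay = days, p_delay = p_delay)
head(daily_pmf)
```

The trade-off is that smoothing spreads probability onto delays that were never observed; the answer below uses the raw empirical frequencies instead.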

Volume by due date

return_volume <- as.data.frame(table(data$due_date))

output <- convolve(borrow_pdf$y, return_volume$Freq, type = "open")  # convolve() needs numeric vectors; borrow_pdf itself is a list
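The documentation for `stats::convolve` notes that the usual convolution of two sequences is `convolve(x, rev(y), type = "o")`; without `rev()`, as in the call above, the second sequence is effectively applied back to front. A toy check with hypothetical numbers (volumes due on three consecutive days and a pmf over delays of 0, 1 and 2 days), compared against the same sum written out by hand:

```r
# Hypothetical daily volumes due on days 1..3 and a delay pmf over 0..2 days
volume  <- c(10, 20, 5)
p_delay <- c(0.5, 0.3, 0.2)

# Usual discrete convolution via stats::convolve (note the rev())
expected_returns <- convolve(volume, rev(p_delay), type = "open")

# The same convolution as an explicit double loop, for comparison
manual <- numeric(length(volume) + length(p_delay) - 1)
for (i in seq_along(volume)) {
  for (j in seq_along(p_delay)) {
    manual[i + j - 1] <- manual[i + j - 1] + volume[i] * p_delay[j]
  }
}
print(round(expected_returns, 10))
print(manual)
```

Both give 5, 13, 10.5, 5.5, 1, which sums to the 35 books due. This only works when both vectors sit on the same daily step, which is not the case for the 512-point `density()` grid.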

I'm hoping to end up with a table of predicted return dates that accounts for returns coming in early or late.

I agree with @shea: a reproducible example would help.

Here is one, based on my understanding of the problem:

set.seed(1)
N = 200

# Historical data with known return date
old_data = data.frame( due_date = as.Date("2019-04-01") + floor(runif(N, 0, 30)) )
old_data$return_date = old_data$due_date + round(rnorm(N, 0, 5))

# Currently borrowed books
current_data = data.frame( due_date = as.Date("2019-05-10") + floor(runif(N, 0, 30)) )

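If I understand correctly, you want to estimate the distribution of the (unknown) return_date for current_data. Here is a solution that computes the convolution by hand: not efficient, but easy to follow.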
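Written as a formula (notation introduced here for illustration only): let n(d) be the number of current loans due on date d, and p(k) the empirical probability that a book comes back k days after its due date (k < 0 means early). The expected number of returns on date t is then a discrete convolution, and because the p(k) sum to 1, the expectations add up to the number of books on loan:

```latex
\mathbb{E}[R_t] = \sum_{d} n(d)\, p(t - d),
\qquad
\sum_{t} \mathbb{E}[R_t] = \sum_{d} n(d) \sum_{k} p(k) = \sum_{d} n(d).
```

The second identity is the basis of the sanity check at the end of the code.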

# For clarity, I renamed your borrow_length to borrow_delay (sign flipped: positive = returned late)
old_data$borrow_delay = old_data$return_date - old_data$due_date

# Compute its distribution (no smoothing)
distr_delay = as.data.frame(prop.table(table(delay = old_data$borrow_delay)), responseName="p_delay")
distr_delay$delay = as.integer(as.character(distr_delay$delay))  # factor labels -> actual delays in days
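`table()` stores its categories as factor levels, and `as.integer()` on a factor returns the internal level codes (1, 2, 3, ...), not the numbers shown as labels. Converting through `as.character()` first recovers the real delays, which the date arithmetic further down relies on. A tiny illustration with made-up values:

```r
# A factor whose labels are delays in days (hypothetical values)
delay_factor <- factor(c(-3, 0, 2, 2))

as.integer(delay_factor)                 # level codes: 1 2 3 3
as.integer(as.character(delay_factor))   # actual delays: -3 0 2 2
```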

# Counts by due date
tab_volume = as.data.frame(table(due_date = current_data$due_date))
tab_volume$due_date = as.Date(as.character(tab_volume$due_date))

# Explicit convolution: with no column names in common, merge() returns every (due date, delay) pair
distr_return = merge(tab_volume, distr_delay)
distr_return$return_date = with(distr_return, due_date + delay)
distr_return$expected_n_returns = with(distr_return, Freq*p_delay)
distr_return = with(distr_return, tapply(expected_n_returns, return_date, sum))
# Reformat
distr_return = data.frame(
  return_date = as.Date(names(distr_return)),
  expected_n_returns = c(distr_return)
)
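The tapply() and reformat steps can also be collapsed into a single `aggregate()` call, which keeps `return_date` as a `Date` and avoids the round trip through names. A sketch on hypothetical toy inputs shaped like tab_volume and distr_delay:

```r
# Hypothetical counts by due date and a delay pmf
tab_volume  <- data.frame(due_date = as.Date(c("2019-05-10", "2019-05-11")),
                          Freq = c(4, 6))
distr_delay <- data.frame(delay = c(-1L, 0L, 1L), p_delay = c(0.25, 0.5, 0.25))

# No shared column names, so merge() forms the Cartesian product (2 x 3 rows)
pairs <- merge(tab_volume, distr_delay)
pairs$return_date <- pairs$due_date + pairs$delay
pairs$expected_n_returns <- pairs$Freq * pairs$p_delay

# Sum the expectations per return date; return_date stays a Date
distr_return <- aggregate(expected_n_returns ~ return_date, data = pairs, FUN = sum)
print(distr_return)
```

Here the result is 1, 3.5, 4 and 1.5 expected returns on 9 to 12 May, totalling the 10 books due.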

# Sanity check: sum of expectations is 200 (the number of books borrowed)
sum(distr_return$expected_n_returns)

with(distr_return, plot(return_date, expected_n_returns))
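As a cross-check, the same expectations can be obtained with `stats::convolve()` once both inputs are laid out on contiguous daily grids (days with no loans or no observed delay get zero). This sketch rebuilds the reproducible example above and applies `rev()` to the second argument, as `?convolve` prescribes for the usual convolution. The FFT can leave round-off values such as -1e-15, hence the clipping at zero.

```r
set.seed(1)
N <- 200
old_data <- data.frame(due_date = as.Date("2019-04-01") + floor(runif(N, 0, 30)))
old_data$return_date <- old_data$due_date + round(rnorm(N, 0, 5))
current_data <- data.frame(due_date = as.Date("2019-05-10") + floor(runif(N, 0, 30)))

# Empirical delay pmf on a contiguous daily grid (zero for unobserved delays)
delay <- as.integer(old_data$return_date - old_data$due_date)
delay_grid <- seq(min(delay), max(delay))
p_delay <- tabulate(delay - min(delay) + 1L, nbins = length(delay_grid)) / length(delay)

# Loan counts on a contiguous daily grid of due dates
due_grid <- seq(min(current_data$due_date), max(current_data$due_date), by = "day")
volume <- tabulate(as.integer(current_data$due_date - min(due_grid)) + 1L,
                   nbins = length(due_grid))

# Usual discrete convolution: element k pairs due index i with delay index k - i + 1
expected <- convolve(volume, rev(p_delay), type = "open")
result <- data.frame(
  return_date = min(due_grid) + min(delay) + seq_along(expected) - 1L,
  expected_n_returns = pmax(expected, 0)  # clip tiny negative FFT round-off
)
sum(result$expected_n_returns)
```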