Alternative to using ungroup in kdb?

Question

I have two tables in KDB.

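One is a time series with datetime and sym columns, spanning multiple dates (it could be, for example, 1 million or 2 million rows). Every time point has the same number of syms, plus a few other standard columns such as price. Let's call it t1: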

`date`datetime`sym`price

The other table has this structure:

`date`sym`factors`weights

where factors is a list and weights is a list of the same length, for each sym. Let's call it t2.

I left-join these two tables and then ungroup. The factors and weights lists do not have the same length for every sym.

I am doing the following:

select sum (weights*price) by date, factors from ungroup t1 lj `date`sym xkey t2

However, this is very slow: it can take 5-6 seconds if t1 has a million rows or more.
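To see where the cost comes from, here is a tiny sketch with made-up tables (`a`, `b` and `u` are hypothetical names, not from the question). `ungroup` emits one row per (row, list element) pair, so a 1-million-row t1 joined to 10 factors per sym (as in the example t2 in the edit) becomes roughly 10 million rows before the `sum` runs.

```q
/ hypothetical mini tables: prices per sym, and a keyed table of per-sym factor/weight lists
a:([] sym:`x`y; price:1 2f)
b:([sym:`x`y] factors:(`f1`f2;`f1`f2`f3); weights:(1 2f;3 4 5f))
/ left join, then expand each row once per element of its factors/weights lists
u:ungroup a lj b
count u    / 5: two rows for sym x, three rows for sym y
```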

Looking for advice from all the kdb experts!

Edit:

Here is a full example (apologies for the roundabout way of defining t1 and t2):

interval: `long$`time$00:01:00; 
hops: til 1+ `int$((`long$(et:`time:00)-st:`time:00))%interval;
times: st + `long$interval*hops; 
dates: .z.D - til .z.D-.z.D-10; 
timepoints: ([] date: dates) cross ([] time:times); 
syms: ([] sym: 300?`5); 
universe: timepoints cross syms; 
t1: update datetime: date+time, price:count[universe]?100.0 from universe;
t2: ([] date:dates) cross syms; 
/ note: in my real-life t2 the number of weights/factors is not always 10; it can vary by sym. 
t2: `date`sym xkey update factors: count[t2]#enlist 10?`5, weights: count[t2]#enlist 10?10 from t2; 

/ what is slow is the ungroup 
select sum weights*price by date, datetime, factors from ungroup t1 lj t2

Answer 1

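One way to avoid the ungroup is to use matrices (aka lists of lists) and take advantage of the optimised matrix multiplication `$` (`mmu`), see here: https://code.kx.com/q/ref/mmu/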
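A small illustration of why the matrix form helps, using made-up values (`w`, `p`, `r1`, `r2` are hypothetical names): multiplying a factors-by-syms weight matrix by a per-sym price vector gives, for each factor, the sum of weight*price over syms, which is the same quantity the grouped `sum weights*price` computes row by row. Note that `mmu` expects float arguments, which is why the answer casts with `"f"$weights`.

```q
/ hypothetical toy data: one row per factor, one column per sym
w:(1 2f;3 4f;5 6f)
/ one price per sym
p:10 20f
/ matrix product: for each factor, the sum over syms of weight*price
r1:w mmu p
/ the same result spelled out with each-left and sum, without mmu
r2:sum each w*\:p
r1~r2    / 1b; both are 50 110 170f
```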

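In my approach below, instead of joining t2 to t1 and then ungrouping, I group t1 and join it to t2 (so everything stays as lists of lists), then use some matrix operations, with a final ungroup on a much smaller set.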

q)\ts res:select sum weights*price by date, factors from ungroup t1 lj t2
4100 3035628112
q)\ts resT:ungroup exec first factors,sum each flip["f"$weights]$price by date:date from t2 lj select price by date,sym from t1;
76 83892800

q)(0!res)~`date`factors xasc `date`factors`weights xcol resT
1b

As you can see, it is much faster (at least on my machine), and the result is identical apart from the sort order and column names.

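You may still need to adapt this solution slightly for your real use case (variable numbers of weights, etc.). In that case, one option is to enforce a uniform number of weights per sym by padding with zeros where necessary.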
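A minimal sketch of the zero-padding idea, assuming each sym's `factors` and `weights` lists are aligned index by index: build a common factor universe, then scatter each sym's weights into a zero vector over that universe so every sym ends up with a weight vector of the same length. `f`, `w`, `allf`, `pad` and `m` are hypothetical names for illustration only, not part of the answer above.

```q
/ hypothetical per-sym factor lists of different lengths, with matching weights
f:(`a`b;`b`c`d)
w:(1 2f;3 4 5f)
/ common factor universe across all syms: `a`b`c`d
allf:distinct raze f
/ scatter one sym's weights into a zero vector indexed by the universe
pad:{[af;fs;ws] @[count[af]#0f;af?fs;:;ws]}
/ one padded row per sym: (1 2 0 0f;0 3 4 5f)
m:pad[allf]'[f;w]
```

With a uniform `m` (one row per sym, one column per factor), `flip m` has the factors-by-syms shape that `$`/`mmu` expects, in the same way the answer uses `flip["f"$weights]`.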
