Remove a lesser duplicate

Question

In KDB, I have the following table:

q)tab:flip `items`sales`prices!(`nut`bolt`cam`cog`bolt`screw;6 8 0 3 0n 0n;10  20 15 20 0n 0n)
q)tab

items sales prices
------------------
nut   6     10
bolt  8     20
cam   0     15
cog   3     20
bolt
screw

This table contains 2 duplicate items (bolt). However, the first 'bolt' carries more information, so I want to remove the 'lesser' bolt.

Final result:

items sales prices
------------------
nut   6     10
bolt  8     20
cam   0     15
cog   3     20
screw

As I understand it, if I use the 'distinct' function, isn't the result non-deterministic?

Answer 1

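Because the two rows contain different data, they are considered distinct.

It depends on how you define "more information". You may need to provide more examples, but here are some possibilities:

Delete the rows where sales is null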
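
As a quick check of this point, the sketch below rebuilds the table from the question and applies distinct to it. distinct compares whole rows and keeps the first occurrence of each, so it is deterministic, but it cannot help here because no two rows are identical.

```q
/ table from the question
tab:flip `items`sales`prices!(`nut`bolt`cam`cog`bolt`screw;6 8 0 3 0n 0n;10 20 15 20 0n 0n)

/ the two bolt rows differ in sales and prices, so all 6 rows survive
count distinct tab

/ on the items column alone, bolt does collapse to a single entry
distinct tab`items
```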

q)delete from tab where null sales
items sales prices
------------------
nut   6     10    
bolt  8     20    
cam   0     15    
cog   3     20

Retrieve the rows with the maximum sales value (sales*prices) for each item

q)select from tab where (sales*prices) = (max;sales*prices) fby items
items sales prices
------------------
nut   6     10    
bolt  8     20    
cam   0     15    
cog   3     20
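
Both queries above also drop screw, which the desired result in the question keeps. A minimal sketch of one way around that, assuming a null-sales row should only be removed when its item appears more than once (this rule is an assumption, not something the question states):

```q
/ table from the question
tab:flip `items`sales`prices!(`nut`bolt`cam`cog`bolt`screw;6 8 0 3 0n 0n;10 20 15 20 0n 0n)

/ delete a row only if sales is null AND the item occurs more than once
res:delete from tab where (null sales)&1<(count;i) fby items
res
```

Note that if every row of a duplicated item had null sales, this would delete all of them, so it only suits data where at least one populated row exists per duplicated item.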

Answer 2

One approach is to forward-fill by item, so that the second bolt inherits the previous values.

q)update fills sales,fills prices by items from tab
items sales prices
------------------
nut   6     10
bolt  8     20
cam   0     15
cog   3     20
bolt  8     20
screw
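
After the fill, the two bolt rows are identical, so distinct can now collapse them. A minimal sketch, assuming the more complete row always comes before the sparser one (fills only propagates values forward, so a leading null row would stay null):

```q
/ table from the question
tab:flip `items`sales`prices!(`nut`bolt`cam`cog`bolt`screw;6 8 0 3 0n 0n;10 20 15 20 0n 0n)

/ forward-fill within each item, then drop the now-identical rows
res:distinct update fills sales,fills prices by items from tab
res
```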

This can also be done in functional form, where you pass in the table and the by-columns:

{![x;();(!). 2#enlist(),y;{x!fills,/:x}cols[x]except y]}[tab;`items]
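
The one-liner is dense, so here is the same idea unpacked into named steps. The function name fillBy and the variable names are hypothetical, chosen only for illustration; the behaviour is intended to match the lambda above.

```q
/ table from the question
tab:flip `items`sales`prices!(`nut`bolt`cam`cog`bolt`screw;6 8 0 3 0n 0n;10 20 15 20 0n 0n)

/ fillBy is a hypothetical name: forward-fill every non-key column, grouped by byCols
fillBy:{[t;byCols]
  byCols:(),byCols;                / accept a single symbol or a list of symbols
  valCols:cols[t] except byCols;   / columns to forward-fill
  ![t;();byCols!byCols;valCols!fills,/:valCols]}

fillBy[tab;`items]
```

The fourth argument of the functional update is the by-dictionary and the fifth maps each column name to the parse tree (fills;`col).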

If "more information" means "fewest nulls", then you can count the number of nulls in each row and return only the rows with the fewest nulls for each item:

q)select from @[tab;`n;:;sum each null tab] where n=(min;n)fby items
items sales prices n
--------------------
nut   6     10     0
bolt  8     20     0
cam   0     15     0
cog   3     20     0
screw              2
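
The same null counts can be computed column by column instead of row by row. This is a sketch of an alternative, not part of the original answer; the variable name nulls is an assumption. It applies null to each column vector and sums the resulting boolean vectors element-wise, which also avoids adding the helper column n to the result.

```q
/ table from the question
tab:flip `items`sales`prices!(`nut`bolt`cam`cog`bolt`screw;6 8 0 3 0n 0n;10 20 15 20 0n 0n)

/ one boolean vector per column, summed element-wise into a per-row null count
nulls:sum null value flip tab

/ keep the rows whose null count is the minimum for their item
res:select from tab where nulls=(min;nulls) fby items
res
```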

This approach is not recommended, though, because it operates on rows rather than on columns.
