Calculating the average of a value over X weeks in a table with weekly entries in Azure Data Explorer: any option other than a self-join?
I have a table where each row belongs to a week. There are multiple rows for the same week, but they are unique across a few dimensions.
| Week | Col1 | Col2 |
----------------------
| W1 | X1 | a |
| W1 | X2 | b |
| W2 | X3 | a |
.
... More rows
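I want to calculate the average of Col1 over a 4-week period (or, in general, X weeks).
I know I can do this by joining the table with itself 4 times, but that doesn't seem right... Is there a better way?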
Sample input data table:
```kusto
datatable (Week:datetime, Value:decimal, Dim1:string)
[
    datetime(2020-08-03), 1, "a",
    datetime(2020-08-03), 2, "b",
    datetime(2020-08-10), 1, "a",
    datetime(2020-08-10), 1, "b",
    datetime(2020-08-17), 2, "b",
    datetime(2020-08-17), 2, "c",
    datetime(2020-08-24), 2, "a",
    datetime(2020-08-24), 4, "b",
    datetime(2020-08-31), 3, "c"
]
```
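The result I want is shown below (in this example I use the last day of the window as the "day" of the average). Note that when there is no value in a week, I assume it is 0. Also, if a dimension appears in any one of the weeks, it is included in the final averages (not expected to happen, but added for completeness):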
| Week | Average_Value | Dim1 |
-------------------------------------
| 2020-08-03 | 0.25 | a | <-- backfill with zeroes
| 2020-08-03 | 0.5 | b |
| 2020-08-10 | 0.5 | a |
| 2020-08-10 | 0.75 | b |
| 2020-08-17 | 1 | b |
| 2020-08-17 | 0.5 | c |
| 2020-08-24 | 1 | a |
| 2020-08-24 | 2.25 | b |
| 2020-08-24 | 0.5 | c | <-- has average values even with no value in week
| 2020-08-31 | 0.75 | a | <-- has average values even with no value in week
| 2020-08-31 | 1.75 | b | <-- has average values even with no value in week
| 2020-08-31 | 1.25 | c |
-------------------------------------
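As a quick check of the zero-backfill rule, three of the rows above can be worked out by hand from the sample input. Each average is the sum of the values in the 4 weeks ending at the given week, with missing weeks counted as 0, divided by 4:

```latex
\begin{align*}
\text{avg}_{b}(\text{2020-08-24}) &= \frac{2 + 1 + 2 + 4}{4} = 2.25 \\
\text{avg}_{a}(\text{2020-08-03}) &= \frac{0 + 0 + 0 + 1}{4} = 0.25 \\
\text{avg}_{c}(\text{2020-08-31}) &= \frac{0 + 2 + 0 + 3}{4} = 1.25
\end{align*}
```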
This is how I did it using joins:
```kusto
let Test = datatable (Week:datetime, Value:real, Dim1:string)
[
    datetime(2020-08-03), 1, "a",
    datetime(2020-08-03), 2, "b",
    datetime(2020-08-10), 1, "a",
    datetime(2020-08-10), 1, "b",
    datetime(2020-08-17), 2, "b",
    datetime(2020-08-17), 2, "c",
    datetime(2020-08-24), 4, "b",
    datetime(2020-08-24), 2, "a",
    datetime(2020-08-31), 3, "c"
];
// Build every (Week, Dim1) combination, then attach the values that exist (null otherwise).
let FullTable = Test
    | summarize by Week
    | extend A = 1
    | join kind=fullouter (Test | summarize by Dim1 | extend A = 1) on A
    | join kind=leftouter (Test) on Week, Dim1
    | project-away Week1, Dim11, A, A1;
// Join the table to copies of itself shifted by 1, 2 and 3 weeks,
// so Value1..Value3 hold the values from 1, 2 and 3 weeks earlier.
FullTable
| join kind=leftouter (FullTable | extend Week = Week + 7d) on Week, Dim1
| join kind=leftouter (FullTable | extend Week = Week + 14d) on Week, Dim1
| join kind=leftouter (FullTable | extend Week = Week + 21d) on Week, Dim1
// Treat missing values as 0 and average over the 4 weeks.
| project Week, Dim1,
    Value0 = iff(isnull(Value), 0.0, Value),
    Value1 = iff(isnull(Value1), 0.0, Value1),
    Value2 = iff(isnull(Value2), 0.0, Value2),
    Value3 = iff(isnull(Value3), 0.0, Value3)
| extend Average = (Value0 + Value1 + Value2 + Value3)/4
| project-away Value0, Value1, Value2, Value3
```
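It does work, but it seems like there should be a better way.
See the 2 suggestions below, which are derived from aggregations over a sliding window. The idea is to expand each value up to the end of the analysis period (28d).
Option #1: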
```kusto
let _start = datetime(2020-08-03);
let _period = 28d;
let _end = _start + 28d;
let Test = datatable (Week:datetime, Value:real, Dim1:string)
[
    datetime(2020-08-03), 1, "a",
    datetime(2020-08-03), 2, "b",
    datetime(2020-08-10), 1, "a",
    datetime(2020-08-10), 1, "b",
    datetime(2020-08-17), 2, "b",
    datetime(2020-08-17), 2, "c",
    datetime(2020-08-24), 4, "b",
    datetime(2020-08-24), 2, "a",
    datetime(2020-08-31), 3, "c"
];
Test
| order by Dim1 asc, Week asc
// Snap each Week onto the 7d grid anchored at _start.
| extend _bin = bin_at(Week, 7d, _start)
// Last week this value contributes to: normally _bin + _period - 7d, capped at _end.
| extend _endRange = iif(_bin + _period > _end, _end,
                     iff(_bin + _period - 7d < _start, _start,
                     iff(_bin + _period - 7d < _bin, _bin, _bin + _period - 7d)))
// Copy the value to every week from _bin through _endRange.
| extend _range = range(_bin, _endRange, 7d)
| mv-expand _range to typeof(datetime)
// WeekNum = how many weeks the source week lies before the target week (0..3).
| extend WeekNum = toint((_range - Week)/7d)
| project Week=_range, Dim1, Value, WeekNum=strcat("Value",WeekNum)
// One column per offset (Value0..Value3), then average them.
| evaluate pivot(WeekNum, sum(Value))
| project Week, Dim1, Average = (Value0 + Value1 + Value2 + Value3)/4
```
|Week|Dim1|Average|
|---|---|---|
|2020-08-03 00:00:00.0000000|a|0.25|
|2020-08-03 00:00:00.0000000|b|0.5|
|2020-08-10 00:00:00.0000000|a|0.5|
|2020-08-10 00:00:00.0000000|b|0.75|
|2020-08-17 00:00:00.0000000|a|0.5|
|2020-08-17 00:00:00.0000000|b|1.25|
|2020-08-17 00:00:00.0000000|c|0.5|
|2020-08-24 00:00:00.0000000|a|1|
|2020-08-24 00:00:00.0000000|b|2.25|
|2020-08-24 00:00:00.0000000|c|0.5|
|2020-08-31 00:00:00.0000000|a|0.75|
|2020-08-31 00:00:00.0000000|b|1.75|
|2020-08-31 00:00:00.0000000|c|1.25|
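The expansion step that the idea above relies on can be looked at in isolation. The following is a minimal sketch on a single made-up row (not part of the `Test` table); it assumes a fixed 4-week window and leaves out the clamping to `_end` that the full query performs:

```kusto
// Illustration only: one hypothetical row for the week of 2020-08-10.
print Week = datetime(2020-08-10), Dim1 = "a", Value = 1.0
| extend _origin = Week                         // remember the week the value came from
| extend _range = range(Week, Week + 21d, 7d)   // the 4 weekly windows that contain this week
| mv-expand _range to typeof(datetime)
| project Week = _range, _origin, Dim1, Value
```

This should yield four rows, one for each week from 2020-08-10 through 2020-08-31, all carrying the same `Value` and `_origin`. Grouping such rows by `Week` and `Dim1` is what turns the rolling window into a plain aggregation.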
Option #2:
```kusto
let _start = datetime(2020-08-03);
let _period = 28d;
let _end = _start + 28d;
let Test = datatable (Week:datetime, Value:real, Dim1:string)
[
    datetime(2020-08-03), 1, "a",
    datetime(2020-08-03), 2, "b",
    datetime(2020-08-10), 1, "a",
    datetime(2020-08-10), 1, "b",
    datetime(2020-08-17), 2, "b",
    datetime(2020-08-17), 2, "c",
    datetime(2020-08-24), 4, "b",
    datetime(2020-08-24), 2, "a",
    datetime(2020-08-31), 3, "c"
];
let _dims = Test | distinct Dim1;
// For every week in the period, generate 4 placeholder rows (Value = 0.0),
// keyed by Week and _origin, so each average is always taken over 4 entries.
let _fullRange = range Week from _start to _end step 7d
    | extend _start = max_of(-3, -((Week-_start)/7d))
    | extend _range = range((_start), (_start+3), 1)
    | mv-expand _range to typeof(int)
    | project Week, _origin = Week + _range*7d
    | extend K=1, Value=0.0;
// Cross join the placeholders with every dimension value.
let _fullRangeDims = _dims | extend K=1 | join kind=inner (_fullRange) on K | project-away K;
_fullRangeDims
| join kind=fullouter
    (Test
    | order by Dim1 asc, Week asc
    | extend _bin = bin_at(Week, 7d, _start)
    | extend _endRange = iif(_bin + _period > _end, _end,
                         iff(_bin + _period - 7d < _start, _start,
                         iff(_bin + _period - 7d < _bin, _bin, _bin + _period - 7d)))
    | extend _range = range(_bin, _endRange, 7d)
    | mv-expand _range to typeof(datetime)
    | project Week=_range, Dim1, Value, _origin = Week) on Week, _origin, Dim1
// Real values (right side) take precedence over the zero placeholders.
| project Week=coalesce(Week1, Week), Dim1=coalesce(Dim11, Dim1), Value=coalesce(Value1, Value), _origin= coalesce(_origin1, _origin)
| summarize avg(Value) by Week, Dim1
| order by Week asc, Dim1 asc
```
|Week|Dim1|avg_Value|
|---|---|---|
|2020-08-03 00:00:00.0000000|a|0.25|
|2020-08-03 00:00:00.0000000|b|0.5|
|2020-08-03 00:00:00.0000000|c|0|
|2020-08-10 00:00:00.0000000|a|0.5|
|2020-08-10 00:00:00.0000000|b|0.75|
|2020-08-10 00:00:00.0000000|c|0|
|2020-08-17 00:00:00.0000000|a|0.5|
|2020-08-17 00:00:00.0000000|b|1.25|
|2020-08-17 00:00:00.0000000|c|0.5|
|2020-08-24 00:00:00.0000000|a|1|
|2020-08-24 00:00:00.0000000|b|2.25|
|2020-08-24 00:00:00.0000000|c|0.5|
|2020-08-31 00:00:00.0000000|a|0.75|
|2020-08-31 00:00:00.0000000|b|1.75|
|2020-08-31 00:00:00.0000000|c|1.25|
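For comparison, a rolling average of this kind can also be sketched with `make-series` and `series_fir`, which avoids both the joins and the manual expansion. This is not one of the two suggestions above, only a hedged alternative. The name `_weeks` is introduced here for illustration. The sketch assumes that the `to` bound of `make-series` is exclusive (hence `_end + 7d`) and that `series_fir` treats the points before the start of the series as 0; both are worth verifying on your own cluster before relying on the first few weeks of output.

```kusto
let _start = datetime(2020-08-03);
let _end = datetime(2020-08-31);
let _weeks = 4;   // window length X in weeks (hypothetical parameter name)
let Test = datatable (Week:datetime, Value:real, Dim1:string)
[
    datetime(2020-08-03), 1, "a",
    datetime(2020-08-03), 2, "b",
    datetime(2020-08-10), 1, "a",
    datetime(2020-08-10), 1, "b",
    datetime(2020-08-17), 2, "b",
    datetime(2020-08-17), 2, "c",
    datetime(2020-08-24), 4, "b",
    datetime(2020-08-24), 2, "a",
    datetime(2020-08-31), 3, "c"
];
Test
// One weekly series per dimension; weeks without a row are filled with 0.
| make-series Value = sum(Value) default = 0 on Week from _start to _end + 7d step 7d by Dim1
// Trailing filter of _weeks ones, normalized so the coefficients sum to 1 (a moving average).
| extend Average = series_fir(Value, repeat(1, _weeks), true, false)
// Turn the arrays back into one row per (Week, Dim1).
| mv-expand Week to typeof(datetime), Average to typeof(real)
| project Week, Dim1, Average
| order by Week asc, Dim1 asc
```

If those assumptions hold, the output should be comparable row by row with the Option #2 table above, and changing `_weeks` is all that is needed to move from 4 weeks to X weeks.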