MariaDB ColumnStore: Filtering SubQuery by Computed Value
I have a table that stores several metrics for a given (time, country, asn) tuple:
+----------+---------+-------+-----+------+------+
| time     | country | asn   | rtt | rexb | reqs |
+----------+---------+-------+-----+------+------+
| 10000000 | US      | 12345 |  40 | 0.05 | 5000 |
| 10000000 | US      | 54321 | 120 | 0.15 |  500 |
| 10000000 | MX      | 12345 | 300 | 0.25 | 1000 |
| 10000000 | MX      | 54321 | 160 | 0.10 |  200 |
| ...      | ...     | ...   | ... | ...  | ...  |
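For anyone who wants to reproduce this, here is a minimal sketch of a setup matching the sample rows. The column types are my guesses from the data shown, not the real schema, so adjust them as needed.

```sql
-- Hypothetical schema: types are assumptions inferred from the sample rows.
CREATE TABLE metrics (
    `time`  INT,
    country CHAR(2),
    asn     INT,
    rtt     INT,
    rexb    DECIMAL(5,2),
    reqs    INT
) ENGINE=ColumnStore;

-- The four sample rows from the table above.
INSERT INTO metrics (`time`, country, asn, rtt, rexb, reqs) VALUES
    (10000000, 'US', 12345,  40, 0.05, 5000),
    (10000000, 'US', 54321, 120, 0.15,  500),
    (10000000, 'MX', 12345, 300, 0.25, 1000),
    (10000000, 'MX', 54321, 160, 0.10,  200);
```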
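In general use, I normalize each of these metrics to a value between 0 and 100, then return the lowest of them (via least()) to get a rough estimate of "how good" the connection to that ASN is in that country: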
SELECT
    country,
    asn,
    least(
        -- least(100, greatest(0, ...)) = clip value between 0 and 100
        least(100, greatest(0,
            -- normalize and protect against null values
            -- sample normalization:
            --   0 ms RTT   = "100% good"
            --   300 ms RTT = "0% good"
            coalesce((300 - rtt) / 3, 0)
        )),
        least(100, greatest(0,
            -- sample normalization:
            --   0% REXB  = "100% good"
            --   50% REXB = "0% good"
            coalesce((0.5 - rexb) / 0.5, 0)
        ))
        -- Other metrics may follow (each as another comma-separated argument)
    ) as quality
FROM
    metrics
WHERE
    time = 10000000 -- "current time"
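To make the normalization concrete, here are the two terms evaluated for the first sample row (rtt = 40, rexb = 0.05), with the values plugged in as literals. One thing this surfaces: as written, the REXB term comes out on a 0-to-1 scale rather than 0-to-100. If that is not intended, multiplying it by 100 would put both terms on the same scale; that is an assumption about intent, so the queries here are left as they are.

```sql
-- Worked example for the row (US, 12345): rtt = 40, rexb = 0.05.
SELECT
    least(100, greatest(0, coalesce((300 - 40) / 3, 0)))      AS rtt_score,   -- about 86.67
    least(100, greatest(0, coalesce((0.5 - 0.05) / 0.5, 0)))  AS rexb_score;  -- 0.9
-- least(rtt_score, rexb_score) is therefore 0.9 for this row.
```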
Sometimes I compute a weighted average instead, weighting by the reqs column (the number of requests for that country + ASN):
SELECT
    country,
    least(
        least(100, greatest(0,
            coalesce((300 - sum(rtt*reqs)/sum(reqs)) / 3, 0)
        )),
        least(100, greatest(0,
            coalesce((0.5 - sum(rexb*reqs)/sum(reqs)) / 0.5, 0)
        ))
    ) as avg_quality
FROM
    metrics
WHERE
    time = 10000000 -- "current time"
GROUP BY
    country
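To show what the weighting does, the query below pulls out just the two request-weighted averages that feed into avg_quality. The aliases weighted_rtt and weighted_rexb are names I made up for this illustration.

```sql
-- Request-weighted averages per country, before normalization and clipping.
SELECT
    country,
    sum(rtt * reqs)  / sum(reqs) AS weighted_rtt,   -- hypothetical alias
    sum(rexb * reqs) / sum(reqs) AS weighted_rexb   -- hypothetical alias
FROM
    metrics
WHERE
    time = 10000000 -- "current time"
GROUP BY
    country
```

If only the four sample rows shown earlier were present, US would come out at roughly 47.27 ms and 0.059 (that is, (40*5000 + 120*500) / 5500 and (0.05*5000 + 0.15*500) / 5500), and MX at roughly 276.67 ms and 0.225. The real table has more rows, so actual results will differ.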
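This query works well. However, I ran into a problem when I tried to use it in a subquery.
My goal is to find out how many countries have an "average quality" below a certain threshold: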
SELECT
    count(*)
FROM (
    SELECT
        country,
        least(
            least(100, greatest(0,
                coalesce((300 - sum(rtt*reqs)/sum(reqs)) / 3, 0)
            )),
            least(100, greatest(0,
                coalesce((0.5 - sum(rexb*reqs)/sum(reqs)) / 0.5, 0)
            ))
        ) as avg_quality
    FROM
        metrics
    WHERE
        time = 10000000 -- "current time"
    GROUP BY
        time, country
) t1
WHERE t1.avg_quality < 50
This raises an error:
ERROR 1815 (HY000): Internal error: Lost connection to ExeMgr. Please contact your administrator
I can execute simpler subqueries without any problem. Why does this one fail, and how can I fix it?
I am using MariaDB, and the metrics table uses the ColumnStore engine.
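Quick Update
When I replace WHERE t1.avg_quality < 50 with WHERE country = "US", the query executes without a problem. So it can run the subquery, and it can filter the result, without trouble. It fails specifically when I try to filter on the computed column.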
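For clarity, this is the variant that runs without error: the same derived table, with only the outer filter changed. I have written the literal with single quotes here; the t1. qualifier on country is added for readability and is not required.

```sql
-- Same subquery as the failing statement; only the outer WHERE differs.
SELECT
    count(*)
FROM (
    SELECT
        country,
        least(
            least(100, greatest(0,
                coalesce((300 - sum(rtt*reqs)/sum(reqs)) / 3, 0)
            )),
            least(100, greatest(0,
                coalesce((0.5 - sum(rexb*reqs)/sum(reqs)) / 0.5, 0)
            ))
        ) as avg_quality
    FROM
        metrics
    WHERE
        time = 10000000 -- "current time"
    GROUP BY
        time, country
) t1
WHERE t1.country = 'US' -- filtering on a plain grouped column works
```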
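I contacted my company's DBAs to see if they had any answers or suggestions. They could not explain this behavior, but they were able to provide a workaround, which moves the filter into a HAVING clause inside the subquery: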
SELECT
    count(*)
FROM (
    SELECT
        country,
        least(
            least(100, greatest(0,
                coalesce((300 - sum(rtt*reqs)/sum(reqs)) / 3, 0)
            )),
            least(100, greatest(0,
                coalesce((0.5 - sum(rexb*reqs)/sum(reqs)) / 0.5, 0)
            ))
        ) as avg_quality
    FROM
        metrics
    WHERE
        time = 10000000 -- "current time"
    GROUP BY
        time, country
    HAVING
        avg_quality < 50
) t1