Optimizing SQL View to Avoid Cartesian Product
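I have a set of test scripts (about 4,000 unique scripts) that run periodically and write their results to a table (script_history) in a MariaDB database. The table currently has 60,000 rows. To see the latest record for each of the 4,000 scripts, I have a view written as: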
SELECT
t1.pk AS pk,
t1.script_name AS script_name,
t1.test_points_passed AS test_points_passed,
t1.test_points_failed AS test_points_failed,
t1.execution_time AS execution_time,
t1.tester_name AS tester_name,
t1.execution_date as execution_date,
t1.test_notes AS test_notes,
t1.script_in_execution AS script_in_execution,
t1.hostname AS hostname
FROM
(script_db.script_history t1 LEFT JOIN script_db.script_history t2 ON
(t2.script_name = t1.script_name and t2.execution_date > t1.execution_date))
WHERE t2.execution_date IS NULL group by t1.script_name
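This returns the most recent run record for each of the 4,000 scripts. Unfortunately, the Cartesian product it generates severely degrades performance when loading the view (it takes nearly 5 minutes to load).
I also tried the following view: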
SELECT
script_history.*
FROM
(SELECT
pk, script_name, test_points_passed, test_points_failed, execution_time, tester_name, MAX(execution_date)
as execution_date, test_notes, script_in_execution, hostname
FROM script_history
GROUP BY script_name) AS A
INNER JOIN
script_history
ON
script_history.script_name = A.script_name AND
script_history.execution_date = A.execution_date;
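This view definition loads very quickly, but unfortunately it does not produce the expected result. Instead of returning only the last run for each of the 4,000 unique scripts, it also returns duplicates (about 400 records) where the same script was run more than once on the same day, giving about 4,400 records in the view. Any help with a view that returns the row data for the last execution of each script would be greatly appreciated.
Sample data:
(pk, script_name, test_points_passed, test_points_failed, execution_time,
tester_name, execution_date, test_notes, script_in_execution, hostname)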
1 script1 5 7 10:30 j_doe 2021-05-01 NULL 0 main_server
2 script1 8 4 10:29 j_doe 2021-05-03 NULL 0 backup_server
3 script2 44 0 2:40 j_doe 2021-05-04 NULL 0 backup_server
4 script3 3 2 1:39 j_doe 2021-05-05 NULL 0 main_server
5 script2 43 1 2:40 j_doe 2021-05-05 NULL 0 main_server
6 script3 5 0 1:38 j_doe 2021-06-01 NULL 0 backup_server
7 script4 15 0 0:50 j_doe 2021-07-05 NULL 0 main_server
8 script4 15 0 0:50 j_doe 2021-07-05 NULL 0 main_server
Desired result:
2 script1 8 4 10:29 j_doe 2021-05-03 NULL 0 backup_server
5 script2 43 1 2:40 j_doe 2021-05-05 NULL 0 main_server
6 script3 5 0 1:38 j_doe 2021-06-01 NULL 0 backup_server
8 script4 15 0 0:50 j_doe 2021-07-05 NULL 0 main_server
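I think the code should look something like this: get the maximum execution date for each script and, if several rows share that date, take the one with the maximum pk.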
SELECT h.*
FROM script_history AS h
INNER JOIN (
    -- highest pk among the rows that share each script's latest date
    SELECT s.script_name, s.execution_date, MAX(s.pk) AS max_pk
    FROM script_history AS s
    INNER JOIN (
        -- latest execution date per script
        SELECT script_name, MAX(execution_date) AS max_date
        FROM script_history
        GROUP BY script_name
    ) AS a ON a.script_name = s.script_name
        AND a.max_date = s.execution_date
    GROUP BY s.script_name, s.execution_date
) AS b ON b.script_name = h.script_name
    AND b.execution_date = h.execution_date
    AND b.max_pk = h.pk
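As a quick sanity check, the sample data can be loaded into an in-memory SQLite database from Python. This is only a sketch under some assumptions: SQLite stands in for MariaDB, the column types are my guesses, and the first query is a pared-down version of the date-only join rather than the exact view from the question.

```python
import sqlite3

# Sample rows from the question: (pk, script_name, test_points_passed,
# test_points_failed, execution_time, tester_name, execution_date,
# test_notes, script_in_execution, hostname)
ROWS = [
    (1, "script1", 5, 7, "10:30", "j_doe", "2021-05-01", None, 0, "main_server"),
    (2, "script1", 8, 4, "10:29", "j_doe", "2021-05-03", None, 0, "backup_server"),
    (3, "script2", 44, 0, "2:40", "j_doe", "2021-05-04", None, 0, "backup_server"),
    (4, "script3", 3, 2, "1:39", "j_doe", "2021-05-05", None, 0, "main_server"),
    (5, "script2", 43, 1, "2:40", "j_doe", "2021-05-05", None, 0, "main_server"),
    (6, "script3", 5, 0, "1:38", "j_doe", "2021-06-01", None, 0, "backup_server"),
    (7, "script4", 15, 0, "0:50", "j_doe", "2021-07-05", None, 0, "main_server"),
    (8, "script4", 15, 0, "0:50", "j_doe", "2021-07-05", None, 0, "main_server"),
]

conn = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in for MariaDB
conn.execute("""
    CREATE TABLE script_history (
        pk INTEGER PRIMARY KEY,
        script_name TEXT,
        test_points_passed INTEGER,
        test_points_failed INTEGER,
        execution_time TEXT,
        tester_name TEXT,
        execution_date TEXT,   -- ISO dates sort correctly as text
        test_notes TEXT,
        script_in_execution INTEGER,
        hostname TEXT
    )
""")
conn.executemany("INSERT INTO script_history VALUES (?,?,?,?,?,?,?,?,?,?)", ROWS)

# Join on the latest date only: rows tied on execution_date all survive.
DATE_ONLY = """
    SELECT h.pk
    FROM script_history AS h
    INNER JOIN (
        SELECT script_name, MAX(execution_date) AS max_date
        FROM script_history
        GROUP BY script_name
    ) AS a ON a.script_name = h.script_name
          AND a.max_date = h.execution_date
    ORDER BY h.pk
"""

# Same join, plus the MAX(pk) tie-break per (script_name, execution_date).
WITH_TIEBREAK = """
    SELECT h.pk
    FROM script_history AS h
    INNER JOIN (
        SELECT s.script_name, s.execution_date, MAX(s.pk) AS max_pk
        FROM script_history AS s
        INNER JOIN (
            SELECT script_name, MAX(execution_date) AS max_date
            FROM script_history
            GROUP BY script_name
        ) AS a ON a.script_name = s.script_name
              AND a.max_date = s.execution_date
        GROUP BY s.script_name, s.execution_date
    ) AS b ON b.script_name = h.script_name
          AND b.execution_date = h.execution_date
          AND b.max_pk = h.pk
    ORDER BY h.pk
"""

date_only_pks = [row[0] for row in conn.execute(DATE_ONLY)]
tiebreak_pks = [row[0] for row in conn.execute(WITH_TIEBREAK)]

print(date_only_pks)  # [2, 5, 6, 7, 8]
print(tiebreak_pks)   # [2, 5, 6, 8]
```

The date-only join returns both pk 7 and pk 8 for script4, which is the duplication described in the question. Adding the MAX(pk) tie-break leaves one row per script, matching the desired result.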