Optimizing SQL View to Avoid Cartesian Product

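I have a set of test scripts (about 4,000 unique scripts) that run periodically and write their results to a table (script_history) in a MariaDB database. The table currently has 60,000 rows. To see the latest record for each of the 4,000 scripts, I have a view written as: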

SELECT
   t1.pk AS pk,
   t1.script_name AS script_name, 
   t1.test_points_passed AS test_points_passed,
   t1.test_points_failed AS test_points_failed,
   t1.execution_time AS execution_time,
   t1.tester_name AS tester_name,
   t1.execution_date AS execution_date,
   t1.test_notes AS test_notes,
   t1.script_in_execution AS script_in_execution,
   t1.hostname AS hostname
FROM
   (script_db.script_history t1 LEFT JOIN script_db.script_history t2 ON
    (t2.script_name = t1.script_name AND t2.execution_date > t1.execution_date))
WHERE t2.execution_date IS NULL GROUP BY t1.script_name

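This returns the most recent run record for each of the 4,000 scripts. Unfortunately, it generates a Cartesian product that severely degrades performance when loading the view (it takes nearly 5 minutes to load).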

I also tried the following view:

SELECT
   script_history.*
FROM
   (SELECT
      pk, script_name, test_points_passed, test_points_failed, execution_time, tester_name, MAX(execution_date)
      as execution_date, test_notes, script_in_execution, hostname
   FROM script_history
   GROUP BY script_name) AS A
INNER JOIN
   script_history
   ON
     script_history.script_name = A.script_name AND
     script_history.execution_date = A.execution_date;

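This view definition loads very quickly, but unfortunately it does not seem to produce the expected results. Instead of returning only the last run for each of the 4,000 unique scripts, it also returns duplicates (about 400 records) wherever the same script was run more than once on the same day, so the view yields about 4,400 records. Any help getting the row data for the last execution of each script would be greatly appreciated.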
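The extra rows appear to come from ties: joining back on (script_name, MAX(execution_date)) matches every row that shares the latest date, not just one of them. The sketch below reproduces this on a few hypothetical rows. It assumes Python's built-in sqlite3 module as a stand-in for MariaDB and keeps only three of the table's columns, so it illustrates the join logic rather than the real schema.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MariaDB; only three columns are kept.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE script_history ("
    "pk INTEGER PRIMARY KEY, script_name TEXT, execution_date TEXT)"
)
conn.executemany(
    "INSERT INTO script_history VALUES (?, ?, ?)",
    [
        (1, "script1", "2021-05-01"),
        (2, "script1", "2021-05-03"),
        (7, "script4", "2021-07-05"),
        (8, "script4", "2021-07-05"),  # same script, same day as pk 7
    ],
)

# Join every row to its script's latest date; all rows tied on that date match.
tie_rows = conn.execute(
    """
    SELECT h.pk, h.script_name, h.execution_date
    FROM script_history AS h
    INNER JOIN (
        SELECT script_name, MAX(execution_date) AS max_date
        FROM script_history
        GROUP BY script_name
    ) AS a
      ON a.script_name = h.script_name
     AND a.max_date = h.execution_date
    ORDER BY h.pk
    """
).fetchall()

for row in tie_rows:
    print(row)  # script4 is returned twice (pk 7 and pk 8)
```

Two distinct scripts produce three rows here, which mirrors the 4,000 versus 4,400 discrepancy: a second tie-breaking column (such as pk) is needed to pick one row per script.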

Sample data: (pk, script_name, test_points_passed, test_points_failed, execution_time, tester_name, execution_date, test_notes, script_in_execution, hostname)

1    script1     5    7    10:30   j_doe     2021-05-01    NULL    0    main_server
2    script1     8    4    10:29   j_doe     2021-05-03    NULL    0    backup_server
3    script2    44    0    2:40    j_doe     2021-05-04    NULL    0    backup_server
4    script3     3    2    1:39    j_doe     2021-05-05    NULL    0    main_server
5    script2    43    1    2:40    j_doe     2021-05-05    NULL    0    main_server
6    script3     5    0    1:38    j_doe     2021-06-01    NULL    0    backup_server
7    script4    15    0    0:50    j_doe     2021-07-05    NULL    0    main_server
8    script4    15    0    0:50    j_doe     2021-07-05    NULL    0    main_server

Desired result:

2    script1     8    4    10:29   j_doe     2021-05-03    NULL    0    backup_server
5    script2    43    1    2:40    j_doe     2021-05-05    NULL    0    main_server
6    script3     5    0    1:38    j_doe     2021-06-01    NULL    0    backup_server
8    script4    15    0    0:50    j_doe     2021-07-05    NULL    0    main_server

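I think the code should look something like this. Here we take the maximum execution date for each script, and if several rows share that date, we take the one with the maximum pk.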

SELECT script_history.*
FROM script_history

-- B: one (script_name, execution_date, MaxPK) row per script
INNER JOIN (
  SELECT script_history.script_name, script_history.execution_date, MAX(pk) AS MaxPK
  FROM script_history

  -- A: the latest execution date for each script
  INNER JOIN (
    SELECT script_name, MAX(execution_date) AS MaxDate
    FROM script_history
    GROUP BY script_name
  ) AS A ON A.script_name = script_history.script_name
  AND A.MaxDate = script_history.execution_date

  -- collapse same-day ties down to the highest pk
  GROUP BY script_history.script_name, script_history.execution_date

) AS B ON B.script_name = script_history.script_name
AND B.execution_date = script_history.execution_date
AND B.MaxPK = script_history.pk
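
As a rough check, the max-date-then-max-pk query can be run against the question's sample data. The sketch below again uses Python's sqlite3 module as a stand-in for MariaDB and keeps only three columns. It also tries a ROW_NUMBER() variant, which is an alternative and not part of the approach above; it assumes window-function support (MariaDB 10.2 or later, SQLite 3.25 or later). The index name idx_script_date is hypothetical, and whether such an index helps on the real table is an assumption to verify with EXPLAIN.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MariaDB; columns are reduced to three.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE script_history ("
    "pk INTEGER PRIMARY KEY, script_name TEXT, execution_date TEXT)"
)
conn.executemany(
    "INSERT INTO script_history VALUES (?, ?, ?)",
    [
        (1, "script1", "2021-05-01"), (2, "script1", "2021-05-03"),
        (3, "script2", "2021-05-04"), (4, "script3", "2021-05-05"),
        (5, "script2", "2021-05-05"), (6, "script3", "2021-06-01"),
        (7, "script4", "2021-07-05"), (8, "script4", "2021-07-05"),
    ],
)
# Hypothetical index name; a composite index like this may speed up the per-script MAX.
conn.execute(
    "CREATE INDEX idx_script_date ON script_history (script_name, execution_date)"
)

# Nested joins: latest date per script, then highest pk among rows on that date.
nested_sql = """
    SELECT h.pk
    FROM script_history AS h
    INNER JOIN (
        SELECT s.script_name, s.execution_date, MAX(s.pk) AS max_pk
        FROM script_history AS s
        INNER JOIN (
            SELECT script_name, MAX(execution_date) AS max_date
            FROM script_history
            GROUP BY script_name
        ) AS a
          ON a.script_name = s.script_name
         AND a.max_date = s.execution_date
        GROUP BY s.script_name, s.execution_date
    ) AS b
      ON b.script_name = h.script_name
     AND b.execution_date = h.execution_date
     AND b.max_pk = h.pk
    ORDER BY h.pk
"""

# Alternative: rank rows per script by date, then pk, and keep the first.
ranked_sql = """
    SELECT pk
    FROM (
        SELECT h.pk, ROW_NUMBER() OVER (
            PARTITION BY h.script_name
            ORDER BY h.execution_date DESC, h.pk DESC
        ) AS rn
        FROM script_history AS h
    ) AS ranked
    WHERE rn = 1
    ORDER BY pk
"""

nested_pks = [row[0] for row in conn.execute(nested_sql)]
ranked_pks = [row[0] for row in conn.execute(ranked_sql)]
print(nested_pks)  # [2, 5, 6, 8]
print(ranked_pks)  # [2, 5, 6, 8]
```

Both queries return pk 2, 5, 6 and 8, matching the desired result. For a real view, the inner SELECT lists would carry all of the table's columns instead of just pk.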