How can I stop my Postgres recursive CTE from looping indefinitely?

Background

I am running Postgres 11 on CentOS 7.

I recently learned the basics of recursive CTEs in Postgres thanks to an answer from S-Man.

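Problem

While working on a closely related problem (counting parts sold within bundles and assemblies) and using this recursive CTE, I ran into a query that looped indefinitely and never completed.

I tracked this down to non-spurious 'self-referential' entries in the relator table, i.e. rows where parent_name and child_name have the same value.

I know these rows are the source of the problem because, when I recreated the situation with test tables and data, the undesired looping occurred when the rows were present. It disappeared when the rows were absent, or when the CTE used UNION (which excludes duplicate returned rows) instead of UNION ALL.

I think the data model itself probably needs adjusting so that these 'self-referential' rows are not necessary, but for now I need this query to return the desired data, complete, and stop looping.

How can I achieve this result? All guidance is much appreciated!

Tables and test data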

CREATE TABLE the_schema.names_categories (
    id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    created_at TIMESTAMPTZ DEFAULT now(),
    thing_name TEXT NOT NULL, 
    thing_category TEXT NOT NULL
);

CREATE TABLE the_schema.relator (
    id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    created_at TIMESTAMPTZ DEFAULT now(),
    parent_name TEXT NOT NULL, 
    child_name TEXT NOT NULL,
    child_quantity INTEGER NOT NULL 
);


/* NOTE: listing_name below is like an alias of a relator.parent_name as it appears in a catalog, 
required to know because it is these listing_names that are reflected by sales.sold_name */

CREATE TABLE the_schema.catalog_listings ( 
    id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    created_at TIMESTAMPTZ DEFAULT now(),
    listing_name TEXT NOT NULL, 
    parent_name TEXT NOT NULL
);

CREATE TABLE the_schema.sales (
    id INTEGER NOT NULL PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
    created_at TIMESTAMPTZ DEFAULT now(),    
    sold_name TEXT NOT NULL,
    sold_quantity INTEGER NOT NULL
);

CREATE VIEW the_schema.relationships_with_child_category AS (
    SELECT 
    c.listing_name, 
    r.parent_name,
    r.child_name, 
    r.child_quantity,
    n.thing_category AS child_category
    FROM 
    the_schema.catalog_listings c
    INNER JOIN 
    the_schema.relator r 
    ON c.parent_name = r.parent_name
    INNER JOIN 
    the_schema.names_categories n 
    ON r.child_name = n.thing_name 
);

INSERT INTO the_schema.names_categories (thing_name, thing_category)
VALUES ('parent1', 'bundle'), ('child1', 'assembly'), ('child2', 'assembly'),('subChild1', 'component'), 
('subChild2', 'component'), ('subChild3', 'component');

INSERT INTO the_schema.catalog_listings (listing_name, parent_name)
VALUES ('listing1', 'parent1'), ('parent1', 'child1'), ('parent1','child2'), ('child1', 'child1'), ('child2', 'child2');

INSERT INTO the_schema.catalog_listings (listing_name, parent_name)
VALUES ('parent1', 'child1'), ('parent1','child2');


/* note the two 'self-referential' entries  */
INSERT INTO the_schema.relator (parent_name, child_name, child_quantity)
VALUES ('parent1', 'child1', 1),('child1', 'subChild1', 1), ('child1', 'subChild2', 1),
('parent1', 'child2', 1),('child2', 'subChild1', 1), ('child2', 'subChild3', 1), ('child1', 'child1', 1), ('child2', 'child2', 1);

INSERT INTO the_schema.sales (sold_name, sold_quantity)
VALUES ('parent1', 1), ('parent1', 2), ('listing1', 1);

Current query (loops infinitely with the required UNION ALL)

WITH RECURSIVE cte AS (
    SELECT 
        s.sold_name,
        s.sold_quantity,
        r.child_name,
        r.child_quantity,
        r.child_category as category
    FROM 
        the_schema.sales s
    JOIN the_schema.relationships_with_child_category r
    ON s.sold_name = r.listing_name

    UNION ALL
    
    SELECT
        cte.sold_name,
        cte.sold_quantity,
        r.child_name,
        r.child_quantity,
        r.child_category
    FROM cte
    JOIN the_schema.relationships_with_child_category r 
    ON cte.child_name = r.parent_name

)
SELECT
    child_name,
    SUM(sold_quantity * child_quantity)
FROM cte
WHERE category = 'component'
GROUP BY child_name
;

You can avoid the infinite recursion simply by using UNION instead of UNION ALL.

The documentation describes the implementation:

  1. Evaluate the non-recursive term. For UNION (but not UNION ALL), discard duplicate rows. Include all remaining rows in the result of the recursive query, and also place them in a temporary working table.

  2. So long as the working table is not empty, repeat these steps:

    1. Evaluate the recursive term, substituting the current contents of the working table for the recursive self-reference. For UNION (but not UNION ALL), discard duplicate rows and rows that duplicate any previous result row. Include all remaining rows in the result of the recursive query, and also place them in a temporary intermediate table.

    2. Replace the contents of the working table with the contents of the intermediate table, then empty the intermediate table.

Discarding duplicates should leave the intermediate table empty at some point, which ends the iteration.
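
A minimal, self-contained sketch of that effect, using a hypothetical inline edges list in place of the relator table (the names edges and walk are made up for illustration). The list contains one self-referential row, yet the query finishes because UNION discards rows that were already produced:

```sql
-- Hypothetical inline data standing in for the relator table.
-- ('child1', 'child1') is the self-referential row.
WITH RECURSIVE edges(parent_name, child_name) AS (
    VALUES ('child1', 'child1'),
           ('child1', 'subChild1')
),
walk(child_name) AS (
    -- Non-recursive term: direct children of child1.
    SELECT e.child_name
    FROM edges e
    WHERE e.parent_name = 'child1'

    UNION  -- with UNION ALL this query would never finish

    -- Recursive term: children of everything found so far.
    SELECT e.child_name
    FROM walk w
    JOIN edges e ON e.parent_name = w.child_name
)
SELECT child_name
FROM walk
ORDER BY child_name;
```

Bear in mind that UNION also collapses rows that are legitimately repeated. In the original query, where identical rows each count toward the SUM, switching to UNION can therefore change the totals.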

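In the catalog_listings table, listing_name and parent_name are identical for child1 and child2. In the relator table, parent_name and child_name are likewise identical for child1 and child2.

These rows are what create the cyclic recursion.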
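
To see which rows are involved before changing anything, a check along these lines may help. It is only a sketch and assumes the tables and test data from the question are already loaded; the column aliases source_table, name_a and name_b are made up for the output:

```sql
-- Assumes the_schema.relator and the_schema.catalog_listings exist
-- and hold the test data from the question.
SELECT 'relator' AS source_table,
       id,
       parent_name AS name_a,
       child_name  AS name_b
FROM the_schema.relator
WHERE parent_name = child_name          -- self-referential relator rows

UNION ALL

SELECT 'catalog_listings',
       id,
       listing_name,
       parent_name
FROM the_schema.catalog_listings
WHERE listing_name = parent_name        -- listings that point at themselves

ORDER BY source_table, id;
```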

Simply delete those two rows from each table:

DELETE FROM the_schema.catalog_listings WHERE id IN (4, 5);
DELETE FROM the_schema.relator WHERE id IN (7, 8);

Your desired output will then look like this:

 child_name | sum
------------+-----
 subChild2  |   8
 subChild3  |   8
 subChild1  |  16

Is this the result you are looking for?

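If you can't delete the rows, you can avoid them by adding a parent_name <> child_name condition, as below. Note that this only skips the self-referential relator rows; the matching catalog_listings rows still take part in the joins, so the totals may differ from the ones shown above.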

WITH RECURSIVE cte AS (
    SELECT 
        s.sold_name,
        s.sold_quantity,
        r.child_name,
        r.child_quantity,
        r.child_category as category
    FROM 
        the_schema.sales s
    JOIN the_schema.relationships_with_child_category r
    ON s.sold_name = r.listing_name AND r.parent_name <> r.child_name

    UNION ALL
    
    SELECT
        cte.sold_name,
        cte.sold_quantity,
        r.child_name,
        r.child_quantity,
        r.child_category
    FROM cte
    JOIN the_schema.relationships_with_child_category r 
    ON cte.child_name = r.parent_name AND r.parent_name <> r.child_name

)
SELECT
    child_name,
    SUM(sold_quantity * child_quantity)
FROM cte
WHERE category = 'component'
GROUP BY child_name;
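
The parent_name <> child_name condition only catches rows that point at themselves. If longer cycles (a -> b -> a) are possible, a more general guard is to carry the visited names along in an array and refuse to step onto a name already on the path. The sketch below is self-contained and uses a hypothetical inline edges list rather than the real tables; edges, walk, root and path are illustrative names, not part of the question's schema:

```sql
-- Hypothetical inline data standing in for the relator table:
-- one self-referential row and one two-step cycle (a -> b -> a).
WITH RECURSIVE edges(parent_name, child_name) AS (
    VALUES ('child1', 'child1'),
           ('child1', 'subChild1'),
           ('a', 'b'),
           ('b', 'a')
),
walk AS (
    -- Non-recursive term: start a path at every edge that is not a self-loop.
    SELECT e.parent_name AS root,
           e.child_name,
           ARRAY[e.parent_name, e.child_name] AS path
    FROM edges e
    WHERE e.parent_name <> e.child_name

    UNION ALL  -- duplicates are kept, as the original query requires

    -- Recursive term: extend the path, skipping names already visited.
    SELECT w.root,
           e.child_name,
           w.path || e.child_name
    FROM walk w
    JOIN edges e ON e.parent_name = w.child_name
    WHERE e.child_name <> ALL (w.path)
)
SELECT root, child_name, path
FROM walk
ORDER BY root, path;
```

The same two additions (the path column and the <> ALL check) could be applied to the recursive term of the query above. Postgres 14 and later also provide a built-in CYCLE clause for this purpose, which is not available on Postgres 11.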