Recursive query to find previous related row meeting criteria

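I have a database full of messages from various chatbots. The chatbots all follow a decision-tree format, so ultimately they consist of questions with choices that the user responds to.

A bot may send a message (Hello would you like A or B?) which has options attached, for example A and B. The user replies B. Both of these messages are recorded, each with the id of the previous message attached.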

id  message                       options  previous_id
1   Hello would you like A or B?  A,B
2   A                                      1

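The structure of these conversations is not fixed. There can be various forms of message flow. The above is a simple example of how messages are chained together. For example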

// text question on same message as options, with preceding unrelated messages
Hello -> My name is Ben the bot. -> How are you today? (good, bad) -> [good]

// text question not on same message as options
Pick your favourite colour -> {picture of blue and red} (blue, red) -> [blue]

// no question just option prompt - here preceding text wasn't a question
[red] -> (ferrari, lamborghini) -> [ferrari]

-> denotes separation of messages
[] denotes reply to bot from user
() denotes options attached to messages
{} denotes attachments

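What I would like to get out of this data is one row per question with its corresponding answer. The problem I face is the (probable) recursion needed to retrieve the previous message again and again until one meets the criteria showing that we have gone far enough back in the chain for a particular answer message.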

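In theory, what I want to achieve is

  1. Find all answers to questions
  2. From those results, look at the previous message
     2a. If the previous message has text and is not itself an answer, use that text and stop recursing
     2b. Otherwise move on to the next previous message until the criteria are met
  3. Return rows containing the answer/response, together with the question and other columns (e.g. id, timestamp) from the question row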

This would leave me with lots of rows, each containing a message and a response.

For example, in the following dataset,

id  message                       other  previous_id
1   Hello would you like A or B?
2   B                                    1
3   Hello would you like A or B?
4   A                                    3
5   Hello would you like A or B?
6   B                                    5
7   A is a great answer. C or D?         4
8   D                                    7
9   Green or red?
10  image                                9
11  Red                                  10

I would like to end up with

id  message                       response
1   Hello would you like A or B?  B
3   Hello would you like A or B?  A
5   Hello would you like A or B?  B
7   A is a great answer. C or D?  D
9   Green or red?                 Red

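I have made a (slightly) simplified version of some sample data, which is at the bottom of this question for reference/use.

It uses the following structure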

WITH data ( id, message, node, options, previous, attachment) AS ()

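Answers can be found by selecting where node is null, so I think that is the best starting point and I can work backwards from there. previous and options are json columns because that is how they are in the real data, so I left them as they are.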
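As a small illustration of that starting point, the query below selects the answer rows and pulls the id of the previous message out of the json column. It is only a sketch: it assumes PostgreSQL, uses two rows copied from the sample data at the bottom, and the alias previous_id is a name chosen here for illustration.

```sql
WITH data (id, message, node, options, previous, attachment) AS (
    VALUES
      (1, 'Pineapple on pizza?', 'pineapple', '["Yes","No"]'::json, null::json, null),
      (6, 'No', null, null, '{"id": 1, "node": "pineapple"}'::json, null)
)
SELECT d.id,
       d.message AS answer,
       (d.previous->>'id')::int AS previous_id  -- id of the message being replied to
FROM data d
WHERE d.node IS NULL;  -- answers are the rows without a node
```

With these two rows the query returns the single answer row (id 6, answer 'No', previous_id 1).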

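I have tried various ways to get the data I want, but I have not managed the recursion / unknown-number-of-levels part.

For example, this attempt can dig two levels deep, but I cannot coalesce the ids of the messages I find, because obviously both have non-null values.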

select COALESCE(d2.message, d3.message) as question, d.message as answer
--  select COALESCE(d2.message, d2.attachment, d3.message, d3.attachment) as question, d.message as answer
    from data as d
    left join data as d2 on (d.previous->>'id')::int = d2.id
    left join data as d3 on (d2.previous->>'id')::int = d3.id
    where d.previous->>'node' in (
        SELECT node from data where options is not null group by node
    )

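I believe this answer https://dba.stackexchange.com/a/215125/4660 may be the path I need, but so far I have not been able to get it to run the way I want. I think it would let me replace the two left joins in the example above with, say, a recursive union, where I could use a condition in the on clause to stop it at the right point. Hopefully this sounds like it is along the right lines and someone can point me in the right direction. Something like the following, maybe?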

WITH data (
    id,
    message,
    node,
    options,
    previous,
    attachment
) AS (
    VALUES ...
), RecursiveTable as (
    select * from data d where node is null -- all answers?
    union all
    select * from RecursiveTable where ??
)
select * from RecursiveTable

--

Basic sample dataset

WITH data (
    id,
    message,
    node,
    options,
    previous,
    attachment
) AS (
    VALUES
      -- QUESTION TYPE 1
      -- pineapple questions
      (1, 'Pineapple on pizza?', 'pineapple', '["Yes","No"]'::json, null::json, null),
      (2, 'Pineapple on pizza?', 'pineapple', '["Yes","No"]'::json, null::json, null),
      (3, 'Pineapple on pizza?', 'pineapple', '["Yes","No"]'::json, null::json, null),
      (4, 'Pineapple on pizza?', 'pineapple', '["Yes","No"]'::json, null::json, null),
      (5, 'Pineapple on pizza?', 'pineapple', '["Yes","No"]'::json, null::json, null),
      -- pineapple answers
      (6, 'No', null, null, '{"id": 1, "node": "pineapple"}'::json, null),
      (7, 'Yes', null, null, '{"id": 2, "node": "pineapple"}'::json, null),
      (8, 'No', null, null, '{"id": 3, "node": "pineapple"}'::json, null),
      (9, 'Yes', null, null, '{"id": 4, "node": "pineapple"}'::json, null),
      (10, 'No', null, null, '{"id": 5, "node": "pineapple"}'::json, null),

      -- ----------------------------

      -- QUESTION TYPE 2 - Previous message, then question with text + options followed by answer
      --- previous messages to stuffed crust questions (we don't care about
        --these but they're here to ensure we aren't accidentally getting them
        --as the question in results)
      (11, 'Hello', 'hello_pre_stuffed_crust', null, null::json, null),
      (12, 'Hello', 'hello_pre_stuffed_crust', null, null::json, null),
      (13, 'Hello', 'hello_pre_stuffed_crust', null, null::json, null),
      -- stuffed crust questions
      (14, 'Stuffed crust?', 'stuffed_crust', '["Crunchy crust","More cheese!"]'::json, '{"id": 11, "node": "hello_pre_stuffed_crust"}'::json, null),
      (15, 'Stuffed crust?', 'stuffed_crust', '["Crunchy crust","More cheese!"]'::json, '{"id": 12, "node": "hello_pre_stuffed_crust"}'::json, null),
      (16, 'Stuffed crust?', 'stuffed_crust', '["Crunchy crust","More cheese!"]'::json, '{"id": 13, "node": "hello_pre_stuffed_crust"}'::json, null),
      -- stuffed crust answers
      (17, 'More cheese!', null, null, '{"id": 14, "node": "stuffed_crust"}'::json, null),
      (18, 'Crunchy crust', null, null, '{"id": 15, "node": "stuffed_crust"}'::json, null),
      (19, 'Crunchy crust', null, null, '{"id": 16, "node": "stuffed_crust"}'::json, null),

      -- ----------------------------

      -- QUESTION TYPE 3
      -- two part question, no text with options only image, should get text from previous
      -- part 1
      (20, 'What do you think of this pizza?', 'check_this_image', null, null::json, null),
      (21, 'What do you think of this pizza?', 'check_this_image', null, null::json, null),
      (22, 'What do you think of this pizza?', 'check_this_image', null, null::json, null),
      -- part two
      (23, null, 'image', '["Looks amazing!","Not my cup of tea"]'::json, '{"id": 20, "node": "check_this_image"}'::json, 'https://images.unsplash.com/photo-1544982503-9f984c14501a'),
      (24, null, 'image', '["Looks amazing!","Not my cup of tea"]'::json, '{"id": 21, "node": "check_this_image"}'::json, 'https://images.unsplash.com/photo-1544982503-9f984c14501a'),
      (25, null, 'image', '["Looks amazing!","Not my cup of tea"]'::json, '{"id": 22, "node": "check_this_image"}'::json, 'https://images.unsplash.com/photo-1544982503-9f984c14501a'),
      -- two part answers
      (26, 'Looks amazing!', null, null, '{"id": 23, "node": "image"}'::json, null),
      (27, 'Not my cup of tea', null, null, '{"id": 24, "node": "image"}'::json, null),
      (28, 'Looks amazing!', null, null, '{"id": 25, "node": "image"}'::json, null),

      -- ----------------------------

      -- QUESTION TYPE 4
      -- no text, just options straight after responding to something else - options for text value would be options, or image
      -- directly after question 3 was answered, previous message was user message - but we don't have text here - just an image and options
      (29, null, 'which_brand', '["Dominos","Papa Johns"]'::json, '{"id": 27}'::json, 'https://peakstudentmediadotcom.files.wordpress.com/2018/11/vs.jpg'),
      (30, null, 'which_brand', '["Dominos","Papa Johns"]'::json, '{"id": 28}'::json, 'https://peakstudentmediadotcom.files.wordpress.com/2018/11/vs.jpg'),
      (31, null, 'which_brand', '["Dominos","Papa Johns"]'::json, '{"id": 29}'::json, 'https://peakstudentmediadotcom.files.wordpress.com/2018/11/vs.jpg')
)
SELECT * from data

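You can use WITH RECURSIVE to achieve your goal. You just need to specify when to stop the recursion and find a way to select only those records for which the recursion did not produce any further rows.

Look here: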

WITH RECURSIVE comp (
    id, message, node, options, previous, attachment,
    id2, message2, node2, options2, previous2, attachment2,
    rec_depth
) AS (
    SELECT
        t.id, t.message, t.node, t.options, t.previous, t.attachment,
        null::integer AS id2, null::text AS message2, null::text AS node2, null::json AS options2, null::json AS previous2, null::text AS attachment2,
        0
    FROM data t
    WHERE t.node IS NULL
UNION ALL
    SELECT
        c.id, c.message, c.node, c.options, c.previous, c.attachment,
        prev.id, prev.message, prev.node, prev.options, prev.previous, prev.attachment,
        c.rec_depth + 1
    FROM comp c
    INNER JOIN data prev ON prev.id = ((COALESCE(c.previous2, c.previous))->>'id')::int
    WHERE prev.node IS NOT NULL -- do not reach back to the next answer
        AND c.message2 IS NULL -- do not reach back beyond a message with text (the question text)
), data (id, message, node, options, previous, attachment) AS (
    VALUES [...]
) SELECT
    c.id2 AS question_id, c.id AS answer_id
FROM comp c
WHERE
    NOT EXISTS(
        SELECT 1
        FROM comp c2
        WHERE c2.id = c.id
        AND c2.rec_depth > c.rec_depth
    )

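Before the recursion, comp holds only the "answers" (this is the part above the UNION ALL). Then, in the first recursion step, each answer is joined with its predecessor. In the second step, another new record is created for each answer-predecessor pair, in which the predecessor is replaced by its own predecessor. This continues until the "base condition" is reached (the joined partner is a record with a message, i.e. the question text, or the next partner is a record without a node, i.e. an answer), which means until no new records are created.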
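To make these steps visible, here is a hedged sketch that runs the same recursion on five rows taken from the sample data and returns every row of comp rather than only the deepest one. It carries fewer columns than the query above, and the output aliases candidate_id and candidate_text are names invented for this illustration.

```sql
WITH RECURSIVE data (id, message, node, options, previous, attachment) AS (
    VALUES
      (1,  'Pineapple on pizza?', 'pineapple', '["Yes","No"]'::json, null::json, null),
      (6,  'No', null, null, '{"id": 1, "node": "pineapple"}'::json, null),
      (20, 'What do you think of this pizza?', 'check_this_image', null, null::json, null),
      (23, null, 'image', '["Looks amazing!","Not my cup of tea"]'::json, '{"id": 20, "node": "check_this_image"}'::json, 'https://images.unsplash.com/photo-1544982503-9f984c14501a'),
      (26, 'Looks amazing!', null, null, '{"id": 23, "node": "image"}'::json, null)
), comp (id, message, previous, id2, message2, previous2, rec_depth) AS (
    -- depth 0: the answers on their own
    SELECT t.id, t.message, t.previous,
           null::integer, null::text, null::json,
           0
    FROM data t
    WHERE t.node IS NULL
  UNION ALL
    -- each step replaces the current predecessor with its own predecessor
    SELECT c.id, c.message, c.previous,
           prev.id, prev.message, prev.previous,
           c.rec_depth + 1
    FROM comp c
    INNER JOIN data prev ON prev.id = ((COALESCE(c.previous2, c.previous))->>'id')::int
    WHERE prev.node IS NOT NULL   -- never step onto another answer
      AND c.message2 IS NULL      -- stop once a predecessor with text was found
)
SELECT c.id AS answer_id, c.rec_depth, c.id2 AS candidate_id, c.message2 AS candidate_text
FROM comp c
ORDER BY c.id, c.rec_depth;
-- answer_id | rec_depth | candidate_id | candidate_text
--         6 |         0 |              |
--         6 |         1 |            1 | Pineapple on pizza?
--        26 |         0 |              |
--        26 |         1 |           23 |
--        26 |         2 |           20 | What do you think of this pizza?
```

Answer 26 needs two steps because message 23 carries only an image, so the recursion continues to message 20, which has the question text.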

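Because we also count the recursion depth (rec_depth) for each row, we can finally make sure that we use only the record with the greatest recursion depth produced for each answer.

The second WITH statement can, and of course should, be removed; you should reference your real table in the WITH RECURSIVE part.

I chose to select only the ids of the answer and the corresponding question, but the WITH RECURSIVE is built in such a way that you can use all columns.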
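As a sketch of what using more columns could look like, the variant below returns the question text next to the answer text. It is an illustration on six rows from the sample data, not a replacement for the query above. It also swaps the NOT EXISTS filter for PostgreSQL's DISTINCT ON, ordered by rec_depth descending, which is just another way to keep the deepest row per answer.

```sql
WITH RECURSIVE data (id, message, node, options, previous, attachment) AS (
    VALUES
      (11, 'Hello', 'hello_pre_stuffed_crust', null::json, null::json, null),
      (14, 'Stuffed crust?', 'stuffed_crust', '["Crunchy crust","More cheese!"]'::json, '{"id": 11, "node": "hello_pre_stuffed_crust"}'::json, null),
      (17, 'More cheese!', null, null, '{"id": 14, "node": "stuffed_crust"}'::json, null),
      (20, 'What do you think of this pizza?', 'check_this_image', null, null::json, null),
      (23, null, 'image', '["Looks amazing!","Not my cup of tea"]'::json, '{"id": 20, "node": "check_this_image"}'::json, 'https://images.unsplash.com/photo-1544982503-9f984c14501a'),
      (26, 'Looks amazing!', null, null, '{"id": 23, "node": "image"}'::json, null)
), comp (id, message, previous, id2, message2, previous2, rec_depth) AS (
    SELECT t.id, t.message, t.previous,
           null::integer, null::text, null::json,
           0
    FROM data t
    WHERE t.node IS NULL
  UNION ALL
    SELECT c.id, c.message, c.previous,
           prev.id, prev.message, prev.previous,
           c.rec_depth + 1
    FROM comp c
    INNER JOIN data prev ON prev.id = ((COALESCE(c.previous2, c.previous))->>'id')::int
    WHERE prev.node IS NOT NULL
      AND c.message2 IS NULL
)
SELECT DISTINCT ON (c.id)          -- one row per answer ...
       c.id2      AS question_id,
       c.message2 AS question,
       c.id       AS answer_id,
       c.message  AS answer
FROM comp c
ORDER BY c.id, c.rec_depth DESC;   -- ... namely the one with the greatest depth
-- question_id | question                         | answer_id | answer
--          14 | Stuffed crust?                   |        17 | More cheese!
--          20 | What do you think of this pizza? |        26 | Looks amazing!
```

The unrelated 'Hello' message (id 11) is never reached, because the recursion stops as soon as message 14 supplies the question text.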
