PostgreSQL: INNER JOIN doesn't exclude anything and returns the whole table
Here is the 'tasks' table:
+----+------------------------+-----------+-----------+-----------+----------------+
| id | task | subtask_1 | subtask_2 | subtask_3 | active_subtask |
+----+------------------------+-----------+-----------+-----------+----------------+
| 1 | Earn 1 million dollars | Earn 100k | Earn 500k | Earn 1m | 3 |
+----+------------------------+-----------+-----------+-----------+----------------+
| 2 | Earn 2 million dollars | Earn 1m | Earn 1.5m | Earn 2m | 1 |
+----+------------------------+-----------+-----------+-----------+----------------+
| 3 | Earn 3 million dollars | Earn 2m | Earn 2.5m | Earn 3m | 2 |
+----+------------------------+-----------+-----------+-----------+----------------+
| 4 | ... | ... | ... | ... | ... |
+----+------------------------+-----------+-----------+-----------+----------------+
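I have 2 queries that I want to join to get a list of tasks with non-unique active subtasks.
The first one gives me the active subtask of each task (with some 'where' conditions):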
+----+-----------+
| id | subtask |
+----+-----------+
| 1 | Earn 1m |
+----+-----------+
| 2 | Earn 1m |
+----+-----------+
| 3 | Earn 2.5m |
+----+-----------+
| . | ... |
+----+-----------+
The second one gives me the list of all non-unique active subtasks (i.e. more than 1 task has that active subtask):
+---------+
| subtask |
+---------+
| Earn 1m |
+---------+
| ... |
+---------+
I'm trying to use an INNER JOIN here to get all tasks that have a non-unique active subtask.
The expected result is:
+----+---------+
| id | subtask |
+----+---------+
| 1 | Earn 1m |
+----+---------+
| 2 | Earn 1m |
+----+---------+
| . | ... |
+----+---------+
But I always get the full result of the first query.
The simplified SQL query (without the 'where' clauses) looks like this:
SELECT id,
CASE WHEN active_subtask = 1 THEN subtask_1 WHEN active_subtask = 2 THEN subtask_2 WHEN active_subtask = 3 THEN subtask_3 END
AS subtask
FROM "tasks"
INNER JOIN
(SELECT CASE WHEN active_subtask = 1 THEN subtask_1 WHEN active_subtask = 2 THEN subtask_2 WHEN active_subtask = 3 THEN subtask_3 END
AS subtask
FROM "tasks"
GROUP BY subtask
HAVING (COUNT(*) > 1)) AS agg
ON subtask = agg.subtask
What am I doing wrong?
I would suggest a different approach:
/* non-unique subtasks */
select
id, task, subtask
from (
select
id, task, subtask, count(*) over(partition by subtask) c
from (
select id , task, subtask_1 as subtask from tasks where subtask_1 is not null union all
select id , task, subtask_2 as subtask from tasks where subtask_2 is not null union all
select id , task, subtask_3 as subtask from tasks where subtask_3 is not null
) fddl
) derived
where c > 1
;
/* unique subtasks */
select
id, task, subtask
from (
select
id, task, subtask, count(*) over(partition by subtask) c
from (
select id , task, subtask_1 as subtask from tasks where subtask_1 is not null union all
select id , task, subtask_2 as subtask from tasks where subtask_2 is not null union all
select id , task, subtask_3 as subtask from tasks where subtask_3 is not null
) fddl
) derived
where c = 1
;
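Note that the two queries above treat a subtask as non-unique if it appears in any subtask column, active or not; on the sample rows, 'Earn 2m' would be reported alongside 'Earn 1m'. If only the active subtasks should be compared, one possible variant filters each branch on active_subtask. This is a sketch on the three sample rows from the question; the temp table and its column types are assumptions, so run it in a scratch session:

```sql
-- Sample data copied from the question; column types are assumed.
CREATE TEMP TABLE tasks (
    id             integer PRIMARY KEY,
    task           text,
    subtask_1      text,
    subtask_2      text,
    subtask_3      text,
    active_subtask integer
);

INSERT INTO tasks VALUES
    (1, 'Earn 1 million dollars', 'Earn 100k', 'Earn 500k', 'Earn 1m', 3),
    (2, 'Earn 2 million dollars', 'Earn 1m',   'Earn 1.5m', 'Earn 2m', 1),
    (3, 'Earn 3 million dollars', 'Earn 2m',   'Earn 2.5m', 'Earn 3m', 2);

-- Keep only the active subtask of each task, then count tasks per subtask.
SELECT id, task, subtask
FROM (
    SELECT id, task, subtask,
           count(*) OVER (PARTITION BY subtask) AS c
    FROM (
        SELECT id, task, subtask_1 AS subtask
        FROM tasks WHERE active_subtask = 1 AND subtask_1 IS NOT NULL
        UNION ALL
        SELECT id, task, subtask_2 AS subtask
        FROM tasks WHERE active_subtask = 2 AND subtask_2 IS NOT NULL
        UNION ALL
        SELECT id, task, subtask_3 AS subtask
        FROM tasks WHERE active_subtask = 3 AND subtask_3 IS NOT NULL
    ) AS active
) AS derived
WHERE c > 1
ORDER BY id;
```

On the sample rows this should leave tasks 1 and 2, both with 'Earn 1m', which matches the expected result in the question.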
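The underlying problem here is that you have a denormalized data structure, which limits your ability to write simple queries. In my opinion it would be better to keep subtasks in a separate table, in a many-to-one relationship with the tasks table. In effect, that is what the union subquery above "sort of" simulates.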
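To make the normalization point a bit more concrete, here is a rough sketch of what a separate subtasks table could look like. The table and column names (tasks_n, subtasks_n, slot, active_slot) are invented for illustration and do not come from the question:

```sql
-- Hypothetical normalized layout: one row per subtask
-- instead of one column per subtask.
CREATE TEMP TABLE tasks_n (
    id          integer PRIMARY KEY,
    task        text    NOT NULL,
    active_slot integer NOT NULL          -- which subtask is currently active
);

CREATE TEMP TABLE subtasks_n (
    task_id integer NOT NULL REFERENCES tasks_n (id),
    slot    integer NOT NULL,             -- 1, 2, 3, ... within a task
    name    text    NOT NULL,
    PRIMARY KEY (task_id, slot)
);

-- Same sample rows as in the question, reshaped.
INSERT INTO tasks_n VALUES
    (1, 'Earn 1 million dollars', 3),
    (2, 'Earn 2 million dollars', 1),
    (3, 'Earn 3 million dollars', 2);

INSERT INTO subtasks_n VALUES
    (1, 1, 'Earn 100k'), (1, 2, 'Earn 500k'), (1, 3, 'Earn 1m'),
    (2, 1, 'Earn 1m'),   (2, 2, 'Earn 1.5m'), (2, 3, 'Earn 2m'),
    (3, 1, 'Earn 2m'),   (3, 2, 'Earn 2.5m'), (3, 3, 'Earn 3m');

-- Tasks whose active subtask is shared with at least one other task.
SELECT id, subtask
FROM (
    SELECT t.id,
           s.name AS subtask,
           count(*) OVER (PARTITION BY s.name) AS c
    FROM tasks_n AS t
    JOIN subtasks_n AS s
      ON s.task_id = t.id
     AND s.slot    = t.active_slot
) AS active
WHERE c > 1
ORDER BY id;
```

With this layout the active subtask is a plain join rather than a CASE over numbered columns, and adding a fourth subtask to a task needs no schema change.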
I read your question as "which distinct tasks have the same active subtask?" If I have that right, here is one approach:
with cte (id, subtask) as
(select id, split_part(concat_ws('|',subtask_1,subtask_2,subtask_3),'|', active_subtask)
from tasks)
select a.*
from cte a
join cte b on a.subtask = b.subtask and a.id <> b.id;
This approach is a bit hacky, so be sure to watch out for nulls, and make sure the | delimiter is not already part of your data.
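To illustrate the null caveat: concat_ws skips NULL arguments, so a NULL in subtask_1 shifts every later position and split_part picks the wrong subtask. One way to sidestep both the NULL and the delimiter concern, offered as a sketch and not as part of the approach above, is to index into an array. The row with id 4 and a NULL subtask_1 is made up for the demonstration, and the column types are assumed:

```sql
-- Scratch table; types are assumed, row 4 is a made-up example with a NULL.
CREATE TEMP TABLE tasks (
    id             integer PRIMARY KEY,
    task           text,
    subtask_1      text,
    subtask_2      text,
    subtask_3      text,
    active_subtask integer
);

INSERT INTO tasks VALUES
    (1, 'Earn 1 million dollars', 'Earn 100k', 'Earn 500k', 'Earn 1m', 3),
    (2, 'Earn 2 million dollars', 'Earn 1m',   'Earn 1.5m', 'Earn 2m', 1),
    (4, 'Hypothetical task',      NULL,        'Earn 1m',   'Earn 4m', 2);

-- Compare the delimiter trick with plain array indexing.
SELECT id,
       split_part(concat_ws('|', subtask_1, subtask_2, subtask_3),
                  '|', active_subtask)                          AS via_split_part,
       (ARRAY[subtask_1, subtask_2, subtask_3])[active_subtask] AS via_array
FROM tasks
ORDER BY id;
```

For the made-up row 4, via_split_part comes back as 'Earn 4m' because the NULL was dropped before splitting, while via_array returns the intended 'Earn 1m'.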
Another option is
with cte (id, subtask) as
(select id,
case when active_subtask=1 then subtask_1
when active_subtask=2 then subtask_2
when active_subtask=3 then subtask_3 end
from tasks)
select *
from cte
where subtask in (select subtask
from cte
group by subtask
having count(distinct id)>1);
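As for why the original INNER JOIN kept every row, a likely explanation is that column aliases from the SELECT list are not visible in the ON clause. The unqualified subtask in ON subtask = agg.subtask can therefore only resolve to agg.subtask, so the condition compares agg.subtask with itself and is true for every pairing. Below is a sketch of the same join with the CASE expression moved into a CTE so that it can be referenced by name; the sample data and column types are assumed from the question:

```sql
-- Sample data copied from the question; column types are assumed.
CREATE TEMP TABLE tasks (
    id             integer PRIMARY KEY,
    task           text,
    subtask_1      text,
    subtask_2      text,
    subtask_3      text,
    active_subtask integer
);

INSERT INTO tasks VALUES
    (1, 'Earn 1 million dollars', 'Earn 100k', 'Earn 500k', 'Earn 1m', 3),
    (2, 'Earn 2 million dollars', 'Earn 1m',   'Earn 1.5m', 'Earn 2m', 1),
    (3, 'Earn 3 million dollars', 'Earn 2m',   'Earn 2.5m', 'Earn 3m', 2);

-- Compute the active subtask once, so both sides of the join can name it.
WITH active AS (
    SELECT id,
           CASE active_subtask
               WHEN 1 THEN subtask_1
               WHEN 2 THEN subtask_2
               WHEN 3 THEN subtask_3
           END AS subtask
    FROM tasks
)
SELECT a.id, a.subtask
FROM active AS a
INNER JOIN (
    SELECT subtask
    FROM active
    GROUP BY subtask
    HAVING count(*) > 1
) AS agg
    ON a.subtask = agg.subtask   -- both sides are now real columns
ORDER BY a.id;
```

On the three sample rows this should return tasks 1 and 2 with 'Earn 1m'. Qualifying every column in a join condition with its table alias makes this kind of silent self-comparison much harder to write.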