Combine like rows based on substring of one column?
I have a table, part of which looks like this:
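Part | Part Num | Thing1 | Thing2 | Thing3 | Thing4 |
---|---|---|---|---|---|
Door | 10105322 | abc | abc | | |
Door | 10105323 | abc | abc | | |
Door | 10105324 | abc | abc | | |
Door | 84625111 | abc | abc | abc | |
Door | 84625118 | abc | abc | abc | |
Door | 84625185 | abc | abc | abc | |
Door | 56897101 | abc | abc | | |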
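Part numbers are always 8 characters long. For many parts, the first 6 characters are the same and only the last 2 differ. Rows that share the first 6 characters of the part number and have identical values in Thing1/Thing2/Thing3/Thing4 across all of those rows need to be merged, with the part number shortened to 6 characters (rows 1/2/3 in the table above).
Rows that share the first 6 characters but whose Thing1/Thing2/Thing3/Thing4 values are not identical across all of those rows need to stay unchanged, keeping the 8-character part number (rows 4/5/6 in the table above).
Rows whose first 6 characters are unique need to stay unchanged, keeping the 8-character part number (row 7 in the table above).
The desired result looks like this: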
Part | Part Num | Thing1 | Thing2 | Thing3 | Thing4 |
---|---|---|---|---|---|
Door | 101053 | abc | abc | | |
Door | 84625111 | abc | abc | abc | |
Door | 84625118 | abc | abc | abc | |
Door | 84625185 | abc | abc | abc | |
Door | 56897101 | abc | abc | | |
You can use window functions to determine what should be merged. I think everything can be folded into a single comparison:
select (case when min_thingee = max_thingee and cnt > 1
             then left(partnum, 6) else partnum
        end) as partnum,
       min(thing1) as thing1, min(thing2) as thing2,
       min(thing3) as thing3, min(thing4) as thing4
from (select t.*,
             -- smallest and largest combined Thing value within each 6-character prefix;
             -- they are equal only when every row in the prefix group has the same values
             min(concat(thing1, '|', thing2, '|', thing3, '|', thing4)) over (partition by left(partnum, 6)) as min_thingee,
             max(concat(thing1, '|', thing2, '|', thing3, '|', thing4)) over (partition by left(partnum, 6)) as max_thingee,
             -- number of rows sharing the prefix; a single row is never shortened
             count(*) over (partition by left(partnum, 6)) as cnt
      from t
     ) t
group by (case when min_thingee = max_thingee and cnt > 1
               then left(partnum, 6) else partnum
          end);
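To try the min/max comparison without a database server, here is a minimal sketch that ports the query to SQLite through Python's `sqlite3` module. It rests on several assumptions: the table name `t`, the function name `merge_like_rows`, and the sample rows are made up for illustration (the 846251xx rows are deliberately given different Thing values so they must not merge); `substr(x, 1, 6)` stands in for `left(x, 6)`, and `coalesce(..., '') || '|'` stands in for `concat`, because SQLite's `||` yields NULL when any operand is NULL. It needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3

# Hypothetical sample rows: (part, partnum, thing1, thing2, thing3, thing4).
ROWS = [
    ("Door", "10105322", "abc", "abc", None, None),
    ("Door", "10105323", "abc", "abc", None, None),
    ("Door", "10105324", "abc", "abc", None, None),
    ("Door", "84625111", "abc", "abc", "abc", None),
    ("Door", "84625118", "abc", "def", "abc", None),
    ("Door", "84625185", "abc", "abc", "xyz", None),
    ("Door", "56897101", "abc", "abc", None, None),
]

QUERY = """
SELECT CASE WHEN min_things = max_things AND cnt > 1
            THEN substr(partnum, 1, 6) ELSE partnum END AS merged_partnum,
       min(thing1), min(thing2), min(thing3), min(thing4)
FROM (
    SELECT s.*,
           min(things) OVER w AS min_things,
           max(things) OVER w AS max_things,
           count(*)    OVER w AS cnt
    FROM (SELECT *,
                 coalesce(thing1, '') || '|' || coalesce(thing2, '') || '|' ||
                 coalesce(thing3, '') || '|' || coalesce(thing4, '') AS things
          FROM t) AS s
    WINDOW w AS (PARTITION BY substr(partnum, 1, 6))
)
GROUP BY merged_partnum
ORDER BY merged_partnum
"""


def merge_like_rows(rows):
    """Load rows into an in-memory table and run the min/max comparison query."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE t (part TEXT, partnum TEXT, "
        "thing1 TEXT, thing2 TEXT, thing3 TEXT, thing4 TEXT)"
    )
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in merge_like_rows(ROWS):
        print(row)
```

Like the query it ports, this sketch does not return the Part column and partitions only on the part-number prefix; if prefixes can repeat across different parts, adding Part to the partition and the grouping would probably be needed.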
Use the COUNT() window function:
WITH cte AS (
SELECT *,
COUNT(*) OVER (PARTITION BY Part, LEFT(PartNum, 6), Thing1, Thing2, Thing3, Thing4) counter1,
COUNT(*) OVER (PARTITION BY Part, LEFT(PartNum, 6)) counter2
FROM tablename
)
SELECT DISTINCT
Part,
CASE WHEN counter1 > 1 AND counter1 = counter2 THEN LEFT(PartNum, 6) ELSE PartNum END PartNum,
Thing1, Thing2, Thing3, Thing4
FROM cte;
See the demo.
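The two counters can also be checked locally. The following is a rough sketch, not the linked demo: it runs the same CTE on SQLite via Python's `sqlite3`, with `substr(PartNum, 1, 6)` substituted for `LEFT(PartNum, 6)`. The function name `collapse_with_counts` and the sample rows are assumptions for illustration. The 846251xx rows carry different Thing values, and the hypothetical 777777xx group has two identical rows plus one different row, to show that a prefix group is shortened only when all of its rows match.

```python
import sqlite3

# Hypothetical sample rows: (Part, PartNum, Thing1, Thing2, Thing3, Thing4).
ROWS = [
    ("Door", "10105322", "abc", "abc", None, None),
    ("Door", "10105323", "abc", "abc", None, None),
    ("Door", "10105324", "abc", "abc", None, None),
    ("Door", "84625111", "abc", "abc", "abc", None),
    ("Door", "84625118", "abc", "def", "abc", None),
    ("Door", "84625185", "abc", "abc", "xyz", None),
    ("Door", "56897101", "abc", "abc", None, None),
    ("Door", "77777701", "abc", None, None, None),
    ("Door", "77777702", "abc", None, None, None),
    ("Door", "77777703", "xyz", None, None, None),
]

QUERY = """
WITH cte AS (
    SELECT *,
           -- rows in the prefix group that share this row's Thing values
           COUNT(*) OVER (PARTITION BY Part, substr(PartNum, 1, 6),
                          Thing1, Thing2, Thing3, Thing4) AS counter1,
           -- all rows in the prefix group
           COUNT(*) OVER (PARTITION BY Part, substr(PartNum, 1, 6)) AS counter2
    FROM tablename
)
SELECT DISTINCT
       Part,
       CASE WHEN counter1 > 1 AND counter1 = counter2
            THEN substr(PartNum, 1, 6) ELSE PartNum END AS PartNum,
       Thing1, Thing2, Thing3, Thing4
FROM cte
ORDER BY 2
"""


def collapse_with_counts(rows):
    """Run the two-counter query against an in-memory SQLite table."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tablename (Part TEXT, PartNum TEXT, "
        "Thing1 TEXT, Thing2 TEXT, Thing3 TEXT, Thing4 TEXT)"
    )
    con.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in collapse_with_counts(ROWS):
        print(row)
```

In the 777777xx group, `counter1` is 2 for the matching pair while `counter2` is 3, so the condition fails and all three rows keep their 8-character numbers.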
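If you really want to use dense_rank, here is one way.
Basic statistics tells us that the standard deviation of a set of equal numbers is 0. This means that once every row has a rank, ordered by left(partnum,6) and the Thing columns, we can enforce a condition so that we collapse only those groups of rows that have a single unique rank and at least two rows (stdev of a single value returns null, so the = 0 test does not succeed for it). Note the order by clause of dense_rank to see how the ranks are calculated, and the partition by clause of stdev to see which rows are compared.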
with cte as
(select *, dense_rank() over (order by part, left(partnum,6), thing1, thing2, thing3, thing4) as rnk
from my_table)
select distinct
part,
case when stdev(rnk) over (partition by part, left(partnum,6)) = 0 then left(partnum,6) else partnum end as partnum,
thing1,
thing2,
thing3,
thing4
from cte;
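The standard-deviation argument can be followed step by step outside SQL. This is a plain-Python sketch of the same idea, not a translation of the query engine's behaviour: the helper names `dense_ranks` and `collapse_by_stdev` and the sample rows are assumptions, NULL is modelled as `None` and ordered like an empty string, and the explicit `len(ranks) >= 2` guard plays the role of stdev returning null for a single value (Python's `statistics.stdev` raises an error instead).

```python
from statistics import stdev

# Hypothetical sample rows: (part, partnum, thing1, thing2, thing3, thing4).
ROWS = [
    ("Door", "10105322", "abc", "abc", None, None),
    ("Door", "10105323", "abc", "abc", None, None),
    ("Door", "10105324", "abc", "abc", None, None),
    ("Door", "84625111", "abc", "abc", "abc", None),
    ("Door", "84625118", "abc", "def", "abc", None),
    ("Door", "84625185", "abc", "abc", "xyz", None),
    ("Door", "56897101", "abc", "abc", None, None),
]


def rank_key(row):
    """Ordering key: part, 6-character prefix, then the Thing values."""
    part, partnum, *things = row
    return (part, partnum[:6], *[t or "" for t in things])


def dense_ranks(rows):
    """Dense rank per row: equal keys share a rank and ranks have no gaps."""
    distinct_keys = sorted({rank_key(r) for r in rows})
    rank_of = {key: i + 1 for i, key in enumerate(distinct_keys)}
    return [rank_of[rank_key(r)] for r in rows]


def collapse_by_stdev(rows):
    """Shorten partnum to 6 characters where the group's ranks have stdev 0."""
    groups = {}
    for row, rank in zip(rows, dense_ranks(rows)):
        groups.setdefault((row[0], row[1][:6]), []).append(rank)

    result = set()  # a set mimics SELECT DISTINCT
    for row in rows:
        ranks = groups[(row[0], row[1][:6])]
        # At least two rows, and all of them carry the same rank.
        collapse = len(ranks) >= 2 and stdev(ranks) == 0
        partnum = row[1][:6] if collapse else row[1]
        result.add((row[0], partnum, *row[2:]))
    return sorted(result, key=lambda r: r[1])


if __name__ == "__main__":
    for row in collapse_by_stdev(ROWS):
        print(row)
```

With these rows the 101053xx group gets ranks [1, 1, 1] and collapses, while the 846251xx group gets three different ranks and is left alone.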