Combine like rows based on a substring of one column?

I have a table that partly looks like this:

Part Part Num Thing1 Thing2 Thing3 Thing4
Door 10105322 abc abc
Door 10105323 abc abc
Door 10105324 abc abc
Door 84625111 abc abc abc
Door 84625118 abc abc abc
Door 84625185 abc abc abc
Door 56897101 abc abc

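Part numbers are always 8 characters. For many parts, the first 6 characters are the same and only the last 2 differ. Rows whose part numbers share the same first 6 characters, and which all have the same values in Thing1/Thing2/Thing3/Thing4, need to be combined into one row whose part number is shortened to 6 characters (rows 1/2/3 of the table above).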

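Rows that share the same first 6 characters but whose Thing1/Thing2/Thing3/Thing4 values are not identical across all of those rows need to stay as they are, keeping the 8-character part number (rows 4/5/6 of the table above).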

Rows whose first 6 characters are unique need to stay as they are, keeping the 8-character part number (row 7 of the table above).
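The three rules can be sketched in plain Python to make the grouping explicit before looking at any SQL. This is only an illustration under assumptions: the sample rows are hypothetical (the table above lost its column alignment, so which Thing cells are blank is a guess, and the 846251xx rows are given different values on purpose so that the "values differ" rule applies), and `merge_rows` is a made-up helper name.

```python
from collections import defaultdict

# Hypothetical sample rows: (Part, PartNum, Thing1, Thing2, Thing3, Thing4).
# None marks a blank cell; the blank positions and the differing values in
# the 846251xx rows are assumptions, not data from the question.
ROWS = [
    ("Door", "10105322", "abc", "abc", None, None),
    ("Door", "10105323", "abc", "abc", None, None),
    ("Door", "10105324", "abc", "abc", None, None),
    ("Door", "84625111", "abc", "abc", "abc", None),
    ("Door", "84625118", "abc", "abc", "xyz", None),
    ("Door", "84625185", "abc", "def", "abc", None),
    ("Door", "56897101", "abc", "abc", None, None),
]


def merge_rows(rows):
    """Hypothetical helper applying the three rules to a list of row tuples."""
    # Group rows by Part plus the first 6 characters of the part number.
    groups = defaultdict(list)
    for part, part_num, *things in rows:
        groups[(part, part_num[:6])].append((part_num, tuple(things)))

    result = []
    for (part, prefix), members in groups.items():
        distinct_things = {things for _, things in members}
        if len(members) > 1 and len(distinct_things) == 1:
            # Two or more rows and all Thing values agree: one row, 6-char number.
            result.append((part, prefix, *next(iter(distinct_things))))
        else:
            # Values differ, or the prefix is unique: keep every row unchanged.
            result.extend((part, pn, *things) for pn, things in members)
    return result


for row in merge_rows(ROWS):
    print(row)
```

The `len(members) > 1` check is what keeps a unique prefix such as 568971 at 8 characters; without it, a one-row group would trivially "agree" with itself and be shortened.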

The desired result looks like this:

Part Part Num Thing1 Thing2 Thing3 Thing4
Door 101053 abc abc
Door 84625111 abc abc abc
Door 84625118 abc abc abc
Door 84625185 abc abc abc
Door 56897101 abc abc

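You can use window functions to determine what should be combined. I think I would fold all four Thing columns into a single string so that one comparison covers them: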

select part,
       (case when min_thingee = max_thingee and cnt > 1
             then left(partnum, 6) else partnum
        end) as partnum,
       min(thing1) as thing1, min(thing2) as thing2,
       min(thing3) as thing3, min(thing4) as thing4
from (select t.*,
             -- one string per row (assumes '|' never appears in the data);
             -- every row in a prefix group agrees exactly when min = max
             min(concat(thing1, '|', thing2, '|', thing3, '|', thing4)) over (partition by part, left(partnum, 6)) as min_thingee,
             max(concat(thing1, '|', thing2, '|', thing3, '|', thing4)) over (partition by part, left(partnum, 6)) as max_thingee,
             -- number of rows sharing the 6-character prefix
             count(*) over (partition by part, left(partnum, 6)) as cnt
      from t
     ) t
group by part,
         (case when min_thingee = max_thingee and cnt > 1
               then left(partnum, 6) else partnum
          end);
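To try this min/max approach without SQL Server, here is a sketch that runs a version of the query against an in-memory SQLite database (3.25 or later is assumed, for window-function support) through Python's `sqlite3` module. SQLite has no `LEFT()`, so `substr(partnum, 1, 6)` stands in for it, and `concat()` is replaced by `||` with `ifnull(..., '')`, because `||` yields NULL when any operand is NULL whereas T-SQL's `CONCAT` treats NULL as an empty string. The sample rows and the `run_min_max_query` helper are hypothetical.

```python
import sqlite3

# Hypothetical rows: (part, partnum, thing1..thing4); None is a NULL cell.
ROWS = [
    ("Door", "10105322", "abc", "abc", None, None),
    ("Door", "10105323", "abc", "abc", None, None),
    ("Door", "10105324", "abc", "abc", None, None),
    ("Door", "84625111", "abc", "abc", "abc", None),
    ("Door", "84625118", "abc", "abc", "xyz", None),
    ("Door", "84625185", "abc", "def", "abc", None),
    ("Door", "56897101", "abc", "abc", None, None),
]

# The min/max query adapted to SQLite: substr() replaces LEFT(), and the
# innermost subquery builds the CONCAT()-style string with NULLs as ''.
QUERY = """
select part,
       case when min_thingee = max_thingee and cnt > 1
            then substr(partnum, 1, 6) else partnum end as partnum,
       min(thing1), min(thing2), min(thing3), min(thing4)
from (select c.*,
             min(thingee) over (partition by part, substr(partnum, 1, 6)) as min_thingee,
             max(thingee) over (partition by part, substr(partnum, 1, 6)) as max_thingee,
             count(*) over (partition by part, substr(partnum, 1, 6)) as cnt
      from (select *,
                   ifnull(thing1, '') || '|' || ifnull(thing2, '') || '|' ||
                   ifnull(thing3, '') || '|' || ifnull(thing4, '') as thingee
            from t) as c) as x
group by part,
         case when min_thingee = max_thingee and cnt > 1
              then substr(partnum, 1, 6) else partnum end
"""


def run_min_max_query(rows):
    """Load rows into an in-memory table t and run QUERY (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("create table t (part text, partnum text, thing1 text, "
                     "thing2 text, thing3 text, thing4 text)")
        conn.executemany("insert into t values (?, ?, ?, ?, ?, ?)", rows)
        # Sort by partnum in Python so the output order is deterministic.
        return sorted(conn.execute(QUERY).fetchall(), key=lambda r: r[1])
    finally:
        conn.close()


for row in run_min_max_query(ROWS):
    print(row)
```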

Use the COUNT() window function:

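WITH cte AS (
  SELECT *,
         -- rows sharing Part, the 6-character prefix and all four Thing values
         COUNT(*) OVER (PARTITION BY Part, LEFT(PartNum, 6), Thing1, Thing2, Thing3, Thing4) counter1,
         -- all rows sharing Part and the 6-character prefix
         COUNT(*) OVER (PARTITION BY Part, LEFT(PartNum, 6)) counter2
  FROM tablename
)
-- collapse only when there are 2+ rows and every one of them matches;
-- DISTINCT then removes the duplicate rows this produces
SELECT DISTINCT
  Part,
  CASE WHEN counter1 > 1 AND counter1 = counter2 THEN LEFT(PartNum, 6) ELSE PartNum END PartNum,
  Thing1, Thing2, Thing3, Thing4
FROM cte;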

See the demo.
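The same COUNT() query can also be run as a self-contained demo on SQLite through Python's `sqlite3` module, with `substr(PartNum, 1, 6)` in place of `LEFT(PartNum, 6)`. The rows (blanks stored as NULL) and the `run_count_query` helper are hypothetical, and SQLite 3.25 or later is assumed for window functions.

```python
import sqlite3

# Hypothetical rows: (Part, PartNum, Thing1..Thing4); None is a NULL cell.
ROWS = [
    ("Door", "10105322", "abc", "abc", None, None),
    ("Door", "10105323", "abc", "abc", None, None),
    ("Door", "10105324", "abc", "abc", None, None),
    ("Door", "84625111", "abc", "abc", "abc", None),
    ("Door", "84625118", "abc", "abc", "xyz", None),
    ("Door", "84625185", "abc", "def", "abc", None),
    ("Door", "56897101", "abc", "abc", None, None),
]

# The COUNT() window query with substr() standing in for LEFT().
QUERY = """
WITH cte AS (
  SELECT *,
         COUNT(*) OVER (PARTITION BY Part, substr(PartNum, 1, 6),
                                     Thing1, Thing2, Thing3, Thing4) AS counter1,
         COUNT(*) OVER (PARTITION BY Part, substr(PartNum, 1, 6)) AS counter2
  FROM tablename
)
SELECT DISTINCT
  Part,
  CASE WHEN counter1 > 1 AND counter1 = counter2
       THEN substr(PartNum, 1, 6) ELSE PartNum END AS PartNum,
  Thing1, Thing2, Thing3, Thing4
FROM cte
"""


def run_count_query(rows):
    """Load rows into an in-memory table and run QUERY (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE tablename (Part TEXT, PartNum TEXT, Thing1 TEXT, "
                     "Thing2 TEXT, Thing3 TEXT, Thing4 TEXT)")
        conn.executemany("INSERT INTO tablename VALUES (?, ?, ?, ?, ?, ?)", rows)
        # Sort by PartNum in Python so the output order is deterministic.
        return sorted(conn.execute(QUERY).fetchall(), key=lambda r: r[1])
    finally:
        conn.close()


for row in run_count_query(ROWS):
    print(row)
```

PARTITION BY treats NULLs as equal to each other, so rows whose blank Thing cells are all NULL still fall into the same `counter1` partition.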

If you really want to use dense_rank, here is one way.

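Basic statistics tells us that the standard deviation of a set of equal numbers is 0. This means that once every row has a rank based on left(partnum,6) and the Thing columns, we can enforce the condition so that we only collapse those groups of rows that have exactly one distinct rank and at least two rows (the stdev of a single value is NULL, and NULL = 0 is not true, so single-row groups keep their full part number). Note the ORDER BY in dense_rank() and the PARTITION BY in stdev() to see how the ranks are computed and then compared per group.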

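with cte as (
  -- rows with the same part, prefix and Thing values get the same rank
  select *, dense_rank() over (order by part, left(partnum,6), thing1, thing2, thing3, thing4) as rnk
  from my_table
)
select distinct
       part,
       -- stdev is 0 only when 2+ rows in the prefix group share one rank;
       -- for a single row it is NULL, so the ELSE branch keeps the full number
       case when stdev(rnk) over (partition by part, left(partnum,6)) = 0 then left(partnum,6) else partnum end as partnum,
       thing1,
       thing2,
       thing3,
       thing4
from cte;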
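SQLite has no STDEV, so the dense_rank/stdev idea is easier to check in plain Python. The sketch below mirrors each step of the query: a dense rank over (part, prefix, Thing1..Thing4), a sample standard deviation per (part, prefix) group that is None for a single row (mimicking T-SQL's STDEV returning NULL), and an order-preserving DISTINCT. The data and the helper names (`rank_key`, `sample_stdev`, `collapse_by_stdev`) are hypothetical.

```python
import statistics
from collections import defaultdict

# Hypothetical rows: (part, partnum, thing1..thing4); None is a NULL cell.
ROWS = [
    ("Door", "10105322", "abc", "abc", None, None),
    ("Door", "10105323", "abc", "abc", None, None),
    ("Door", "10105324", "abc", "abc", None, None),
    ("Door", "84625111", "abc", "abc", "abc", None),
    ("Door", "84625118", "abc", "abc", "xyz", None),
    ("Door", "84625185", "abc", "def", "abc", None),
    ("Door", "56897101", "abc", "abc", None, None),
]


def rank_key(row):
    """ORDER BY part, left(partnum, 6), thing1..thing4; None sorts first."""
    part, part_num, *things = row
    return (part, part_num[:6], *((v is not None, v or "") for v in things))


def sample_stdev(values):
    """Sample standard deviation, or None (like NULL) for fewer than 2 values."""
    return statistics.stdev(values) if len(values) > 1 else None


def collapse_by_stdev(rows):
    # dense_rank(): equal keys share a rank and the ranks have no gaps.
    distinct_keys = sorted({rank_key(r) for r in rows})
    rank = {key: i + 1 for i, key in enumerate(distinct_keys)}

    # stdev(rnk) over (partition by part, left(partnum, 6))
    ranks_by_group = defaultdict(list)
    for r in rows:
        ranks_by_group[(r[0], r[1][:6])].append(rank[rank_key(r)])
    stdev_by_group = {g: sample_stdev(v) for g, v in ranks_by_group.items()}

    # CASE WHEN stdev = 0 THEN left(partnum, 6), then SELECT DISTINCT.
    out = []
    for part, part_num, *things in rows:
        if stdev_by_group[(part, part_num[:6])] == 0:
            part_num = part_num[:6]
        row = (part, part_num, *things)
        if row not in out:
            out.append(row)
    return out


for row in collapse_by_stdev(ROWS):
    print(row)
```

Checking `len(ranks) > 1 and len(set(ranks)) == 1` per group would give the same result; the stdev form is kept here only to mirror the SQL.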