Remove duplicate combinations between two columns of Char data type in SQL

Question

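I have two columns of Char data type, and I want to remove duplicate combinations of their values, even when the two values appear in a different order.

For example:

Input data: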

Col_1 Col_2

str_1 str_2
str_2 str_1
str_2 str_3
str_2 str_4
str_3 str_2

Output data:

Col_1 Col_2

str_1 str_2
str_2 str_3
str_2 str_4

Answer 1

Most DBMS products support the greatest() and least() functions, which can be used for this:

select distinct least(col_1, col_2), greatest(col_1, col_2)
from the_table
order by 1
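
As a way to try this query without a database server, here is a minimal sketch in Python using the standard sqlite3 module. It rests on one assumption that is not in the answer: SQLite does not provide least()/greatest(), so the sketch substitutes SQLite's multi-argument scalar min()/max(), which play the same role. The function name dedupe_with_least_greatest and the added "order by 1, 2" are illustrative choices only.

```python
import sqlite3

def dedupe_with_least_greatest(rows):
    """Return each unordered pair once, with the smaller value first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE the_table (col_1 TEXT, col_2 TEXT)")
    conn.executemany("INSERT INTO the_table VALUES (?, ?)", rows)
    # Assumption: SQLite's scalar min(a, b) / max(a, b) stand in for
    # least(a, b) / greatest(a, b) from the answer.
    result = conn.execute(
        """
        SELECT DISTINCT min(col_1, col_2), max(col_1, col_2)
        FROM the_table
        ORDER BY 1, 2
        """
    ).fetchall()
    conn.close()
    return result

sample = [
    ("str_1", "str_2"),
    ("str_2", "str_1"),
    ("str_2", "str_3"),
    ("str_2", "str_4"),
    ("str_3", "str_2"),
]
print(dedupe_with_least_greatest(sample))
```

Note that this approach normalizes the order of every pair: a combination stored only as (str_9, str_5) would come back as (str_5, str_9).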

Answer 2

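You can find the answer in the fiddle, run on SQL Server. Subquery A builds all possible combinations from the two columns, in both orders. Subquery B then keeps only the distinct combinations in which col_1 is less than col_2, that is, the "smallest" ordering of each pair. The LEFT JOIN places those smallest-order combinations on the left side and looks for them in the original table. When the column from the original table is NULL, only the reversed combination exists in your table, which is why the CASE expression is used to swap the values back.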

SELECT CASE WHEN t1.Col_1 IS NULL THEN B.col_2 ELSE B.Col_1 END AS Final_Col_1,
  CASE WHEN t1.Col_1 IS NULL THEN B.col_1 ELSE B.Col_2 END AS Final_Col_2
FROM (
  SELECT DISTINCT *
  FROM (
        SELECT col_1, col_2
        FROM your_table
        UNION ALL
        SELECT col_2, col_1
        FROM your_table)A
  WHERE col_1< col_2)B
LEFT JOIN your_table AS t1
ON t1.Col_1=B.Col_1 AND t1.Col_2 = B.Col_2
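
To see the effect of the CASE expression, the query above can be run against a small in-memory table. This is only a sketch under assumptions: it uses Python's sqlite3 module rather than SQL Server, the function name dedupe_keep_stored_order is hypothetical, and an ORDER BY is added so the output is deterministic. The extra row (str_9, str_5) is made-up test data with no reversed counterpart.

```python
import sqlite3

def dedupe_keep_stored_order(rows):
    """Run the UNION ALL / LEFT JOIN query and return the surviving pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE your_table (col_1 TEXT, col_2 TEXT)")
    conn.executemany("INSERT INTO your_table VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT CASE WHEN t1.col_1 IS NULL THEN B.col_2 ELSE B.col_1 END,
               CASE WHEN t1.col_1 IS NULL THEN B.col_1 ELSE B.col_2 END
        FROM (
          SELECT DISTINCT *
          FROM (
                SELECT col_1, col_2 FROM your_table
                UNION ALL
                SELECT col_2, col_1 FROM your_table) A
          WHERE col_1 < col_2) B
        LEFT JOIN your_table AS t1
          ON t1.col_1 = B.col_1 AND t1.col_2 = B.col_2
        ORDER BY 1, 2
        """
    ).fetchall()
    conn.close()
    return result

pairs = [
    ("str_1", "str_2"),
    ("str_2", "str_1"),
    ("str_2", "str_3"),
    ("str_2", "str_4"),
    ("str_3", "str_2"),
    ("str_9", "str_5"),  # hypothetical row stored only in "reversed" order
]
print(dedupe_keep_stored_order(pairs))
```

In this run the pair (str_9, str_5) has no match in the LEFT JOIN, so the CASE expression swaps it back and it is returned in the order in which it was stored.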

Answer 3

Assuming the only duplicates are reversed pairs, the fastest method is usually:

select col1, col2
from t
where col1 < col2 or
      not exists (select 1
                  from t t2
                  where t.col1 = t2.col2 and t.col2 = t2.col1
                 );

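This can take advantage of an index on (col1, col2) or (col2, col1).

By avoiding any aggregation and using an index, this should be the fastest method on almost any database.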
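
The NOT EXISTS query can be checked the same way. The following is a sketch using Python's sqlite3 module, with the inner condition comparing against the t2 alias; the function name dedupe_not_exists, the index name idx_t_col2_col1, the added ORDER BY, and the extra test rows are assumptions made for illustration. It makes no claim about speed, it only shows which rows survive.

```python
import sqlite3

def dedupe_not_exists(rows):
    """Keep a row if col1 < col2, or if its reversed pair is not in the table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (col1 TEXT, col2 TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    # Hypothetical index name; the answer only says such an index can help.
    conn.execute("CREATE INDEX idx_t_col2_col1 ON t (col2, col1)")
    result = conn.execute(
        """
        SELECT col1, col2
        FROM t
        WHERE col1 < col2
           OR NOT EXISTS (SELECT 1
                          FROM t t2
                          WHERE t.col1 = t2.col2 AND t.col2 = t2.col1)
        ORDER BY col1, col2
        """
    ).fetchall()
    conn.close()
    return result

data = [
    ("str_1", "str_2"),
    ("str_2", "str_1"),
    ("str_2", "str_3"),
    ("str_2", "str_4"),
    ("str_3", "str_2"),
    ("str_9", "str_5"),  # hypothetical row with no reversed counterpart
]
print(dedupe_not_exists(data))
```

Two behaviours are worth keeping in mind when trying this. Exact duplicate rows are not removed, which is consistent with the stated assumption that the only duplicates are reversed pairs. A row whose two values are equal fails col1 < col2 and also matches itself inside the NOT EXISTS subquery, so it is dropped from the result.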

sql

data-cleaning