Count nulls for each row plus other conditions

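I'm working in SQL Server. In reality, I have data with many columns (20+), thousands of rows, and a mix of data types. I don't have privileges to create functions or procedures, but I can work with temp tables. Here is a simplified snippet of the data.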

drop table if exists #test
create table #test
(
Id INT,
BusinessName varchar(30),
Address1 varchar(100),
Address2 varchar (100),
Address3 varchar (100),
Postcode varchar (100),
City varchar (100),
Country varchar (20),
Turnover dec(20,2),
domain varchar (100)
)
insert #test values
(1,'A','nr1','street1', 'Court1', null, null, 'GB', 1.1,'www@1'),
(1,'A Ltd','nr1a','avenue1', null, '11968', 'Southampton', 'US', null, 'www@1'),
(1,'A', null, null, 'Court1', null, 'Paris', 'FR', 1.3, 'www@1'),
(2,'B','nr2','street2', null, 'M2 3DW', 'Manchester', 'GB', null, 'www@2'),
(2,'B','nr2a',null, null, 'M2 3DW', 'Manchester', 'GB', 2, 'www@2')

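For each Id, I need to select the record with the fewest nulls.

If the lowest number of nulls is the same for several records, I need to select the one where Postcode is not null.

If there are still several candidates, I need to select one where City is not null.

If there are still several candidates, I need to select the one with the fewest nulls across the three address columns (Address1, Address2, Address3).

Otherwise I can just take the top 1.

The answer should be: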

(1,'A Ltd','nr1a','avenue1', null, '11968', 'Southampton', 'US', null, 'www@1'),
(2,'B','nr2','street2', null, 'M2 3DW', 'Manchester', 'GB', null, 'www@2')

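Here is my attempt at counting the nulls for each row, but first of all the address nulls are not being added together correctly. Also, I don't know how to then select based on the priorities: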

Drop table if exists #solution 
select s.*
,case
    when Address1 Is NULL   
    OR Address1 = ''
    THEN 1
End Add1filled
,case
    when Address2 Is NULL   
    OR Address2 = ''
    THEN 1
End Add2filled
,case
    when Address3 Is NULL   
    OR Address3 = ''
    THEN 1
End Add3filled
,case
    when Postcode Is NULL   
    OR Postcode = ''
    THEN 1
End postcodefilled
,case
    when City Is NULL   
    OR City = ''
    THEN 1
End Cityfilled
into #solution
from #test s order by ID

Select
Id,
businessName,
Address1,
Address2,
Address3,
Postcode,
City,
Country,
Turnover,
domain,
sum(Add1filled)+sum(Add2filled)+sum(Add3filled) [Addfilled],
postcodefilled,
Cityfilled
from #solution
group by
Id,
businessName,
Address1,
Address2,
Address3,
Postcode,
City,
Country,
Turnover,
domain,
postcodefilled,
Cityfilled

Can anyone help?

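This works, but it isn't an obvious solution. Counting "horizontally" across columns of different data types isn't simple; you need to unpivot the data and convert the values to a common data type, then COUNT the non-NULL values. That is what the APPLY is doing.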

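Also, in the APPLY, you need to work out the "priority" of the row, which decides between 2 rows that have the same number of non-NULL values. Each column is given a priority number, and the row's priority is the highest number among its non-NULL columns, so the row with the higher value wins.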

Then, finally, we use the "old" TOP (1) WITH TIES method with ROW_NUMBER to "filter" down to the "first" row in each group:

SELECT TOP (1) WITH TIES
       T.*
FROM #test T
    CROSS APPLY (SELECT COUNT(N) AS NotNulls,
                        MAX(P) AS Priority
                 FROM (VALUES(CASE WHEN T.BusinessName IS NOT NULL THEN 1 END,NULL),
                             (CASE WHEN T.Address1 IS NOT NULL THEN 1 END,CASE WHEN T.Address1 IS NOT NULL THEN 1 END),
                             (CASE WHEN T.Address2 IS NOT NULL THEN 1 END,CASE WHEN T.Address2 IS NOT NULL THEN 2 END),
                             (CASE WHEN T.Address3 IS NOT NULL THEN 1 END,CASE WHEN T.Address3 IS NOT NULL THEN 3 END),
                             (CASE WHEN T.Postcode IS NOT NULL THEN 1 END,CASE WHEN T.Postcode IS NOT NULL THEN 4 END),
                             (CASE WHEN T.City IS NOT NULL THEN 1 END,CASE WHEN T.City IS NOT NULL THEN 5 END),
                             (CASE WHEN T.Country IS NOT NULL THEN 1 END,NULL),
                             (CASE WHEN T.Turnover IS NOT NULL THEN 1 END,NULL),
                             (CASE WHEN T.Domain IS NOT NULL THEN 1 END,NULL))V(N,P))G
ORDER BY ROW_NUMBER() OVER (PARTITION BY T.ID ORDER BY G.NotNulls DESC, G.Priority DESC, BusinessName ASC);
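
The counting part of the query above relies on COUNT(column) skipping NULLs once each column has been reduced to a 1-or-NULL flag of a single type. A minimal standalone sketch of just that step, using made-up variables in place of table columns:

```sql
-- Hypothetical stand-ins for three columns of different data types.
DECLARE @name     varchar(30)  = 'A',
        @turnover dec(20,2)    = NULL,
        @postcode varchar(100) = NULL;

-- Each CASE turns a value of any type into an int flag: 1 if present, NULL if not.
-- COUNT(V.N) then counts only the rows where the flag is not NULL.
SELECT COUNT(V.N) AS NotNulls
FROM (VALUES (CASE WHEN @name     IS NOT NULL THEN 1 END),
             (CASE WHEN @turnover IS NOT NULL THEN 1 END),
             (CASE WHEN @postcode IS NOT NULL THEN 1 END)) V(N);
-- NotNulls = 1 (only @name is set)
```

Because the flags are all int, the original data types never have to be compared or converted against each other.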

Use a CASE WHEN expression to check for NULL. Use row_number() to determine the priority within each Id.
with cte as
(
    select  *,
        rn  = row_number() over 
              (
                partition by Id
                    order by nulls, 
                             (case when Postcode is not null then 1 else 2 end),
                             (case when City is not null then 1 else 2 end)
               )
            
    from    #test t 
            cross apply
            (
            select  nulls   = case when BusinessName is null then 1 else 0 end
                            + case when Address1 is null then 1 else 0 end
                            + case when Address2 is null then 1 else 0 end
                            + case when Address3 is null then 1 else 0 end
                            + case when Postcode is null then 1 else 0 end
                            + case when City is null then 1 else 0 end
                            + case when Country is null then 1 else 0 end
                            + case when Turnover is null then 1 else 0 end
                            + case when domain is null then 1 else 0 end
            ) n
)
select  *
from    cte 
where   rn  = 1
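
The else 0 branches in the query above matter. A CASE with no ELSE returns NULL when no branch matches, and NULL plus anything is NULL, which is probably why the address counts in the question did not add up. A minimal illustration with a made-up one-row input:

```sql
-- Column a is NULL, column b is filled.
SELECT
    -- No ELSE: the second CASE yields NULL, so 1 + NULL = NULL.
    without_else = CASE WHEN v.a IS NULL THEN 1 END
                 + CASE WHEN v.b IS NULL THEN 1 END,
    -- With ELSE 0: 1 + 0 = 1, the expected null count.
    with_else    = CASE WHEN v.a IS NULL THEN 1 ELSE 0 END
                 + CASE WHEN v.b IS NULL THEN 1 ELSE 0 END
FROM (VALUES (CAST(NULL AS varchar(10)), 'x')) AS v(a, b);
-- without_else is NULL, with_else is 1
```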

I assume there is a primary key. Here is a solution that can be extended to n columns, regardless of data type:

WITH cte1 AS (
    SELECT Pk
         , Id
         , count_all = COUNT(v)
         , count_postcode = COUNT(CASE WHEN n = 'Postcode' AND v IS NOT NULL THEN 1 END)
         , count_city = COUNT(CASE WHEN n = 'City' AND v IS NOT NULL THEN 1 END)
         , count_address = COUNT(CASE WHEN n IN ('Address1', 'Address2', 'Address3') AND v IS NOT NULL THEN 1 END)
    FROM #test
    CROSS APPLY (values
        ('BusinessName', CASE WHEN BusinessName IS NOT NULL THEN 1 END),
        ('Address1',     CASE WHEN Address1     IS NOT NULL THEN 1 END),
        ('Address2',     CASE WHEN Address2     IS NOT NULL THEN 1 END),
        ('Address3',     CASE WHEN Address3     IS NOT NULL THEN 1 END),
        ('Postcode',     CASE WHEN Postcode     IS NOT NULL THEN 1 END),
        ('City',         CASE WHEN City         IS NOT NULL THEN 1 END),
        ('Country',      CASE WHEN Country      IS NOT NULL THEN 1 END),
        ('Turnover',     CASE WHEN Turnover     IS NOT NULL THEN 1 END),
        ('Domain',       CASE WHEN Domain       IS NOT NULL THEN 1 END) 
    ) AS x(n, v)
    GROUP BY Pk, Id
), cte2 AS (
    SELECT cte1.*
         , rn = ROW_NUMBER() OVER (PARTITION BY Id ORDER BY count_all DESC, count_postcode DESC, count_city DESC, count_address DESC)
    FROM cte1
)
SELECT *
FROM cte2
WHERE rn= 1

It returns row 2 (which has a postcode) and row 4 (which has two addresses).
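
The sample #test table in the question has no Pk column, so the query above will not run against it as posted. A minimal sketch of one workaround, assuming a generated surrogate key is acceptable: copy the rows into a second temp table with a row number as the key, then point the query at that table. The name #test_pk is made up for this example, and the numbering is arbitrary because the sample data has no natural order.

```sql
-- Build a copy of #test with a generated key column (hypothetical name: #test_pk).
DROP TABLE IF EXISTS #test_pk;

SELECT Pk = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)),  -- arbitrary but unique per row
       t.*
INTO #test_pk
FROM #test AS t;

-- The query above can then read FROM #test_pk instead of FROM #test.
SELECT Pk, Id, BusinessName
FROM #test_pk
ORDER BY Pk;
```

This only needs temp tables, so it stays within the permissions described in the question.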

I don't see the value of using apply for this. Just use top (1) and order by:

select top (1) t.*
from #test t
order by (case when BusinessName is not null then 1 else 0 end +
          case when Address1 is not null then 1 else 0 end +
          case when Address2 is not null then 1 else 0 end +
          case when Address3 is not null then 1 else 0 end +
          case when Postcode is not null then 1 else 0 end +
          case when City is not null then 1 else 0 end +
          case when Country is not null then 1 else 0 end +
          case when Turnover is not null then 1 else 0 end +
          case when domain is not null then 1 else 0 end
         ) desc,
         (case when Postcode is not null then 1 else 0 end) desc,
         (case when City is not null then 1 else 0 end) desc,
         (case when Address1 is not null then 1 else 0 end +
          case when Address2 is not null then 1 else 0 end +
          case when Address3 is not null then 1 else 0 end
         ) desc;

All the logic is in one place, and no additional processing is needed outside the ORDER BY.
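
As posted, this query returns a single row for the whole table rather than one row per Id. A sketch of one way to get a row per Id while still keeping all the logic in the ordering, borrowing the TOP (1) WITH TIES plus ROW_NUMBER() pattern from the first answer. It assumes the #test table from the question exists; any ties left after the four keys are broken arbitrarily.

```sql
-- Rank the rows inside each Id, then keep every row ranked 1.
SELECT TOP (1) WITH TIES t.*
FROM #test AS t
ORDER BY ROW_NUMBER() OVER (
    PARTITION BY t.Id
    ORDER BY
        -- 1. most non-NULL columns overall
        (CASE WHEN t.BusinessName IS NOT NULL THEN 1 ELSE 0 END +
         CASE WHEN t.Address1     IS NOT NULL THEN 1 ELSE 0 END +
         CASE WHEN t.Address2     IS NOT NULL THEN 1 ELSE 0 END +
         CASE WHEN t.Address3     IS NOT NULL THEN 1 ELSE 0 END +
         CASE WHEN t.Postcode     IS NOT NULL THEN 1 ELSE 0 END +
         CASE WHEN t.City         IS NOT NULL THEN 1 ELSE 0 END +
         CASE WHEN t.Country      IS NOT NULL THEN 1 ELSE 0 END +
         CASE WHEN t.Turnover     IS NOT NULL THEN 1 ELSE 0 END +
         CASE WHEN t.domain       IS NOT NULL THEN 1 ELSE 0 END
        ) DESC,
        -- 2. rows with a postcode first
        (CASE WHEN t.Postcode IS NOT NULL THEN 1 ELSE 0 END) DESC,
        -- 3. then rows with a city
        (CASE WHEN t.City IS NOT NULL THEN 1 ELSE 0 END) DESC,
        -- 4. then the most filled address columns
        (CASE WHEN t.Address1 IS NOT NULL THEN 1 ELSE 0 END +
         CASE WHEN t.Address2 IS NOT NULL THEN 1 ELSE 0 END +
         CASE WHEN t.Address3 IS NOT NULL THEN 1 ELSE 0 END
        ) DESC
);
```

On the sample data this picks the 'A Ltd' row for Id 1 (it ties on total count with the first row but has a postcode) and the 'nr2' row for Id 2 (it ties on everything except the address count), which matches the expected answer in the question.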