计算每一行的空值加上其他条件
Calculate nulls for each row plus other condition
正在 SQL 服务器中工作。
实际上,我有很多列(20 多个)、数千行和多种数据类型的数据。我没有创建函数或过程的特权,但我可以处理临时表。这是数据的简化片段。
drop table if exists #test
create table #test
(
Id INT,
BusinessName varchar(30),
Address1 varchar(100),
Address2 varchar (100),
Address3 varchar (100),
Postcode varchar (100),
City varchar (100),
Country varchar (20),
Turnover dec(20,2),
domain varchar (100)
)
insert #test values
(1,'A','nr1','street1', 'Court1', null, null, 'GB', 1.1,'www@1'),
(1,'A Ltd','nr1a','avenue1', null, '11968', 'Southampton', 'US', null, 'www@1'),
(1,'A', null, null, 'Court1', null, 'Paris', 'FR', 1.3, 'www@1'),
(2,'B','nr2','street2', null, 'M2 3DW', 'Manchester', 'GB', null, 'www@2'),
(2,'B','nr2a',null, null, 'M2 3DW', 'Manchester', 'GB', 2, 'www@2')
对于每个 ID,我需要选择空值最少的记录。
如果空值的最少数量相同,那么我需要选择邮政编码不为空的记录,
如果还有多选,我要选一个city不为null的,
如果还有多项选择,我需要选择所有三个特征(地址1、地址2、地址3)中空值最少的一个
否则我可以选择top1。
答案应该是:
(1,'A Ltd','nr1a','avenue1', null, '11968', 'Southampton', 'US', null, 'www@1'),
(2,'B','nr2','street2', null, 'M2 3DW', 'Manchester', 'GB', null, 'www@2')
这是我对每一行的空值进行计数的尝试,但首先地址空值没有正确地加在一起。另外,我不知道现在如何根据优先级进行选择:
Drop table if exists #solution
select s.*
,case
when Address1 Is NULL
OR Address1 = ''
THEN 1
End Add1filled
,case
when Address2 Is NULL
OR Address2 = ''
THEN 1
End Add2filled
,case
when Address3 Is NULL
OR Address3 = ''
THEN 1
End Add3filled
,case
when Postcode Is NULL
OR Postcode = ''
THEN 1
End postcodefilled
,case
when City Is NULL
OR City = ''
THEN 1
End Cityfilled
into #solution
from #test s order by ID
Select
Id,
businessName,
Address1,
Address2,
Address3,
Postcode,
City,
Country,
Turnover,
domain,
sum(Add1filled)+sum(Add2filled)+sum(Add3filled) [Addfilled],
postcodefilled,
Cityfilled
from #solution
group by
Id,
businessName,
Address1,
Address2,
Address3,
Postcode,
City,
Country,
Turnover,
domain,
postcodefilled,
Cityfilled
有人可以帮忙吗?
这个 有效 但这不是一个明显的解决方案。水平“计数”,使用不同的数据类型,并不简单;您需要对数据进行逆透视并将它们转换为所有相同的数据类型,然后 COUNT
非 NULL
值。这就是 APPLY
正在做的事情。
此外,在 APPLY
中,您需要计算出行的“优先级”,基于 2 行上的 NULL
值具有相同数量的非 NULL
值。因此,“最高”优先级值具有最高值。
然后,最后,我们使用“旧”TOP (1) WITH TIES
方法和ROW_NUMBER
“过滤”到每组中的“第一”行:
SELECT TOP (1) WITH TIES
T.*
FROM #test T
CROSS APPLY (SELECT COUNT(N) AS NotNulls,
MAX(P) AS Priority
FROM (VALUES(CASE WHEN T.BusinessName IS NOT NULL THEN 1 END,NULL),
(CASE WHEN T.Address1 IS NOT NULL THEN 1 END,CASE WHEN T.Address1 IS NOT NULL THEN 1 END),
(CASE WHEN T.Address2 IS NOT NULL THEN 1 END,CASE WHEN T.Address2 IS NOT NULL THEN 2 END),
(CASE WHEN T.Address3 IS NOT NULL THEN 1 END,CASE WHEN T.Address3 IS NOT NULL THEN 3 END),
(CASE WHEN T.Postcode IS NOT NULL THEN 1 END,CASE WHEN T.Postcode IS NOT NULL THEN 4 END),
(CASE WHEN T.City IS NOT NULL THEN 1 END,CASE WHEN T.City IS NOT NULL THEN 5 END),
(CASE WHEN T.Country IS NOT NULL THEN 1 END,NULL),
(CASE WHEN T.Turnover IS NOT NULL THEN 1 END,NULL),
(CASE WHEN T.Domain IS NOT NULL THEN 1 END,NULL))V(N,P))G
ORDER BY ROW_NUMBER() OVER (PARTITION BY T.ID ORDER BY G.NotNulls DESC, G.Priority DESC, BusinessName ASC);
使用 CASE WHEN
表达式检查 NULL
。使用 row_number()
确定每个 Id
.
的优先级
with cte as
(
select *,
rn = row_number() over
(
partition by Id
order by nulls,
(case when Postcode is not null then 1 else 2 end),
(case when City is not null then 1 else 2 end)
)
from #test t
cross apply
(
select nulls = case when BusinessName is null then 1 else 0 end
+ case when Address1 is null then 1 else 0 end
+ case when Address2 is null then 1 else 0 end
+ case when Address3 is null then 1 else 0 end
+ case when Postcode is null then 1 else 0 end
+ case when City is null then 1 else 0 end
+ case when Country is null then 1 else 0 end
+ case when Turnover is null then 1 else 0 end
+ case when domain is null then 1 else 0 end
) n
)
select *
from cte
where rn = 1
我假设有一个主键。这是一个可以扩展为 n 列的解决方案,无论数据类型如何:
WITH cte1 AS (
SELECT Pk
, Id
, count_all = COUNT(v)
, count_postcode = COUNT(CASE WHEN n = 'Postcode' AND v IS NOT NULL THEN 1 END)
, count_city = COUNT(CASE WHEN n = 'City' AND v IS NOT NULL THEN 1 END)
, count_address = COUNT(CASE WHEN n IN ('Address1', 'Address2', 'Address3') AND v IS NOT NULL THEN 1 END)
FROM #test
CROSS APPLY (values
('BusinessName', CASE WHEN BusinessName IS NOT NULL THEN 1 END),
('Address1', CASE WHEN Address1 IS NOT NULL THEN 1 END),
('Address2', CASE WHEN Address2 IS NOT NULL THEN 1 END),
('Address3', CASE WHEN Address3 IS NOT NULL THEN 1 END),
('Postcode', CASE WHEN Postcode IS NOT NULL THEN 1 END),
('City', CASE WHEN City IS NOT NULL THEN 1 END),
('Country', CASE WHEN Country IS NOT NULL THEN 1 END),
('Turnover', CASE WHEN Turnover IS NOT NULL THEN 1 END),
('Domain', CASE WHEN Domain IS NOT NULL THEN 1 END)
) AS x(n, v)
GROUP BY Pk, Id
), cte2 AS (
SELECT cte1.*
, rn = ROW_NUMBER() OVER (PARTITION BY Id ORDER BY count_all DESC, count_postcode DESC, count_city DESC, count_address DESC)
FROM cte1
)
SELECT *
FROM cte2
WHERE rn= 1
它 returns 第 2 行(有邮政编码)和第 4 行(两个地址)。
我看不到为此使用 apply
的价值。只需使用 top (1)
和 order by
:
select top (1) t.*
from #test
order by (case when BusinessName is not null then 1 else 0 end +
case when Address1 is not null then 1 else 0 end +
case when Address2 is not null then 1 else 0 end +
case when Address3 is not null then 1 else 0 end +
case when Postcode is not null then 1 else 0 end +
case when City is not null then 1 else 0 end +
case when Country is not null then 1 else 0 end +
case when Turnover is not null then 1 else 0 end +
case when domain is not null then 1 else 0 end
) desc,
(case when Postcode is not null then 1 else 0 end) desc,
(case when City is not null then 1 else 0 end) desc,
(case when Address1 is not null then 1 else 0 end +
case when Address2 is not null then 1 else 0 end +
case when Address3 is not null then 1 else 0 end
) desc;
所有逻辑都集中在一个地方,不需要ORDER BY
之外的额外处理。
正在 SQL 服务器中工作。 实际上,我有很多列(20 多个)、数千行和多种数据类型的数据。我没有创建函数或过程的特权,但我可以处理临时表。这是数据的简化片段。
drop table if exists #test
create table #test
(
Id INT,
BusinessName varchar(30),
Address1 varchar(100),
Address2 varchar (100),
Address3 varchar (100),
Postcode varchar (100),
City varchar (100),
Country varchar (20),
Turnover dec(20,2),
domain varchar (100)
)
insert #test values
(1,'A','nr1','street1', 'Court1', null, null, 'GB', 1.1,'www@1'),
(1,'A Ltd','nr1a','avenue1', null, '11968', 'Southampton', 'US', null, 'www@1'),
(1,'A', null, null, 'Court1', null, 'Paris', 'FR', 1.3, 'www@1'),
(2,'B','nr2','street2', null, 'M2 3DW', 'Manchester', 'GB', null, 'www@2'),
(2,'B','nr2a',null, null, 'M2 3DW', 'Manchester', 'GB', 2, 'www@2')
对于每个 ID,我需要选择空值最少的记录。
如果空值的最少数量相同,那么我需要选择邮政编码不为空的记录,
如果还有多选,我要选一个city不为null的,
如果还有多项选择,我需要选择所有三个特征(地址1、地址2、地址3)中空值最少的一个
否则我可以选择top1。
答案应该是:
(1,'A Ltd','nr1a','avenue1', null, '11968', 'Southampton', 'US', null, 'www@1'),
(2,'B','nr2','street2', null, 'M2 3DW', 'Manchester', 'GB', null, 'www@2')
这是我对每一行的空值进行计数的尝试,但首先地址空值没有正确地加在一起。另外,我不知道现在如何根据优先级进行选择:
Drop table if exists #solution
select s.*
,case
when Address1 Is NULL
OR Address1 = ''
THEN 1
End Add1filled
,case
when Address2 Is NULL
OR Address2 = ''
THEN 1
End Add2filled
,case
when Address3 Is NULL
OR Address3 = ''
THEN 1
End Add3filled
,case
when Postcode Is NULL
OR Postcode = ''
THEN 1
End postcodefilled
,case
when City Is NULL
OR City = ''
THEN 1
End Cityfilled
into #solution
from #test s order by ID
Select
Id,
businessName,
Address1,
Address2,
Address3,
Postcode,
City,
Country,
Turnover,
domain,
sum(Add1filled)+sum(Add2filled)+sum(Add3filled) [Addfilled],
postcodefilled,
Cityfilled
from #solution
group by
Id,
businessName,
Address1,
Address2,
Address3,
Postcode,
City,
Country,
Turnover,
domain,
postcodefilled,
Cityfilled
有人可以帮忙吗?
这个 有效 但这不是一个明显的解决方案。水平“计数”,使用不同的数据类型,并不简单;您需要对数据进行逆透视并将它们转换为所有相同的数据类型,然后 COUNT
非 NULL
值。这就是 APPLY
正在做的事情。
此外,在 APPLY
中,您需要计算出行的“优先级”,基于 2 行上的 NULL
值具有相同数量的非 NULL
值。因此,“最高”优先级值具有最高值。
然后,最后,我们使用“旧”TOP (1) WITH TIES
方法和ROW_NUMBER
“过滤”到每组中的“第一”行:
SELECT TOP (1) WITH TIES
T.*
FROM #test T
CROSS APPLY (SELECT COUNT(N) AS NotNulls,
MAX(P) AS Priority
FROM (VALUES(CASE WHEN T.BusinessName IS NOT NULL THEN 1 END,NULL),
(CASE WHEN T.Address1 IS NOT NULL THEN 1 END,CASE WHEN T.Address1 IS NOT NULL THEN 1 END),
(CASE WHEN T.Address2 IS NOT NULL THEN 1 END,CASE WHEN T.Address2 IS NOT NULL THEN 2 END),
(CASE WHEN T.Address3 IS NOT NULL THEN 1 END,CASE WHEN T.Address3 IS NOT NULL THEN 3 END),
(CASE WHEN T.Postcode IS NOT NULL THEN 1 END,CASE WHEN T.Postcode IS NOT NULL THEN 4 END),
(CASE WHEN T.City IS NOT NULL THEN 1 END,CASE WHEN T.City IS NOT NULL THEN 5 END),
(CASE WHEN T.Country IS NOT NULL THEN 1 END,NULL),
(CASE WHEN T.Turnover IS NOT NULL THEN 1 END,NULL),
(CASE WHEN T.Domain IS NOT NULL THEN 1 END,NULL))V(N,P))G
ORDER BY ROW_NUMBER() OVER (PARTITION BY T.ID ORDER BY G.NotNulls DESC, G.Priority DESC, BusinessName ASC);
使用 CASE WHEN
表达式检查 NULL
。使用 row_number()
确定每个 Id
.
with cte as
(
select *,
rn = row_number() over
(
partition by Id
order by nulls,
(case when Postcode is not null then 1 else 2 end),
(case when City is not null then 1 else 2 end)
)
from #test t
cross apply
(
select nulls = case when BusinessName is null then 1 else 0 end
+ case when Address1 is null then 1 else 0 end
+ case when Address2 is null then 1 else 0 end
+ case when Address3 is null then 1 else 0 end
+ case when Postcode is null then 1 else 0 end
+ case when City is null then 1 else 0 end
+ case when Country is null then 1 else 0 end
+ case when Turnover is null then 1 else 0 end
+ case when domain is null then 1 else 0 end
) n
)
select *
from cte
where rn = 1
我假设有一个主键。这是一个可以扩展为 n 列的解决方案,无论数据类型如何:
WITH cte1 AS (
SELECT Pk
, Id
, count_all = COUNT(v)
, count_postcode = COUNT(CASE WHEN n = 'Postcode' AND v IS NOT NULL THEN 1 END)
, count_city = COUNT(CASE WHEN n = 'City' AND v IS NOT NULL THEN 1 END)
, count_address = COUNT(CASE WHEN n IN ('Address1', 'Address2', 'Address3') AND v IS NOT NULL THEN 1 END)
FROM #test
CROSS APPLY (values
('BusinessName', CASE WHEN BusinessName IS NOT NULL THEN 1 END),
('Address1', CASE WHEN Address1 IS NOT NULL THEN 1 END),
('Address2', CASE WHEN Address2 IS NOT NULL THEN 1 END),
('Address3', CASE WHEN Address3 IS NOT NULL THEN 1 END),
('Postcode', CASE WHEN Postcode IS NOT NULL THEN 1 END),
('City', CASE WHEN City IS NOT NULL THEN 1 END),
('Country', CASE WHEN Country IS NOT NULL THEN 1 END),
('Turnover', CASE WHEN Turnover IS NOT NULL THEN 1 END),
('Domain', CASE WHEN Domain IS NOT NULL THEN 1 END)
) AS x(n, v)
GROUP BY Pk, Id
), cte2 AS (
SELECT cte1.*
, rn = ROW_NUMBER() OVER (PARTITION BY Id ORDER BY count_all DESC, count_postcode DESC, count_city DESC, count_address DESC)
FROM cte1
)
SELECT *
FROM cte2
WHERE rn= 1
它 returns 第 2 行(有邮政编码)和第 4 行(两个地址)。
我看不到为此使用 apply
的价值。只需使用 top (1)
和 order by
:
select top (1) t.*
from #test
order by (case when BusinessName is not null then 1 else 0 end +
case when Address1 is not null then 1 else 0 end +
case when Address2 is not null then 1 else 0 end +
case when Address3 is not null then 1 else 0 end +
case when Postcode is not null then 1 else 0 end +
case when City is not null then 1 else 0 end +
case when Country is not null then 1 else 0 end +
case when Turnover is not null then 1 else 0 end +
case when domain is not null then 1 else 0 end
) desc,
(case when Postcode is not null then 1 else 0 end) desc,
(case when City is not null then 1 else 0 end) desc,
(case when Address1 is not null then 1 else 0 end +
case when Address2 is not null then 1 else 0 end +
case when Address3 is not null then 1 else 0 end
) desc;
所有逻辑都集中在一个地方,不需要ORDER BY
之外的额外处理。