T-SQL - How to count, in a single query, the rows whose column contains a given substring, for many substrings at once?

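This site has been very useful to me for years: my SQL questions have always already been asked (and answered) here. This time I couldn't find one, so here is my question: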

For a search engine with many criteria, I have several columns, each of which contains several values/substrings.

For example:

Ref | application   | type            | source
----+---------------+-----------------+---------------------------
A   | ak, bct, rg-t | rega, mann, itr | abc, ghf, eeerr, lam, rmn
B   | ak            | rega, aze       | null
C   | rg-t          | null            | abc, ghf,
D   | ak            | rega, mann, itr | abc, ghf, eeerr, lam, rmn
E   | null          | rega            | lam, rmn

Each code/substring should be unique.

In a single query, I want to know how many times ak appears, how many times bct appears, how many times rg-t appears…

Is there a way to write something like:

Select 
   count(application(ak)), 
   count(application(bct)), 
   count(application(rg-t)),
   … 
   count(type(rega)), 
   count(type(aze)), 
   … 
   count(source(abc)),
   count(source(eeerr))
   …

that would return a single row:

3   |1  |2  |…  |4  |1  |…      |3  |2  |…

Thanks in advance

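I agree with @AndyKorneyev's comment: it would be best to normalize your data. With what you have now, you can use SUM() over a CASE expression (conditional aggregation).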

Here is a fiddle.

-- Each CASE yields 1 for a matching row and 0 otherwise (NULL LIKE ... is not true),
-- so each SUM counts the rows whose column contains that substring
SELECT
  -- Application counts
   SUM( CASE WHEN application LIKE '%ak%' THEN 1 ELSE 0 END ) AS ak_count
  ,SUM( CASE WHEN application LIKE '%bct%' THEN 1 ELSE 0 END ) AS bct_count
  ,SUM( CASE WHEN application LIKE '%rg-t%' THEN 1 ELSE 0 END ) AS rgt_count
  -- Type counts
  ,SUM( CASE WHEN type LIKE '%rega%' THEN 1 ELSE 0 END ) AS rega_count
  ,SUM( CASE WHEN type LIKE '%aze%' THEN 1 ELSE 0 END ) AS aze_count
  -- Source counts
  ,SUM( CASE WHEN [source] LIKE '%abc%' THEN 1 ELSE 0 END ) AS abc_count
  ,SUM( CASE WHEN [source] LIKE '%eeerr%' THEN 1 ELSE 0 END ) AS eeerr_count
FROM my_table
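One caveat with `LIKE '%ak%'`: it also counts any longer code that merely contains `ak` (a hypothetical `akx`, say). Below is a sketch of a stricter variant. It assumes codes are separated by commas with optional spaces, contain no spaces themselves, and contain no LIKE wildcards (`%`, `_`, `[`). It removes the spaces, wraps the column in commas and looks for `,code,`. The table variable `@my_table` is only there to reload the question's sample data.

```sql
-- Sample data from the question (trailing comma in row C kept on purpose)
DECLARE @my_table TABLE (Ref varchar(10), application varchar(100), [type] varchar(100), [source] varchar(100));
INSERT INTO @my_table (Ref, application, [type], [source]) VALUES
    ('A', 'ak, bct, rg-t', 'rega, mann, itr', 'abc, ghf, eeerr, lam, rmn'),
    ('B', 'ak', 'rega, aze', NULL),
    ('C', 'rg-t', NULL, 'abc, ghf,'),
    ('D', 'ak', 'rega, mann, itr', 'abc, ghf, eeerr, lam, rmn'),
    ('E', NULL, 'rega', 'lam, rmn');

-- ',' + REPLACE(col, ' ', '') + ',' turns 'ak, bct, rg-t' into ',ak,bct,rg-t,',
-- so '%,ak,%' only matches the whole code ak. With the default
-- CONCAT_NULL_YIELDS_NULL ON, a NULL cell stays NULL and the CASE falls through to 0.
SELECT
     SUM(CASE WHEN ',' + REPLACE(application, ' ', '') + ',' LIKE '%,ak,%'    THEN 1 ELSE 0 END) AS ak_count
    ,SUM(CASE WHEN ',' + REPLACE(application, ' ', '') + ',' LIKE '%,bct,%'   THEN 1 ELSE 0 END) AS bct_count
    ,SUM(CASE WHEN ',' + REPLACE(application, ' ', '') + ',' LIKE '%,rg-t,%'  THEN 1 ELSE 0 END) AS rgt_count
    ,SUM(CASE WHEN ',' + REPLACE([type], ' ', '') + ','      LIKE '%,rega,%'  THEN 1 ELSE 0 END) AS rega_count
    ,SUM(CASE WHEN ',' + REPLACE([type], ' ', '') + ','      LIKE '%,aze,%'   THEN 1 ELSE 0 END) AS aze_count
    ,SUM(CASE WHEN ',' + REPLACE([source], ' ', '') + ','    LIKE '%,abc,%'   THEN 1 ELSE 0 END) AS abc_count
    ,SUM(CASE WHEN ',' + REPLACE([source], ' ', '') + ','    LIKE '%,eeerr,%' THEN 1 ELSE 0 END) AS eeerr_count
FROM @my_table;
```

With the sample data this should return the row the question expects for these codes: 3, 1, 2, 4, 1, 3, 2.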

As mentioned, you should consider changing your data model...
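As a sketch of what a normalized model could look like: the table `@ref_code` and its columns `category`/`code` are hypothetical names chosen for illustration. Each row of the original table becomes one row per code, and the primary key prevents the same code from being stored twice for one Ref and category. Counting then becomes a plain GROUP BY with no string parsing.

```sql
-- Hypothetical normalized layout: one row per (Ref, category, code)
DECLARE @ref_code TABLE (
    Ref      varchar(10) NOT NULL,
    category varchar(20) NOT NULL,  -- 'application', 'type' or 'source'
    code     varchar(50) NOT NULL,
    PRIMARY KEY (Ref, category, code)
);

-- The question's sample data, one code per row
INSERT INTO @ref_code (Ref, category, code) VALUES
    ('A','application','ak'), ('A','application','bct'), ('A','application','rg-t'),
    ('A','type','rega'), ('A','type','mann'), ('A','type','itr'),
    ('A','source','abc'), ('A','source','ghf'), ('A','source','eeerr'), ('A','source','lam'), ('A','source','rmn'),
    ('B','application','ak'), ('B','type','rega'), ('B','type','aze'),
    ('C','application','rg-t'), ('C','source','abc'), ('C','source','ghf'),
    ('D','application','ak'), ('D','type','rega'), ('D','type','mann'), ('D','type','itr'),
    ('D','source','abc'), ('D','source','ghf'), ('D','source','eeerr'), ('D','source','lam'), ('D','source','rmn'),
    ('E','type','rega'), ('E','source','lam'), ('E','source','rmn');

-- Number of rows (Refs) containing each code, per category
SELECT category, code, COUNT(*) AS [count]
FROM @ref_code
GROUP BY category, code
ORDER BY category, code;
```

This returns one row per (category, code) with its count, for example `type | rega | 4`.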

If you have to work with it as is, this query returns each value along with its number of occurrences:

-- Sample data from the question
declare @data table(Ref varchar(10), application varchar(100), type varchar(100), source varchar(100))
insert into @data(Ref, application, type, source) values
('A','ak, bct, rg-t','rega, mann, itr','abc, ghf, eeerr, lam, rmn')  
, ('B','ak','rega, aze',null       )
, ('C','rg-t',null,'abc, ghf, ')
, ('D','ak','rega, mann, itr','abc, ghf, eeerr, lam, rmn    ')
, ('E',null,'rega','lam, rmn')

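-- Split every cell into one row per code, then count each (column, code) pair
Select t, v, count = count(*)
From (
    Select Ref, t
        , v = LTRIM(RTRIM(x.d.value('.[1]','varchar(100)')))  -- trim the spaces around each code
    From (
        -- 'ak, bct' becomes <Root><Data>ak</Data><Data> bct</Data></Root>; repeated for each column
        Select Ref, t = 'application'
            , xml = CAST('<Root><Data>' + REPLACE(application,',','</Data><Data>') + '</Data></Root>' AS XML)
        From @data
        Union All
        Select Ref, t = 'type'
            , xml = CAST('<Root><Data>' + REPLACE(type,',','</Data><Data>') + '</Data></Root>' AS XML)
        From @data
        Union All
        Select Ref, t = 'source'
            , xml = CAST('<Root><Data>' + REPLACE(source,',','</Data><Data>') + '</Data></Root>' AS XML)
        From @data
    ) t
    Cross Apply xml.nodes('/Root/Data')x(d)  -- one row per <Data> element; NULL cells produce no rows
) as list
Where v <> ''  -- drop the empty piece left by a trailing comma
Group By t, v
Order By t, v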

Output:

t           | v     | count
application | ak    | 3
application | bct   | 1
application | rg-t  | 2
source      | abc   | 3
source      | eeerr | 2
source      | ghf   | 3
source      | lam   | 3
source      | rmn   | 3
type        | aze   | 1
type        | itr   | 2
type        | mann  | 2
type        | rega  | 4

The subquery splits your comma-separated values by converting each cell to XML. It does this three times (UNION ALL), once per column: application, source, type.

Then all that remains is to count them.
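On SQL Server 2016 or later (database compatibility level 130 or higher), the built-in `STRING_SPLIT` function can do the splitting instead of the XML conversion. That also avoids the error the XML cast raises when a value contains characters such as `&` or `<`. A sketch, reloading the same `@data` sample and unpivoting the three columns with `CROSS APPLY (VALUES ...)` instead of three `UNION ALL` branches:

```sql
-- Same sample data as above (trailing comma and trailing spaces kept on purpose)
DECLARE @data TABLE (Ref varchar(10), application varchar(100), [type] varchar(100), [source] varchar(100));
INSERT INTO @data (Ref, application, [type], [source]) VALUES
    ('A', 'ak, bct, rg-t', 'rega, mann, itr', 'abc, ghf, eeerr, lam, rmn'),
    ('B', 'ak', 'rega, aze', NULL),
    ('C', 'rg-t', NULL, 'abc, ghf, '),
    ('D', 'ak', 'rega, mann, itr', 'abc, ghf, eeerr, lam, rmn    '),
    ('E', NULL, 'rega', 'lam, rmn');

SELECT u.t,
       LTRIM(RTRIM(s.value)) AS v,
       COUNT(*) AS [count]
FROM @data AS d
-- Unpivot: one row per (Ref, column name, cell)
CROSS APPLY (VALUES ('application', d.application),
                    ('type',        d.[type]),
                    ('source',      d.[source])) AS u(t, col)
-- One row per comma-separated piece; a NULL cell yields no rows
CROSS APPLY STRING_SPLIT(u.col, ',') AS s
WHERE LTRIM(RTRIM(s.value)) <> ''  -- drop empty pieces from trailing commas
GROUP BY u.t, LTRIM(RTRIM(s.value))
ORDER BY u.t, v;
```

`STRING_SPLIT` only accepts a single-character separator, which is why the surrounding spaces are trimmed afterwards.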

Before we can aggregate, I have to "normalise" your data:

CREATE TABLE Table1
    ([Ref] varchar(1), [application] varchar(13), [type] varchar(15), [source] varchar(25))
;

INSERT INTO Table1
    ([Ref], [application], [type], [source])
VALUES
    ('A', 'ak, bct, rg-t', 'rega, mann, itr', 'abc, ghf, eeerr, lam, rmn'),
    ('B', 'ak', 'rega, aze', NULL),
    ('C', 'rg-t', NULL, 'abc, ghf'),
    ('D', 'ak', 'rega, mann, itr', 'abc, ghf, eeerr, lam, rmn'),
    ('E', NULL, 'rega', 'lam, rmn')
;

;WITH CTE AS (
-- Turn each comma-separated cell into an XML fragment such as <r>ak</r><r> bct</r>, once per column
SELECT 'Application' AS [Type]
, CAST('<r>' + REPLACE([application], ',', '</r><r>') + '</r>' AS XML) AS [Value]
FROM Table1
UNION ALL
SELECT 'Type' AS [Type]
, CAST('<r>' + REPLACE([type], ',', '</r><r>') + '</r>' AS XML) AS [Value]
FROM Table1
UNION ALL
SELECT 'Source' AS [Type]
, CAST('<r>' + REPLACE([source], ',', '</r><r>') + '</r>' AS XML) AS [Value]
FROM Table1)
, CTE2 AS (
-- Shred the XML into one trimmed row per code
SELECT [Type], LTRIM(RTRIM(xTable.xColumn.value('.', 'VARCHAR(MAX)'))) AS [Value]
FROM CTE
CROSS APPLY [Value].nodes('//r') AS xTable(xColumn))
SELECT [Type], [Value], COUNT(*) AS [Count]
FROM CTE2
GROUP BY [Type], [Value]
ORDER BY [Type], [Value]
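The question asks for a single wide row rather than one row per code. As a sketch, the split (category, code) pairs can be pivoted with conditional aggregation. The data is reloaded into a table variable `@t` so the batch runs on its own, the output aliases such as `application_ak` are illustrative, and the list of codes has to be written out by hand (generating it with dynamic SQL would be the next step, not shown here).

```sql
-- Same data as Table1 above
DECLARE @t TABLE (Ref varchar(1), application varchar(13), [type] varchar(15), [source] varchar(25));
INSERT INTO @t (Ref, application, [type], [source]) VALUES
    ('A', 'ak, bct, rg-t', 'rega, mann, itr', 'abc, ghf, eeerr, lam, rmn'),
    ('B', 'ak', 'rega, aze', NULL),
    ('C', 'rg-t', NULL, 'abc, ghf'),
    ('D', 'ak', 'rega, mann, itr', 'abc, ghf, eeerr, lam, rmn'),
    ('E', NULL, 'rega', 'lam, rmn');

WITH Split AS (
    -- One row per (category, code), using the same XML split as above
    SELECT u.category,
           LTRIM(RTRIM(x.r.value('.', 'varchar(100)'))) AS code
    FROM @t AS d
    CROSS APPLY (VALUES ('application', d.application),
                        ('type',        d.[type]),
                        ('source',      d.[source])) AS u(category, col)
    CROSS APPLY (SELECT CAST('<r>' + REPLACE(u.col, ',', '</r><r>') + '</r>' AS XML) AS doc) AS c
    CROSS APPLY c.doc.nodes('/r') AS x(r)  -- a NULL cell gives NULL XML and no rows
)
-- Pivot the counts into one row, as asked in the question
SELECT
     SUM(CASE WHEN category = 'application' AND code = 'ak'    THEN 1 ELSE 0 END) AS application_ak
    ,SUM(CASE WHEN category = 'application' AND code = 'bct'   THEN 1 ELSE 0 END) AS application_bct
    ,SUM(CASE WHEN category = 'application' AND code = 'rg-t'  THEN 1 ELSE 0 END) AS application_rgt
    ,SUM(CASE WHEN category = 'type'        AND code = 'rega'  THEN 1 ELSE 0 END) AS type_rega
    ,SUM(CASE WHEN category = 'type'        AND code = 'aze'   THEN 1 ELSE 0 END) AS type_aze
    ,SUM(CASE WHEN category = 'source'      AND code = 'abc'   THEN 1 ELSE 0 END) AS source_abc
    ,SUM(CASE WHEN category = 'source'      AND code = 'eeerr' THEN 1 ELSE 0 END) AS source_eeerr
FROM Split;
```

Like the other XML-based queries here, this assumes the cells contain no characters that are special in XML (`&`, `<`).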