Grabbing a number from text where it meets specific criteria

OK, I have a bunch of data where every row contains a code somewhere in the text, but the formatting is inconsistent, for example:

Well at Wallgreens Regular Strength Antacid Liquid (Alumina Magnesia Simethicone Antacid & Anti Gas) Mint a)12 oz bottle (NDC 0363-0073-02) b) 26 oz bottle (NDC 0363-0073-26) Distributed by Walgreens CO 200 Wilmot Rd Deerfield IL 60015

IDPN (Intradialytic Parenteral Nutrition - dialysate solution with added amino acids) a) 490mL bag b) 500mL bag and c) 590mL bag Pentec Health Inc 4 Creek Parkway Suite A Boothwyn PA 19061-3132

Aminosyn-PF (amino acids) 7% Sulfite-Free 500 mL Bags Rx Only Hospira Inc Lake Forest IL 60045 NDC: 0409-4178-03 Barcode (01) 0 030409 417803 5

I'm only interested in the 8-9 digit numbers formatted like this:

xxxx-xxxx or xxxxx-xxxx

I have currently selected these entries using:

WHERE [Product Description] LIKE '%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%' OR [Product Description] LIKE  '%[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%'

But I want to output the string it matched rather than the whole product description, i.e. just the code it found, for example:

0363-0073

19061-3132

0409-4178

For a single value you can use PATINDEX:

SELECT 
    SUBSTRING(ProductDescription
              ,PATINDEX('%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%'
                        ,ProductDescription),
             10), *
FROM t
WHERE 
 [ProductDescription] LIKE '%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%';

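To make the mechanics concrete, here is a minimal sketch on a single literal string (the variable name @s is only illustrative, and the string is a fragment of the sample data from the question). PATINDEX returns the 1-based position where the pattern starts, and SUBSTRING then takes the 10 characters of the xxxxx-xxxx code from that position:

```sql
-- Hypothetical variable holding a fragment of one sample row.
DECLARE @s varchar(200) = 'Boothwyn PA 19061-3132';

SELECT PATINDEX('%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', @s) AS StartPos,
       SUBSTRING(@s,
                 PATINDEX('%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', @s),
                 10) AS Code;
```

On SQL Server this should give StartPos = 13 and Code = 19061-3132. Note that this pattern only covers the five-digit form; the four-digit form xxxx-xxxx needs its own pattern.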

You can use this to get the first instance of either code (based on lad2025's answer):

declare @t table (v varchar(8000))

insert @t(v) values ('Well at Wallgreens Regular Strength Antacid Liquid (Alumina Magnesia Simethicone Antacid & Anti Gas) Mint a)12 oz bottle (NDC 0363-0073-02) b) 26 oz bottle (NDC 0363-0073-26) Distributed by Walgreens CO 200 Wilmot Rd Deerfield IL 60015'),
('IDPN (Intradialytic Parenteral Nutrition - dialysate solution with added amino acids) a) 490mL bag b) 500mL bag and c) 590mL bag Pentec Health Inc 4 Creek Parkway Suite A Boothwyn PA 19061-3132'),
('Aminosyn-PF (amino acids) 7% Sulfite-Free 500 mL Bags Rx Only Hospira Inc Lake Forest IL 60045 NDC: 0409-4178-03 Barcode (01) 0 030409 417803 5')

SELECT  *
FROM @T

select  substring(v, patindex('%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', v), 10)
from    @t
where   v like '%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%'
union all
-- the leading [^0-9] matches the character before the code, so skip it with + 1
select  substring(v, patindex('%[^0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', v) + 1, 9)
from    @t
where   v like '%[^0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%'

Here is a slightly different approach that doesn't use UNION ALL:

WITH VTE AS (
    SELECT *
    FROM (VALUES ('Well at Wallgreens Regular Strength Antacid Liquid (Alumina Magnesia Simethicone Antacid & Anti Gas) Mint a)12 oz bottle (NDC 0363-0073-02) b) 26 oz bottle (NDC 0363-0073-26) Distributed by Walgreens CO 200 Wilmot Rd Deerfield IL 60015'),
                 ('IDPN (Intradialytic Parenteral Nutrition - dialysate solution with added amino acids) a) 490mL bag b) 500mL bag and c) 590mL bag Pentec Health Inc 4 Creek Parkway Suite A Boothwyn PA 19061-3132'),
                 ('Aminosyn-PF (amino acids) 7% Sulfite-Free 500 mL Bags Rx Only Hospira Inc Lake Forest IL 60045 NDC: 0409-4178-03 Barcode (01) 0 030409 417803 5')) V(S))
SELECT V.S,
       CASE WHEN PI1.C > 0 THEN SUBSTRING(V.S,PI1.C, 10)
            WHEN PI2.C > 0 THEN SUBSTRING(V.S,PI2.C, 9)
            ELSE NULL
       END AS N
FROM VTE V
     CROSS APPLY (VALUES(PATINDEX('%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%',V.S))) PI1(C)
     CROSS APPLY (VALUES(PATINDEX('%[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%',V.S))) PI2(C);

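The reason for the two PATINDEX calls is that a value such as 12345-6789 also satisfies the pattern '%[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%'. The check for the 10-character format is therefore done first, followed by the 9-character one. The CASE expression also handles the case where neither pattern is found: if both PI1.C and PI2.C return 0 (meaning no pattern was found), NULL is returned.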
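The overlap between the two patterns can be checked directly. The following is a small sketch using made-up literals (the variable names @zip and @none and their values are assumptions for illustration, not part of the original data):

```sql
-- Illustrative literals only; not taken from the original data.
DECLARE @zip  varchar(50) = 'ZIP 12345-6789';
DECLARE @none varchar(50) = 'no code here';

SELECT PATINDEX('%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', @zip) AS FiveFourPos,
       PATINDEX('%[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', @zip)      AS FourFourPos,
       PATINDEX('%[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', @none)     AS NoMatchPos;
```

On SQL Server this should return 5, 6 and 0. The four-digit pattern matches one character into the five-digit code (at the 2), which is why the longer pattern has to be tested first, and the 0 for the last column is the "not found" value that the CASE expression turns into NULL.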