Grabbing a number from text where it meets specific criteria
OK, I have a bunch of data that all contains a code somewhere in the text, but it is not formatted consistently, for example:
Well at Wallgreens Regular Strength Antacid Liquid (Alumina Magnesia Simethicone Antacid & Anti Gas) Mint a)12 oz bottle (NDC 0363-0073-02) b) 26 oz bottle (NDC 0363-0073-26) Distributed by Walgreens CO 200 Wilmot Rd Deerfield IL 60015
IDPN (Intradialytic Parenteral Nutrition - dialysate solution with added amino acids) a) 490mL bag b) 500mL bag and c) 590mL bag Pentec Health Inc 4 Creek Parkway Suite A Boothwyn PA 19061-3132
Aminosyn-PF (amino acids) 7% Sulfite-Free 500 mL Bags Rx Only Hospira Inc Lake Forest IL 60045 NDC: 0409-4178-03 Barcode (01) 0 030409 417803 5
I'm only interested in the 8-9 digit numbers in the following format:
xxxx-xxxx or xxxxx-xxxx
I have currently selected these entries using:
WHERE [Product Description] LIKE '%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%' OR [Product Description] LIKE '%[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%'
But instead of the whole product description, I want to output only the string it matched, i.e. just the code it found, for example:
0363-0073
19061-3132
0409-4178
For a single value, you can use PATINDEX:
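SELECT
    -- PATINDEX gives the 1-based start of the first xxxxx-xxxx match; take 10 characters from there
    SUBSTRING(ProductDescription,
              PATINDEX('%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', ProductDescription),
              10),
    *
FROM t
-- keep only rows that contain the pattern, so PATINDEX never returns 0 here
WHERE [ProductDescription] LIKE '%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%';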
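The WHERE clause in the query above matters because PATINDEX returns 0 when the pattern is absent, and SUBSTRING with a start of 0 would then return a stray prefix of the text. If rows without a code should be kept rather than filtered out, one possible variation is to wrap PATINDEX in NULLIF. This is only a sketch: the two literals and the aliases x, s, pos and code are made up for illustration.

```sql
-- Sketch only: the literals below are made-up sample strings, not a real table.
SELECT s,
       PATINDEX('%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', s) AS pos,
       -- NULLIF turns "not found" (0) into NULL, so SUBSTRING yields NULL
       -- instead of the first few characters of the text
       SUBSTRING(s,
                 NULLIF(PATINDEX('%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', s), 0),
                 10) AS code
FROM (VALUES ('Boothwyn PA 19061-3132'),
             ('no code in this text')) AS x(s);
```

The first row should give 19061-3132 and the second NULL.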
You can use this to get the first instance of either code (building on lad2025's answer):
declare @t table (v varchar(8000))
insert @t(v) values ('Well at Wallgreens Regular Strength Antacid Liquid (Alumina Magnesia Simethicone Antacid & Anti Gas) Mint a)12 oz bottle (NDC 0363-0073-02) b) 26 oz bottle (NDC 0363-0073-26) Distributed by Walgreens CO 200 Wilmot Rd Deerfield IL 60015'),
('IDPN (Intradialytic Parenteral Nutrition - dialysate solution with added amino acids) a) 490mL bag b) 500mL bag and c) 590mL bag Pentec Health Inc 4 Creek Parkway Suite A Boothwyn PA 19061-3132'),
('Aminosyn-PF (amino acids) 7% Sulfite-Free 500 mL Bags Rx Only Hospira Inc Lake Forest IL 60045 NDC: 0409-4178-03 Barcode (01) 0 030409 417803 5')
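-- show the raw rows for reference
select *
from @t

-- xxxxx-xxxx: 10 characters starting at the match
select substring(v, patindex('%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', v), 10)
from @t
where v like '%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%'
union all
-- xxxx-xxxx: the leading [^0-9] rules out a 5-digit prefix; it is part of the match,
-- so skip it with + 1 before taking 9 characters
select substring(v, patindex('%[^0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', v) + 1, 9)
from @t
where v like '%[^0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%'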
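One caveat with the [^0-9] guard in the second query: it needs a character in front of the code, so a description that begins with the code is never matched. A possible workaround, sketched here under the assumption that such rows can occur, is to prepend a space to the searched string. The variable @s and its value are made up for illustration.

```sql
-- Sketch only: @s is a made-up value with the code at the very start of the string.
DECLARE @s varchar(100) = '0363-0073-02 antacid';

SELECT
    -- Nothing precedes the code, so the [^0-9] guard cannot match: returns 0
    PATINDEX('%[^0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', @s) AS pos_plain,
    -- With a space prepended, the guard matches at position 1 of the padded string,
    -- which is also where the code starts in the original string
    SUBSTRING(@s,
              PATINDEX('%[^0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', ' ' + @s),
              9) AS code;
```

Because the padding shifts every position by one, the index found in the padded string points directly at the first digit of the code in the original string.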
Here is a slightly different approach that does not use UNION ALL:
WITH VTE AS (
SELECT *
FROM (VALUES ('Well at Wallgreens Regular Strength Antacid Liquid (Alumina Magnesia Simethicone Antacid & Anti Gas) Mint a)12 oz bottle (NDC 0363-0073-02) b) 26 oz bottle (NDC 0363-0073-26) Distributed by Walgreens CO 200 Wilmot Rd Deerfield IL 60015'),
('IDPN (Intradialytic Parenteral Nutrition - dialysate solution with added amino acids) a) 490mL bag b) 500mL bag and c) 590mL bag Pentec Health Inc 4 Creek Parkway Suite A Boothwyn PA 19061-3132'),
('Aminosyn-PF (amino acids) 7% Sulfite-Free 500 mL Bags Rx Only Hospira Inc Lake Forest IL 60045 NDC: 0409-4178-03 Barcode (01) 0 030409 417803 5')) V(S))
SELECT V.S,
CASE WHEN PI1.C > 0 THEN SUBSTRING(V.S,PI1.C, 10)
WHEN PI2.C > 0 THEN SUBSTRING(V.S,PI2.C, 9)
ELSE NULL
END AS N
FROM VTE V
CROSS APPLY (VALUES(PATINDEX('%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%',V.S))) PI1(C)
CROSS APPLY (VALUES(PATINDEX('%[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%',V.S))) PI2(C);
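The reason for the 2 PATINDEX calls is that a value like 12345-6789 also satisfies the pattern '%[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%'. The 10-character format is therefore checked first, followed by the 9-character one. The CASE expression also avoids errors when neither pattern is found: if both PI1.C and PI2.C return 0 (meaning no match was found), NULL is returned.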
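The overlap between the two patterns can be checked directly. A minimal sketch, using the sample value 12345-6789 from the explanation above (the column aliases are made up):

```sql
-- Sketch only: compares where each pattern first matches in '12345-6789'.
SELECT PATINDEX('%[0-9][0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', '12345-6789') AS five_four, -- 1
       PATINDEX('%[0-9][0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]%', '12345-6789') AS four_four;      -- 2
```

The shorter pattern matches from position 2, so testing it first would extract 2345-6789 and drop the leading digit.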