根据 python 中的查找模式从字符串中获取所有匹配项

get all occurences from a string based on find pattern in python

假设我有这样一个字符串:

exp = 'CASE WHEN  "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'CPU\'  THEN   \'YES\'  WHEN  "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'RAM\'  THEN   \'YES\' ELSE  \'NO\' END' 
exp2 = 'CASE WHEN  ("Expressions"."ORDER_ITEMS"."QUANTITY"*"Expressions"."ORDER_ITEMS"."UNIT_PRICE")>0  THEN  ("Expressions"."ORDER_ITEMS"."QUANTITY"* "Expressions"."ORDER_ITEMS"."UNIT_PRICE") ELSE ("Expressions"."ORDER_ITEMS"."QUANTITY"+ "Expressions"."ORDER_ITEMS"."UNIT_PRICE")   END '

我想 return 所有出现的 WHEN 和 THEN 以及它们的文本。

这是 exp1 的预期输出

['WHEN  "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'CPU\'  THEN   \'YES\'','WHEN  "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'RAM\'  THEN   \'YES\'']

这是 exp2 的预期输出

['WHEN  ("Expressions"."ORDER_ITEMS"."QUANTITY"*"Expressions"."ORDER_ITEMS"."UNIT_PRICE")>0  THEN  ("Expressions"."ORDER_ITEMS"."QUANTITY"* "Expressions"."ORDER_ITEMS"."UNIT_PRICE")']

我试过的是这样的:

res = re.findall(r'\s*(WHEN|When|when)+\s*(.*)\s*(THEN|Then|then)+\s*')

但是结果列表在我的例子中显示了这个输出

['(WHEN  "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'CPU\'  THEN   \'YES\'  WHEN  "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'RAM\'  THEN)']

尝试:

WHEN (?:(?! +(?:WHEN|ELSE)).)* # with flags=re.I
  1. WHEN - 匹配 'WHEN '
  2. (?:(?! +(?:WHEN|ELSE)).) - 使用否定先行并声明只要当前位置不匹配后跟 'WHEN' 或 'ELSE' 的一个或多个 space 个字符,然后再匹配一个字符。

See Regex Demo

import re

cases = [
    'CASE WHEN  "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'CPU\'  THEN   \'YES\'  WHEN  "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'RAM\'  THEN   \'YES\' ELSE  \'NO\' END',
    'CASE WHEN  ("Expressions"."ORDER_ITEMS"."QUANTITY"*"Expressions"."ORDER_ITEMS"."UNIT_PRICE")>0  THEN  ("Expressions"."ORDER_ITEMS"."QUANTITY"* "Expressions"."ORDER_ITEMS"."UNIT_PRICE") ELSE ("Expressions"."ORDER_ITEMS"."QUANTITY"+ "Expressions"."ORDER_ITEMS"."UNIT_PRICE")   END '
]

for case in cases:
    res = re.findall(r'WHEN (?:(?! +(?:WHEN|ELSE)).)*', case, flags=re.I)
    print(res)

打印:

['WHEN  "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'CPU\'  THEN   \'YES\'', 'WHEN  "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'RAM\'  THEN   \'YES\'']
['WHEN  ("Expressions"."ORDER_ITEMS"."QUANTITY"*"Expressions"."ORDER_ITEMS"."UNIT_PRICE")>0  THEN  ("Expressions"."ORDER_ITEMS"."QUANTITY"* "Expressions"."ORDER_ITEMS"."UNIT_PRICE")']

更新

如果要对 WHEN 和 ELSE 部分进行分组(去除前导和尾随 spaces),请使用以下正则表达式:

WHEN +(.*?) +THEN +((?:(?! +(?:WHEN|ELSE)).)*)
import re

cases = [
    'CASE WHEN  "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'CPU\'  THEN   \'YES\'  WHEN  "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'RAM\'  THEN   \'YES\' ELSE  \'NO\' END',
    'CASE WHEN  ("Expressions"."ORDER_ITEMS"."QUANTITY"*"Expressions"."ORDER_ITEMS"."UNIT_PRICE")>0  THEN  ("Expressions"."ORDER_ITEMS"."QUANTITY"* "Expressions"."ORDER_ITEMS"."UNIT_PRICE") ELSE ("Expressions"."ORDER_ITEMS"."QUANTITY"+ "Expressions"."ORDER_ITEMS"."UNIT_PRICE")   END '
]

for case in cases:
    results = re.findall(r'WHEN +(.*?) +THEN +((?:(?! +(?:WHEN|ELSE)).)*)', case, flags=re.I)
    for result in results:
        print(result[0], result[1])

打印:

"Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"='CPU' 'YES'
"Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"='RAM' 'YES'
("Expressions"."ORDER_ITEMS"."QUANTITY"*"Expressions"."ORDER_ITEMS"."UNIT_PRICE")>0 ("Expressions"."ORDER_ITEMS"."QUANTITY"* "Expressions"."ORDER_ITEMS"."UNIT_PRICE")