Get all occurrences from a string based on a find pattern in Python
Suppose I have strings like these:
exp = 'CASE WHEN "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'CPU\' THEN \'YES\' WHEN "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'RAM\' THEN \'YES\' ELSE \'NO\' END'
exp2 = 'CASE WHEN ("Expressions"."ORDER_ITEMS"."QUANTITY"*"Expressions"."ORDER_ITEMS"."UNIT_PRICE")>0 THEN ("Expressions"."ORDER_ITEMS"."QUANTITY"* "Expressions"."ORDER_ITEMS"."UNIT_PRICE") ELSE ("Expressions"."ORDER_ITEMS"."QUANTITY"+ "Expressions"."ORDER_ITEMS"."UNIT_PRICE") END '
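I want to return every occurrence of WHEN and THEN together with the text that belongs to them.
This is the expected output for the first string, exp: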
['WHEN "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'CPU\' THEN \'YES\'','WHEN "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'RAM\' THEN \'YES\'']
This is the expected output for exp2:
['WHEN ("Expressions"."ORDER_ITEMS"."QUANTITY"*"Expressions"."ORDER_ITEMS"."UNIT_PRICE")>0 THEN ("Expressions"."ORDER_ITEMS"."QUANTITY"* "Expressions"."ORDER_ITEMS"."UNIT_PRICE")']
What I tried is this:
res = re.findall(r'\s*(WHEN|When|when)+\s*(.*)\s*(THEN|Then|then)+\s*', exp)
But in my case the resulting list shows this output:
['(WHEN "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'CPU\' THEN \'YES\' WHEN "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'RAM\' THEN)']
Try:
WHEN (?:(?! +(?:WHEN|ELSE)).)* # with flags=re.I
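WHEN
- Matches 'WHEN ' (including the trailing space).
(?:(?! +(?:WHEN|ELSE)).)
- Uses a negative lookahead to assert that the current position is not followed by one or more spaces and then 'WHEN' or 'ELSE'; if the assertion holds, one more character is matched. The trailing * in the full pattern repeats this zero or more times.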
import re
cases = [
'CASE WHEN "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'CPU\' THEN \'YES\' WHEN "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'RAM\' THEN \'YES\' ELSE \'NO\' END',
'CASE WHEN ("Expressions"."ORDER_ITEMS"."QUANTITY"*"Expressions"."ORDER_ITEMS"."UNIT_PRICE")>0 THEN ("Expressions"."ORDER_ITEMS"."QUANTITY"* "Expressions"."ORDER_ITEMS"."UNIT_PRICE") ELSE ("Expressions"."ORDER_ITEMS"."QUANTITY"+ "Expressions"."ORDER_ITEMS"."UNIT_PRICE") END '
]
for case in cases:
    res = re.findall(r'WHEN (?:(?! +(?:WHEN|ELSE)).)*', case, flags=re.I)
    print(res)
Prints:
['WHEN "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'CPU\' THEN \'YES\'', 'WHEN "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'RAM\' THEN \'YES\'']
['WHEN ("Expressions"."ORDER_ITEMS"."QUANTITY"*"Expressions"."ORDER_ITEMS"."UNIT_PRICE")>0 THEN ("Expressions"."ORDER_ITEMS"."QUANTITY"* "Expressions"."ORDER_ITEMS"."UNIT_PRICE")']
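To see why the lookahead matters, here is a minimal sketch that compares a plain greedy `.*` with the pattern above. The short input string is a made-up example, not one of the strings from the question.

```python
import re

# Hypothetical short input, used only for illustration.
text = "CASE WHEN a=1 THEN 'x' WHEN a=2 THEN 'y' ELSE 'z' END"

# Greedy: .* runs to the end of the string and backtracks to the LAST " THEN",
# so both branches are swallowed into a single match.
greedy = re.findall(r"WHEN .* THEN", text, flags=re.I)

# Tempered: a character is consumed only while the current position is not
# followed by spaces plus WHEN or ELSE, so each branch ends on its own.
tempered = re.findall(r"WHEN (?:(?! +(?:WHEN|ELSE)).)*", text, flags=re.I)

print(greedy)    # ["WHEN a=1 THEN 'x' WHEN a=2 THEN"]
print(tempered)  # ["WHEN a=1 THEN 'x'", "WHEN a=2 THEN 'y'"]
```

The greedy version behaves like the attempt in the question: it returns one long match that spans both branches.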
Update
If you want to capture the WHEN and THEN parts as separate groups (with leading and trailing spaces stripped), use the following regex:
WHEN +(.*?) +THEN +((?:(?! +(?:WHEN|ELSE)).)*)
import re
cases = [
'CASE WHEN "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'CPU\' THEN \'YES\' WHEN "Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"=\'RAM\' THEN \'YES\' ELSE \'NO\' END',
'CASE WHEN ("Expressions"."ORDER_ITEMS"."QUANTITY"*"Expressions"."ORDER_ITEMS"."UNIT_PRICE")>0 THEN ("Expressions"."ORDER_ITEMS"."QUANTITY"* "Expressions"."ORDER_ITEMS"."UNIT_PRICE") ELSE ("Expressions"."ORDER_ITEMS"."QUANTITY"+ "Expressions"."ORDER_ITEMS"."UNIT_PRICE") END '
]
for case in cases:
    results = re.findall(r'WHEN +(.*?) +THEN +((?:(?! +(?:WHEN|ELSE)).)*)', case, flags=re.I)
    for result in results:
        print(result[0], result[1])
Prints:
"Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"='CPU' 'YES'
"Expressions"."PRODUCT_CATEGORIES"."CATEGORY_NAME"='RAM' 'YES'
("Expressions"."ORDER_ITEMS"."QUANTITY"*"Expressions"."ORDER_ITEMS"."UNIT_PRICE")>0 ("Expressions"."ORDER_ITEMS"."QUANTITY"* "Expressions"."ORDER_ITEMS"."UNIT_PRICE")
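If you need this in more than one place, the same regex can be wrapped in a small helper. This is only a sketch: the function name `extract_when_then` and the group names `condition` and `result` are my own choices, and the input string is a made-up example.

```python
import re

# Same pattern as above, compiled once. The group names are illustrative.
WHEN_THEN = re.compile(
    r"WHEN +(?P<condition>.*?) +THEN +(?P<result>(?:(?! +(?:WHEN|ELSE)).)*)",
    flags=re.I,
)

def extract_when_then(sql):
    """Return a (condition, result) tuple for each WHEN ... THEN branch."""
    return [(m.group("condition"), m.group("result"))
            for m in WHEN_THEN.finditer(sql)]

# Hypothetical short input, used only for illustration.
pairs = extract_when_then("CASE WHEN a=1 THEN 'x' WHEN a=2 THEN 'y' ELSE 'z' END")
print(pairs)  # [('a=1', "'x'"), ('a=2', "'y'")]
```

Keep in mind that the lookahead only stops at WHEN or ELSE, so a CASE expression without an ELSE branch would leave the trailing END in the last result, and a keyword inside a quoted literal would also cut a match short.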