Remove inconsistent acronyms in strings using regex
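I want to remove all acronyms, even the ones that are written inconsistently. For example, in the list below (text), some acronyms are missing either the opening or the closing parenthesis, and I want to remove those as well. At the moment I can only remove the ones that have both parentheses.
How can I adjust my current regex so that it doesn't only match uppercase characters enclosed in both parentheses?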
import re
text = ['Spain (ES)', 'Netherlands (NL .', 'United States (USA.', 'Russia RU)']
for string in text:
    cleaned_acronyms = re.sub(r'\([A-Z]*\)', '', string)  # remove uppercase chars wrapped in ( )
    print(cleaned_acronyms)
#current output
>>> Spain
>>> Netherlands (NL .
>>> United States (USA.
>>> Russia RU)
Desired output:
>>> Spain
>>> Netherlands
>>> United States
>>> Russia
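You could match the uppercase characters with a parenthesis on either side, that is, an opening parenthesis before them or a closing parenthesis after them, together with any whitespace in front, and then match the rest of the line.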
\s*(?:\([A-Z]{2,}|[A-Z]{2,}\)).*
For example:
import re
text = ['Spain (ES)', 'Netherlands (NL .', 'United States (USA.', 'Russia RU)']
for string in text:
    cleaned_acronyms = re.sub(r'\s*(?:\([A-Z]{2,}|[A-Z]{2,}\)).*', '', string)
    print(cleaned_acronyms)
Output:
Spain
Netherlands
United States
Russia
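To make the pattern easier to read, here is a sketch of the same expression written with re.VERBOSE so each part can carry a comment. The helper name strip_acronym is hypothetical (it is not from the answer); it should behave exactly like the one-line re.sub call above.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE (whitespace in the
# pattern is ignored and # starts a comment).
ACRONYM_TAIL = re.compile(r"""
    \s*                 # optional whitespace before the acronym
    (?:
        \( [A-Z]{2,}    # an opening parenthesis followed by 2+ uppercase letters
      |                 # ...or...
        [A-Z]{2,} \)    # 2+ uppercase letters followed by a closing parenthesis
    )
    .*                  # everything else up to the end of the line
""", re.VERBOSE)


def strip_acronym(s: str) -> str:
    """Hypothetical helper: drop the acronym and whatever follows it."""
    return ACRONYM_TAIL.sub('', s)


text = ['Spain (ES)', 'Netherlands (NL .', 'United States (USA.', 'Russia RU)']
for string in text:
    print(strip_acronym(string))
```

Compiling the pattern once also avoids re-parsing it on every loop iteration, although Python's internal regex cache makes that difference small for a list this size.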
You might get away with
\(?\b[A-Z.]{2,3}\b.+
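A quick way to try this shorter pattern on the same list is sketched below. Unlike the first pattern it does not consume the whitespace in front of the acronym, so each result keeps a trailing space; the .rstrip() step is an addition to deal with that. Note also that [A-Z.]{2,3} only matches acronyms of two or three characters (dots included).

```python
import re

text = ['Spain (ES)', 'Netherlands (NL .', 'United States (USA.', 'Russia RU)']

# \(?          optional opening parenthesis
# \b           word boundary before the acronym
# [A-Z.]{2,3}  two or three uppercase letters or dots
# \b           word boundary after the acronym
# .+           the rest of the line (at least one character)
pattern = re.compile(r'\(?\b[A-Z.]{2,3}\b.+')

raw = [pattern.sub('', s) for s in text]  # e.g. 'Spain ' (trailing space kept)
cleaned = [r.rstrip() for r in raw]       # trailing whitespace trimmed

for string in cleaned:
    print(string)
```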