Python regex Greek characters

Question

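I have some strings with this structure: <name> (<unit>). I want to extract name and unit; I use regex for this task, and in most cases it works fine.
However, in some cases <unit> consists of Greek characters, such as Ω. In those cases my code fails to extract the two parts.
Here is my code: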

import re

def name_unit_split(text):
    name = re.split(r' \([A-Za-z]*\)', text)[0]
    unit = re.findall(r'\([A-Za-z]*\)', text)

    if unit != []:
        unit = unit[0][1:-1]
    else:
        unit = ''

    return name, unit

print(name_unit_split('distance (mm)'))

I get:

('distance', 'mm')

But when I try:

print(name_unit_split('resistance (Ω)'))

I get:

('resistance (Ω)', '')

I searched for other regex placeholders and tried to use them, without success:

name = re.split(' \([\p{Greek}]*\)', text)[0]
unit = re.findall('\([\p{Greek}]*\)', text)

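How can I use regex to find Greek characters (one or more, grouped) in a string?
Also, is there a better way to do the above with regex? That is: is there a way to extract <name> and <unit> with a regex and store them in name and unit?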

Answer 1

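Just like the Latin letters, the Greek letters occupy a contiguous range of code points in Unicode (and therefore in UTF-8), so you can build your regex with \([α-ωΑ-Ω]*\) instead of \([A-Za-z]*\).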
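As a minimal sketch of how this range could be combined with capturing groups (which also speaks to the second part of the question), the function below extracts both parts in a single match. The name UNIT_PATTERN and the fallback of returning the whole string with an empty unit are assumptions, chosen to mirror the behaviour of the asker's original function.

```python
import re

# Name, then a space, then a parenthesised unit made of Latin or Greek letters.
# The Greek ranges are the ones suggested in the answer above.
UNIT_PATTERN = re.compile(r"^(?P<name>.*?) \((?P<unit>[A-Za-zα-ωΑ-Ω]*)\)$")

def name_unit_split(text):
    match = UNIT_PATTERN.match(text)
    if match is None:
        # No "(unit)" suffix found: keep the whole string as the name.
        return text, ''
    return match.group('name'), match.group('unit')

print(name_unit_split('distance (mm)'))
print(name_unit_split('resistance (Ω)'))
```

Two caveats seem worth keeping in mind. The standard-library re module does not support \p{Greek}, which is why the attempt in the question fails. Also, the Ohm sign (U+2126) is a different code point from the Greek capital omega (U+03A9) and lies outside these ranges, so a unit typed with the Ohm sign would only match after the text is normalized, for example with unicodedata.normalize('NFC', text).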

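Personally, I prefer to use a regex such as "[A-Za-z]* \([α-ωΑ-Ω]*\)" only to check that the pattern holds, and to use string functions to do the actual splitting. But I believe this comes down to personal preference.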
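One way this "validate with regex, split with string functions" approach might look is sketched below. The names SHAPE and split_if_valid are hypothetical, and the character class for the unit is widened to accept Latin letters as well, which is an assumption about what the asker needs rather than part of the answer's pattern.

```python
import re

# Shape check only: Latin name, a space, then a Latin or Greek unit in parentheses.
SHAPE = re.compile(r"[A-Za-z]* \([A-Za-zα-ωΑ-Ω]*\)")

def split_if_valid(text):
    # Return None when the string does not have the expected shape.
    if SHAPE.fullmatch(text) is None:
        return None
    # Drop the trailing ")" and split once on the " (" separator.
    name, _, unit = text[:-1].partition(" (")
    return name, unit
```

Returning None for non-matching input is just one possible design; raising an exception or returning the unsplit text would work equally well depending on the caller.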

Answer 2

A non-regex solution for the structure <name> (<unit>) is str.partition:

>>> name, _, unit = "resistance (Ω)"[:-1].partition(" (")
>>> name
'resistance'
>>> unit
'Ω'
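The one-liner above assumes every string ends with a closing parenthesis. A small wrapper, sketched here with the hypothetical name split_name_unit, could guard against strings that have no unit at all; falling back to an empty unit is an assumption taken from the asker's original function.

```python
def split_name_unit(text):
    # Only treat the tail as a unit when the string ends with ")"
    # and contains the " (" separator.
    if text.endswith(")") and " (" in text:
        name, _, unit = text[:-1].partition(" (")
        return name, unit
    # No unit present: return the text unchanged with an empty unit.
    return text, ""
```

Because no character class is involved, this version works for units in any script, Greek included.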
