Python - isalpha() returns True on unicode modifiers

python
unicode
python-unicode

Why does u'\u02c7'.isalpha() return True if the symbol ˇ is not a letter? Does this method only work properly for ASCII characters?

U+02C7 CARON is a codepoint in the Lm (Modifier Letter) category, so according to the Unicode standard it is alphabetic.
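You can confirm this by looking the character up with the standard `unicodedata` module. This is a minimal sketch, assuming Python 3; the variable names are only illustrative:

```python
import unicodedata

ch = "\u02c7"  # the character from the question: ˇ

name = unicodedata.name(ch)          # official Unicode character name
category = unicodedata.category(ch)  # two-letter general category code

print(name, category, ch.isalpha())  # CARON Lm True
```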

The str.isalpha() documentation states explicitly what is included:

Alphabetic characters are those characters defined in the Unicode character database as “Letter”, i.e., those with general category property being one of “Lm”, “Lt”, “Lu”, “Ll”, or “Lo”.
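The documented rule can be restated as a rough sketch on top of `unicodedata.category()`. The helper name `is_alpha_by_category` and the constant `LETTER_CATEGORIES` are hypothetical, and this is not how CPython implements `str.isalpha()` internally; it only mirrors the wording of the documentation:

```python
import unicodedata

# The five "Letter" general categories named in the documentation.
LETTER_CATEGORIES = {"Lm", "Lt", "Lu", "Ll", "Lo"}

def is_alpha_by_category(text):
    """Hypothetical helper: True if text is non-empty and every
    character belongs to one of the Letter categories."""
    return bool(text) and all(
        unicodedata.category(ch) in LETTER_CATEGORIES for ch in text
    )

for sample in ["\u02c7", "abc", "abc1", ""]:
    # Show both results side by side for comparison.
    print(repr(sample), is_alpha_by_category(sample), sample.isalpha())
```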

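You didn't define what you mean by working properly; evidently you have a different definition of what constitutes a letter. If you expected only Latin-1 letters, then you also need to test whether the string can be safely encoded to Latin-1. The Latin-1 subset of Unicode happens to contain exactly zero Lm-category codepoints (there are no Lt characters either, and only 2 Lo characters, ª (U+00AA) and º (U+00BA)).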
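One way to combine the two tests is sketched below, assuming Python 3. The function name `is_latin1_alpha` is made up for illustration; adjust the encoding if your definition of a letter is narrower or wider than Latin-1:

```python
def is_latin1_alpha(text):
    """Hypothetical helper: True if text consists only of letters
    and every character can be encoded as Latin-1."""
    if not text.isalpha():
        return False
    try:
        text.encode("latin-1")
    except UnicodeEncodeError:
        # At least one character lies outside the Latin-1 range.
        return False
    return True

print(is_latin1_alpha("caf\u00e9"))  # True: all letters, all within Latin-1
print(is_latin1_alpha("\u02c7"))     # False: U+02C7 is outside Latin-1
```

Swapping `"latin-1"` for `"ascii"` would restrict the check to unaccented ASCII letters.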
