How do I override the str function without raising a UnicodeEncodeError?

Question

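I'm confused: defining __str__ on a class seems to have no effect when the str function is called on instances of that class. For example, I read in the Django documentation: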

The print statement and the str built-in call __str__() to determine the human-readable representation of an object.

But that does not seem to be true. Here is an example module in which text is always assumed to be unicode:

import six

class Test(object):

    def __init__(self, text):
        self._text = text

    def __str__(self):
        if six.PY3:
            return str(self._text)
        else:
            return unicode(self._text)

    def __unicode__(self):
        if six.PY3:
            return str(self._text)
        else:
            return unicode(self._text)

In Python 2, it gives the following behavior:

>>> a=Test(u'café')
>>> print a.__str__()
café
>>> print a # same error with str(a)
---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-63-202e444820fd> in <module>()
----> 1 str(a)

UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' in position 3: ordinal not in range(128)

Is there a way to overload the str function?

Answer 1

For Python 2, you are returning the wrong type from your __str__ method. You are returning unicode, but you must return str:

def __str__(self):
    if six.PY3:
        return str(self._text)
    else:
        return self._text.encode('utf8')

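Since self._text is not of type str yet, you need to encode it. Because you return Unicode instead, Python is forced to encode it first, but the default ASCII codec cannot handle the non-ASCII character é.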
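To see the encoding step in isolation, here is a minimal sketch that runs on Python 3, where unicode text is str and encoded data is bytes. The helper name py2_style_str is hypothetical; it only imitates the implicit encoding with the default codec (normally 'ascii') that Python 2's str() applies when __str__ returns unicode.

```python
def py2_style_str(value):
    """Hypothetical helper: imitate Python 2's str() on a unicode __str__ result.

    Python 2 encodes the returned unicode with the default codec, normally 'ascii'.
    """
    return value.encode('ascii')


text = u'caf\xe9'  # the same text as u'café'

# Explicit UTF-8 encoding succeeds: é becomes the two bytes C3 A9.
utf8_bytes = text.encode('utf8')
print(repr(utf8_bytes))  # b'caf\xc3\xa9'

# The implicit ASCII encoding fails on the non-ASCII character.
try:
    py2_style_str(text)
except UnicodeEncodeError as exc:
    print(exc)
```

Encoding explicitly inside __str__, as in the fix above, means Python never has to fall back to the ASCII codec.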

Printing the object produces the correct output only because my terminal is configured to handle UTF-8:

>>> a = Test(u'café')
>>> str(a)
'caf\xc3\xa9'
>>> print a
café
>>> unicode(a)
u'caf\xe9'
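The dependence on the terminal can be illustrated by decoding the same UTF-8 bytes two ways. This Python 3 sketch assumes a Latin-1 terminal purely as a counter-example; it is one possible misconfiguration, not the only one.

```python
# The bytes that str(a) returns on Python 2 with the fixed __str__.
data = b'caf\xc3\xa9'

# A UTF-8 terminal decodes the two bytes C3 A9 back into a single é.
utf8_view = data.decode('utf-8')      # 'café' (4 characters)

# A terminal assumed to use Latin-1 shows each byte as its own character.
latin1_view = data.decode('latin-1')  # 'cafÃ©' (5 characters)
```

The bytes are identical in both cases; only the terminal's interpretation differs, which is why print a looks right here but is not guaranteed to elsewhere.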

Note that there is no __unicode__ method in Python 3; the if six.PY3 check inside that method is therefore completely redundant. The following works too:

import six

class Test(object):
    def __init__(self, text):
        self._text = text

    def __str__(self):
        if six.PY3:
            return self._text
        else:
            return self._text.encode('utf8')

    def __unicode__(self):
        return self._text
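If six is only needed for the version check, the same class can be sketched with the standard library alone. Here the PY3 flag derived from sys.version_info is an assumed stand-in for six.PY3; otherwise the logic matches the class above.

```python
import sys

# Assumed replacement for six.PY3, using only the standard library.
PY3 = sys.version_info[0] >= 3


class Test(object):
    def __init__(self, text):
        # text is assumed to always be unicode (str on Python 3).
        self._text = text

    def __str__(self):
        if PY3:
            # Python 3: __str__ must return text (str).
            return self._text
        # Python 2: __str__ must return bytes (str), so encode explicitly.
        return self._text.encode('utf8')

    def __unicode__(self):
        # Only Python 2's unicode() calls this; Python 3 never uses it.
        return self._text


a = Test(u'caf\xe9')
```

On Python 3, str(a) simply returns the stored text; on Python 2 it would return the UTF-8 bytes 'caf\xc3\xa9' as in the session above.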

However, if you are using the six library, you are better off using the @six.python_2_unicode_compatible decorator and defining only a Python 3 version of the __str__ method:

import six

@six.python_2_unicode_compatible
class Test(object):
    def __init__(self, text):
        self._text = text

    def __str__(self):
        return self._text

This assumes text is always Unicode. If you are using Django, you can get the same decorator from the django.utils.encoding module.
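To show roughly what such a decorator does, here is a hypothetical re-implementation. It is a sketch of the idea, not six's or Django's actual source, and the names unicode_compatible and force_py2 are invented for illustration. On Python 2 it moves the text-returning __str__ to __unicode__ and installs a __str__ that encodes to UTF-8; on Python 3 it returns the class unchanged.

```python
import sys

PY2 = sys.version_info[0] == 2


def unicode_compatible(klass, force_py2=False):
    """Hypothetical stand-in for six.python_2_unicode_compatible.

    force_py2 is an invented switch so the Python 2 branch can be
    exercised on Python 3 for demonstration.
    """
    if PY2 or force_py2:
        if '__str__' not in klass.__dict__:
            raise ValueError('%s does not define __str__()' % klass.__name__)
        # The method written for Python 3 returns text: expose it as __unicode__.
        klass.__unicode__ = klass.__str__
        # Python 2's str() expects bytes, so encode the text as UTF-8.
        klass.__str__ = lambda self: self.__unicode__().encode('utf-8')
    return klass


@unicode_compatible
class Test(object):
    def __init__(self, text):
        self._text = text

    def __str__(self):
        return self._text
```

On Python 3, str(Test(u'café')) returns the text unchanged. Forcing the Python 2 branch on Python 3 rewires the methods so that __str__() returns b'caf\xc3\xa9'; calling str() on such an object under Python 3 would then raise TypeError, since Python 3 requires __str__ to return text.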
