Transparently doing .translate with str and unicode objects

Question

This is the implementation I use to abstract away the differences between .translate for unicode and str:

import types
from string import maketrans

def str_translate(txt, inchars, outchars, deletechars):
    if inchars : transtab = maketrans(inchars, outchars)
    else       : transtab = None
    return txt.translate(transtab, deletechars)


def maketrans_u(inchars, outchars, deletechars):
    '''Create a translation table for unicode. We assume that we
    want to map one inchar to one outchar (but the actual unicode.translate function
    is more powerful: it can also map one inchar to a unicode string)
    We assume deletechars and inchars do not overlap (no checking done!)'''
    if inchars : transtab = dict((ord(inchar), ord(outchar)) for inchar, outchar in zip(inchars, outchars))
    else       : transtab = { }
    # Now map the deletechars to None
    for char in deletechars:
        transtab[ord(char)] = None
    return transtab


def unicode_translate(txt, inchars, outchars, deletechars):
    transtab = maketrans_u(inchars, outchars, deletechars)
    return txt.translate(transtab)


def translate(txt, inchars=None, outchars=None, deletechars=None):
    t = type(txt)
    if   t == types.StringType  : return str_translate(txt, inchars, outchars, deletechars)
    elif t == types.UnicodeType : return unicode_translate(txt, inchars, outchars, deletechars)
    else                        : raise Exception('Not supported type %s' % (t))


if __name__ == '__main__' :
    a = 'abc%=def'
    deletechars = '=%'
    print translate(a, deletechars=deletechars)

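Here I am losing some of the power of unicode.translate (namely, translating one character to a string), but at least I have a uniform interface that I can use to translate unicode and plain strings without caring about the type.

What I do not like is:

this implementation relies on checking the type of the string in order to call the right function
I can not do txt.translate(...) (I must do translate(txt, ...)), which means I can not chain function calls like txt[:50].translate(...)

Is there a better way to implement a transparent .translate?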

Answer 1

this implementation relies in checking the type of the string in order to call the right function

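Well, what else could it do? You want to do different things for different types, and you can't monkeypatch the types to do it in dot-syntax OO style, so how could you dispatch on the type automatically? What you're looking for is external dispatch. Python can do this in 3.4+ with singledispatch (dispatching only on the first argument, not on all arguments like CLOS or Dylan, although there are multiple-dispatch libraries on PyPI), and there's a backport on PyPI that works back to 2.6. So, you could do this: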

from singledispatch import singledispatch

@singledispatch
def translate(txt, inchars=None, outchars=None, deletechars=None):
    # Fallback for any type without a registered implementation
    raise Exception('Not supported type %s' % (type(txt)))

# The registered functions need their own names: reusing the name
# translate would rebind it and lose the dispatcher's .register.
@translate.register(str)
def _translate_str(txt, inchars=None, outchars=None, deletechars=None):
    return str_translate(txt, inchars, outchars, deletechars)

@translate.register(unicode)
def _translate_unicode(txt, inchars=None, outchars=None, deletechars=None):
    return unicode_translate(txt, inchars, outchars, deletechars)

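Also notice that I just used str and unicode instead of types.StringType and types.UnicodeType. As the docs say, those types are just aliases and not really necessary. All they do is make your code less backward-compatible. (And they don't help with forward compatibility to 3.x; 3.0 just removed the unnecessary aliases, rather than making StringType and UnicodeType both aliases for str and adding BytesType…)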
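For readers on Python 3.4+, the same external-dispatch idea can be sketched with the standard library's functools.singledispatch. This is only an illustration under assumptions that are not in the original code: bytes and str stand in for Python 2's str and unicode, and the helper names _translate_bytes and _translate_text are made up for the example.

```python
from functools import singledispatch  # standard library since Python 3.4


@singledispatch
def translate(txt, inchars=None, outchars=None, deletechars=None):
    # Fallback for any type without a registered implementation.
    raise TypeError('Not supported type %s' % type(txt).__name__)


@translate.register(bytes)
def _translate_bytes(txt, inchars=None, outchars=None, deletechars=None):
    # bytes.translate takes a 256-byte table (or None) plus the bytes to delete.
    table = bytes.maketrans(inchars, outchars) if inchars else None
    return txt.translate(table, deletechars or b'')


@translate.register(str)
def _translate_text(txt, inchars=None, outchars=None, deletechars=None):
    # str.translate takes a single dict keyed by code point;
    # mapping a code point to None deletes it.
    table = dict((ord(i), ord(o)) for i, o in zip(inchars or '', outchars or ''))
    for char in deletechars or '':
        table[ord(char)] = None
    return txt.translate(table)
```

The caller still writes translate(txt, ...) rather than txt.translate(...), but the type switch lives in the dispatcher instead of in a hand-written if/elif chain.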

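If you don't want to use a library off PyPI or implement the same thing yourself, and instead want a manual type switch, you probably want isinstance rather than type(x) ==.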
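A small sketch of why isinstance is usually the better test: an exact type comparison rejects subclasses, while isinstance accepts them. The Label class and the two helper functions are hypothetical, made up only for this illustration; the snippet should behave the same under Python 2 and Python 3.

```python
class Label(str):
    """Hypothetical str subclass, used only for illustration."""


def is_str_exact(value):
    # Matches str itself, but not subclasses of str.
    return type(value) == str


def is_str_instance(value):
    # Matches str and any subclass of str.
    return isinstance(value, str)


label = Label('abc%=def')
exact = is_str_exact(label)        # False: Label is not exactly str
instance = is_str_instance(label)  # True: Label is a kind of str
```

With the type(x) == check, a translate wrapper like the one in the question would raise its "Not supported type" error for label, even though label.translate works fine.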

I can not do txt.translate(...) (I must do translate(txt, ...)

Right; you can't monkeypatch str and unicode. But so what?

which means I can not chain function calls like txt[:50].translate(...)

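Sure, but you can chain function calls like translate(txt[:50], …).rstrip().split(':'). While that might look anti-idiomatic in an "everything-is-a-method" language like Java or Ruby, it's perfectly fine in Python. Especially since it's pretty rare to chain more than 2 or 3 calls in Python anyway. After all, the next thing after the split would have to be a map call or a comprehension, and those aren't done by methods in Python.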

Here I am losing some of the power of the unicode.translate (namely, translating one character to a string)

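Yes, that's pretty much inherent in a lowest-common-denominator design. So is some performance loss. str.translate and unicode.translate don't really do quite the same thing. The former is a table-based translation, because that's a nice optimization when you only have 256 possible values, but it does mean you give up some flexibility and power. The latter is a dict-based translation, because a table would be a pessimization with 1.1 million possible values, but that means you gain some extra flexibility and power.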
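The difference between the two kinds of translation can be seen in a short sketch. It is written with b'' and u'' literals so that it should run under both Python 2.7 and Python 3 (where the table-based form lives on bytes and the dict-based form on str); the sample strings are arbitrary.

```python
# Table-based: a 256-entry table where position i holds the replacement
# for byte value i, plus a separate argument listing bytes to delete.
table = bytearray(range(256))   # identity table
table[ord('a')] = ord('A')      # map 'a' to 'A'
byte_result = b'abc%=def'.translate(bytes(table), b'=%')

# Dict-based: keys are code points; a value may be a code point,
# a whole string, or None (which deletes the character).
mapping = {ord(u'a'): u'AA', ord(u'%'): None, ord(u'='): None}
text_result = u'abc%=def'.translate(mapping)
```

The table can only map one byte to one byte, which is why one-to-many replacements such as 'a' to 'AA' are available only in the dict-based form.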

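So, here, you're giving up the performance of str.translate (especially since you have to build a transtab on the fly for every translation) and the flexibility of unicode.translate, for the worst of both worlds.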

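If you actually know the encoding of your str strings (and they really do represent text; after all, str.translate is also usable for binary data), you could rewrite this as just s.decode(encoding).translate(…).encode(encoding). But if you know the encoding, you could also just use unicode instead of str in the first place.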
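The decode/translate/encode round trip can be sketched as below. The helper name translate_encoded, its default encoding, and the sample data are assumptions made for the illustration; the snippet should run under both Python 2.7 and Python 3.

```python
def translate_encoded(data, table, encoding='utf-8'):
    """Hypothetical helper: decode a byte string, translate it as
    text with a dict-based table, then encode it again."""
    return data.decode(encoding).translate(table).encode(encoding)


# Map e-acute to a plain 'e' and delete '%'.
table = {ord(u'\xe9'): u'e', ord(u'%'): None}
encoded = u'caf\xe9%'.encode('utf-8')   # a byte string in a known encoding
result = translate_encoded(encoded, table)
```

This only makes sense when the byte string really is text in the given encoding; for arbitrary binary data the decode step may fail or produce nonsense.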

But I think a better solution might be to wrap up maketrans in a way that returns a tuple of two tables for str and a tuple of one dict for unicode. Then you can call the native s.translate(*transtab) instead of translate.

Unfortunately, you can't use singledispatch for this, because any of the arguments might be None, which means we're back to explicit type-switching.

import string

def maketrans(inchars, outchars, deletechars):
    if isinstance(inchars, str) or isinstance(deletechars, str):
        return maketrans_s(inchars, outchars, deletechars)
    elif isinstance(inchars, unicode) or isinstance(deletechars, unicode):
        return maketrans_u(inchars, outchars, deletechars)
    raise Exception('Not supported types %s, %s' % (type(inchars), type(deletechars)))

def maketrans_s(inchars, outchars, deletechars):
    # Call string.maketrans explicitly; the bare name maketrans
    # now refers to the wrapper defined above.
    if inchars: transtab = string.maketrans(inchars, outchars)
    else: transtab = None
    return transtab, deletechars or ''

def maketrans_u(inchars, outchars, deletechars):
    # The if was unnecessary here; if inchars is empty, the zip
    # will be too, so you'll get {} as the result. Also notice
    # no ord(outchar); this means you _can_ use Unicode strings
    # when you know the string is Unicode. The "or u''" guards
    # cover arguments that were passed as None.
    transtab = dict((ord(inchar), outchar) for inchar, outchar in zip(inchars or u'', outchars or u''))
    for char in deletechars or u'':
        transtab[ord(char)] = None
    return transtab,

Now you can do this:

transtab = maketrans(inchars, outchars, deletechars)
return s.translate(*transtab).rstrip().split(':')

But really, I'm not sure what good this is in the first place. How could you be calling maketrans or translate without knowing whether your inchars and deletechars are str or unicode?

python

unicode