How to document a single space character within a string in reST/Sphinx?

Question

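I'm lost in an edge case of some sort. I'm converting some old plaintext documentation to reST/Sphinx format, with the intent of outputting to a few formats (including HTML and text) from there. Some of the documented functions deal with bitstrings, and a common case within these is a sentence like the following: Starting character is the blank " " which has the value 0.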

I tried writing this as an inline literal in the following ways: Starting character is the blank `` `` which has the value 0. or Starting character is the blank :literal:` ` which has the value 0. But there are a few problems with how these end up working:

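reST syntax objects to whitespace immediately inside the literal, so the literal is not recognized.
The above can be "fixed" with a non-breaking space character inside the literal: it looks correct in the HTML (<code> </code>) and plaintext (" ") output. Technically, though, this is a lie in our case, and a user who copied this character would not be copying what they expect.
The space can be wrapped in regular quotes, which allows the literal to be recognized properly. The HTML output is probably fine (" "), but in plaintext it ends up double-quoted as "" "".
In both the second and third cases above, if the literal falls on the wrap boundary, the plaintext writer (which uses textwrap) will gladly wrap inside the literal and trim the space because it is at the start/end of the line.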
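The wrapping problem in the last point can be reproduced outside Sphinx. The following is a minimal sketch that uses the standard library's textwrap directly; it assumes the Sphinx text writer behaves similarly at line boundaries, and the width of 34 is picked by hand so that the break lands inside the literal:

```python
import textwrap

sentence = "Starting character is the blank `` `` which has the value 0."

# Width chosen by hand so the first line fills up right after the
# opening pair of backticks.
lines = textwrap.wrap(sentence, width=34)

for line in lines:
    print(repr(line))

# Whitespace at a line break is dropped, so the literal's only
# character no longer appears anywhere in the wrapped text.
wrapped = "\n".join(lines)
print("`` ``" in wrapped)
```

Once the text has been wrapped this way, a reader of the plaintext output has no way to tell that the literal contained a space.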

I feel like I'm missing something; is there a good way to handle this?

Answer 1

Try using the unicode character codes. If I understand your question, this should work.

Here is a "|space|" and a non-breaking space (|nbspc|)

.. |space| unicode:: U+0020 .. space
.. |nbspc| unicode:: U+00A0 .. non-breaking space

You should see:

Here is a " " and a non-breaking space ( )
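Since the two characters render identically, a quick way to check which one actually ended up in the output (say, after copying it out of the generated HTML) is to ask Python for its name. This is only a sanity check with the standard unicodedata module, not part of the reST markup itself:

```python
import unicodedata

plain, nbsp = " ", "\u00a0"

# Print the code point and official Unicode name of each character.
for ch in (plain, nbsp):
    print("U+%04X" % ord(ch), unicodedata.name(ch))

# They look the same on screen but are different characters.
print(plain == nbsp)
```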

Answer 2

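I was hoping to get out of this without needing custom code to handle it, but, alas, I haven't found a way to do so. I'll wait a few more days before I accept this answer in case someone has a better idea. The code below isn't complete, nor am I sure it's "done" (we will sort out exactly what it should look like during our review process), but the basics are intact.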

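There are two main components to the approach:

Introduce a char role which expects the unicode name of a character as its argument, and which produces an inline description of the character while wrapping the character itself in an inline literal node.
Modify the text wrapper Sphinx uses so that it won't break at the space.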

Here's the code:

import re
import unicodedata

from docutils import nodes
# Assumption: the base class is the wrapper from Sphinx's text writer.
from sphinx.writers.text import TextWrapper


class TextWrapperDeux(TextWrapper):
    # Word-separator pattern in which whitespace next to a backtick
    # does not count as a possible break point.
    _wordsep_re = re.compile(
        r'((?<!`)\s+(?!`)|'                       # whitespace not next to a backtick
        r'(?<=\s)(?::[a-z-]+:)`\S+|'              # interpreted text start
        r'[^\s\w]*\w+[a-zA-Z]-(?=\w+[a-zA-Z])|'   # hyphenated words
        r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))')   # em-dash

    @property
    def wordsep_re(self):
        return self._wordsep_re


def char_role(name, rawtext, text, lineno, inliner, options=None, content=None):
    """Describe a character given by unicode name.

    e.g., :char:`SPACE` -> "char:` `(U+00020 SPACE)"
    """
    try:
        # Look the character up by its Unicode name, e.g. "SPACE".
        character = unicodedata.lookup(text)
    except KeyError:
        msg = inliner.reporter.error(
            ':char: argument %s must be valid unicode name' % text,
            line=lineno)
        prb = inliner.problematic(rawtext, rawtext, msg)
        return [prb], [msg]
    describe_char = "(U+%05X %s)" % (ord(character), text)
    # The character itself goes into a literal node; the label and the
    # description are plain inline text around it.
    char = nodes.inline("char:", "char:", nodes.literal(character, character))
    char += nodes.inline(describe_char, describe_char)
    return [char], []


def setup(app):
    app.add_role('char', char_role)

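The code above is missing some glue (for example, whatever is needed to actually force Sphinx to use the new TextWrapper). When a full version settles out I may try to find a meaningful way to republish it; if so, I'll link it here.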
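The lookup-and-describe step of char_role can be tried without docutils or Sphinx installed. The sketch below uses only the standard library; the helper name describe is hypothetical and exists only for this illustration:

```python
import unicodedata


def describe(name):
    """Return (character, description) for a Unicode character name.

    Hypothetical helper mirroring the formatting used in char_role.
    """
    # unicodedata.lookup raises KeyError for an unknown name.
    character = unicodedata.lookup(name)
    return character, "(U+%05X %s)" % (ord(character), name)


character, description = describe("SPACE")
print(repr(character), description)

try:
    describe("NOT A REAL CHARACTER NAME")
except KeyError:
    print("unknown name rejected")
```

In the role itself, the KeyError branch is where the reST error message is reported back to the author of the document.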

Markup: Starting character is the :char:`SPACE` which has the value 0.

It'll produce plaintext output like this: Starting character is the char:` `(U+00020 SPACE) which has the value 0.

And HTML output like: Starting character is the <span>char:<code class="docutils literal"> </code><span>(U+00020 SPACE)</span></span> which has the value 0.

The HTML output ends up looking roughly like: Starting character is the char: (U+00020 SPACE) which has the value 0.
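To see what the modified word-separator pattern does to the plaintext line above, here is a rough sketch that plugs the same pattern into the standard library's textwrap.TextWrapper. That base class is an assumption standing in for Sphinx's own wrapper, the class name BacktickAwareWrapper is hypothetical, and the width of 32 is picked by hand so that the stock wrapper breaks inside the literal:

```python
import re
import textwrap


class BacktickAwareWrapper(textwrap.TextWrapper):
    # Pattern copied from TextWrapperDeux above; set as a plain class
    # attribute because the stdlib wrapper reads self.wordsep_re.
    wordsep_re = re.compile(
        r'((?<!`)\s+(?!`)|'                       # whitespace not next to a backtick
        r'(?<=\s)(?::[a-z-]+:)`\S+|'              # interpreted text start
        r'[^\s\w]*\w+[a-zA-Z]-(?=\w+[a-zA-Z])|'   # hyphenated words
        r'(?<=[\w\!\"\'\&\.\,\?])-{2,}(?=\w))')   # em-dash


text = ("Starting character is the char:` `(U+00020 SPACE) "
        "which has the value 0.")

plain = textwrap.TextWrapper(width=32).wrap(text)
patched = BacktickAwareWrapper(width=32).wrap(text)

print("stock wrapper:")
for line in plain:
    print("   ", repr(line))

print("backtick-aware wrapper:")
for line in patched:
    print("   ", repr(line))
```

With the stock pattern the break falls between the two backticks and the space is lost; with the modified pattern the literal moves to the next line as a single unit.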
