Sort an indented block of text while keeping order

I have a block of text that I need to reorder (using Python). It looks like this:

foo
    bar
        inner 1
        inner 3
        inner 2
    another
        stuff c
        stuff b
        stuff a
    more
        items z
        items x
        items y

The output of the sorting function must look like this:

foo
    another
        stuff a
        stuff b
        stuff c
    bar
        inner 1
        inner 2
        inner 3
    more
        items x
        items y
        items z

The important details are:

Here is an attempt that comes close but does not quite work.

import re
import textwrap

_EXPECTED_INDENTATION = "    "
_PARSER = re.compile(r"(?P<indentation>\s*)(?P<words>.+)")


def _iter_lists(item):
    if not isinstance(item, list):
        return

    yield item

    for group in item:
        for inner in _iter_lists(group):
            yield inner


def _group_by_depth(names):
    previous_depth = -1
    all_groups = []
    inner_group = []

    for depth, name in names:
        if previous_depth != -1 and depth != previous_depth:
            all_groups.append(inner_group)
            inner_group = []

        inner_group.append((depth, name))
        previous_depth = depth

    if inner_group:
        # Add the last group, just in case it was missed
        all_groups.append(inner_group)

    return all_groups


def _parse_by_depth(text):
    output = []

    for line in text.split("\n"):
        if not line.strip():
            continue

        match = _PARSER.match(line)
        count = int(match.group("indentation").count(_EXPECTED_INDENTATION))
        word = match.group("words")
        output.append((count, word))

    return output


def _sort_all(all_groups):
    for group in all_groups:
        for inner in _iter_lists(group):
            inner.sort()


def flatten_sequence(sequence):
    if not sequence:
        return sequence

    if isinstance(sequence[0], list):
        return flatten_sequence(sequence[0]) + flatten_sequence(sequence[1:])

    return sequence[:1] + flatten_sequence(sequence[1:])


def main():
    """Run the main execution of the current script."""
    text = textwrap.dedent(
        """\
        foo
            bar
                inner 1
                inner 3
                inner 2
            another
                stuff c
                stuff b
                stuff a
            more
                items z
                items x
                items y
        """
    )

    names = _parse_by_depth(text)

    # `_parse_by_depth` should generate
    # names = [
    #     (0, 'foo'),
    #         (1, 'bar'),
    #             (2, 'inner 1'),
    #             (2, 'inner 3'),
    #             (2, 'inner 2'),
    #         (1, 'another'),
    #             (2, 'stuff c'),
    #             (2, 'stuff b'),
    #             (2, 'stuff a'),
    #         (1, 'more'),
    #             (2, 'items z'),
    #             (2, 'items x'),
    #             (2, 'items y'),
    # ]

    all_groups = _group_by_depth(names)
    _sort_all(all_groups)

    flattened = flatten_sequence(all_groups)

    for depth, name in flattened:
        print("{indentation}{name}".format(indentation=_EXPECTED_INDENTATION * depth, name=name))


if __name__ == "__main__":
    main()

It doesn't work, though. It prints:

foo
    bar
        inner 1
        inner 2
        inner 3
    another
        stuff a
        stuff b
        stuff c
    more
        items x
        items y
        items z

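The reason is that _sort_all only sorts contiguous blocks correctly. For example, "inner 1/2/3" and "stuff a/b/c" get sorted correctly, but the order of the parent lines bar, another, and more is still wrong. How can I modify _group_by_depth and/or _sort_all to get the expected order?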

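I would suggest this approach:

We can interpret the input as a table with a few columns, where each level of indentation corresponds to a jump to the next column. The columns that a line skips are assumed to hold the same values as its "parent" line. Here is that table, with the implied repeated values shown in parentheses: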

column 1    column 2     column 3
foo
(foo)       bar
(foo)       (bar)        inner 1
(foo)       (bar)        inner 3
(foo)       (bar)        inner 2
(foo)       another
(foo)       (another)    stuff c
(foo)       (another)    stuff b
(foo)       (another)    stuff a
(foo)       more
(foo)       (more)       items z
(foo)       (more)       items x
(foo)       (more)       items y

The idea is to build this 2D list (including the repeated values), sort it, and then convert it back to the original format.
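As a rough illustration of that intermediate step, the sketch below builds only the 2D list and prints it after sorting. The helper name build_rows and the default of four spaces per level are assumptions made for this example. It relies on Python comparing lists element by element, so a parent row, being a prefix of its children's rows, sorts ahead of them.

```python
def build_rows(text, spacing=4):
    """Return one full path per line, e.g. ['foo', 'bar', 'inner 1'].

    Hypothetical helper for illustration; assumes space indentation
    in multiples of `spacing` and no blank lines.
    """
    rows = []
    row = []
    for line in text.splitlines():
        stripped = line.lstrip()
        depth = (len(line) - len(stripped)) // spacing
        # Reuse the ancestors shared with the previous line.
        row = row[:depth] + [stripped]
        rows.append(row)
    return rows


sample = "foo\n    bar\n        inner 3\n        inner 1\n    another"

for row in sorted(build_rows(sample)):
    print(row)
# ['foo']
# ['foo', 'another']
# ['foo', 'bar']
# ['foo', 'bar', 'inner 1']
# ['foo', 'bar', 'inner 3']
```

Each row carries its whole ancestry, which is why a plain sorted() call is enough to order siblings without separating children from their parent.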

Here is the code:

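def sort_indented_text(text, spacing):
    data = []
    row = []  # path of the previous line, from the top-level ancestor down
    for line in text.splitlines():
        stripped = line.lstrip()
        # Depth = width of the leading whitespace divided by the indent width
        depth = (len(line) - len(stripped)) // spacing
        # Keep the ancestors shared with the previous line, then add this line
        row = row[:depth] + [stripped]
        data.append(row)

    # Lists compare element by element, so sorting the paths orders siblings
    # while every parent stays directly ahead of its own children
    return "\n".join(
        " " * (spacing * (len(row) - 1)) + row[-1] for row in sorted(data)
    )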

You can use it as follows:

text = """foo
    bar
        inner 1
        inner 3
        inner 2
    another
        stuff c
        stuff b
        stuff a
    more
        items z
        items x
        items y"""

print(sort_indented_text(text, 4))
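
One case the path-based version cannot distinguish is two siblings with the same text: their rows are identical, so after sorting, the children of both end up grouped under the last of them. If that matters, a tree-based variant is one possible alternative. The sketch below is not part of the answer above; the name sort_indented_tree, the default spacing of 4, and the choice to skip blank lines are all assumptions made for illustration.

```python
def sort_indented_tree(text, spacing=4):
    """Sort siblings at every level, keeping children under their own parent.

    Illustrative sketch; assumes indentation uses spaces in multiples
    of `spacing`.
    """
    root = ("", [])   # dummy node: (label, children)
    stack = [root]    # stack[d + 1] is the latest node seen at depth d
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines carry no meaning
        stripped = line.lstrip(" ")
        depth = (len(line) - len(stripped)) // spacing
        node = (stripped, [])
        del stack[depth + 1:]       # forget deeper nodes from earlier lines
        stack[-1][1].append(node)   # attach to the nearest shallower node
        stack.append(node)

    out = []

    def emit(children, depth):
        # sorted() is stable, so equal labels keep their original order.
        for label, grandchildren in sorted(children, key=lambda n: n[0]):
            out.append(" " * (spacing * depth) + label)
            emit(grandchildren, depth + 1)

    emit(root[1], 0)
    return "\n".join(out)


duplicates = "a\n    x\n        2\n    x\n        1"
print(sort_indented_tree(duplicates))
```

Here each "x" keeps its own child, because sorting happens per list of siblings rather than on whole paths. For the example text in the question, both versions should give the same result.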