Sort an indented block of text while keeping order

Question

I have a block of text that I need to rearrange (using Python). It looks like this:

foo
    bar
        inner 1
        inner 3
        inner 2
    another
        stuff c
        stuff b
        stuff a
    more
        items z
        items x
        items y

The output of the sorting function must look like this:

foo
    another
        stuff a
        stuff b
        stuff c
    bar
        inner 1
        inner 2
        inner 3
    more
        items x
        items y
        items z

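The important details are:

Each new "depth" is indicated by 4 spaces, as in the example above. This is consistent throughout the text.
At each depth, items should be sorted alphabetically. However, the tree structure must stay intact even after sorting: "stuff a/b/c" must always have "another" as its parent, and "items x/y/z" must always have "more" as its parent.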

Here is an attempt that comes close but does not quite work.

import re
import textwrap

_EXPECTED_INDENTATION = "    "
_PARSER = re.compile(r"(?P<indentation>\s*)(?P<words>.+)")


def _iter_lists(item):
    if not isinstance(item, list):
        return

    yield item

    for group in item:
        for inner in _iter_lists(group):
            yield inner


def _group_by_depth(names):
    previous_depth = -1
    all_groups = []
    inner_group = []

    for depth, name in names:
        if previous_depth != -1 and depth != previous_depth:
            all_groups.append(inner_group)
            inner_group = []

        inner_group.append((depth, name))
        previous_depth = depth

    if inner_group:
        # Add the last group, just in case it was missed
        all_groups.append(inner_group)

    return all_groups


def _parse_by_depth(text):
    output = []

    for line in text.split("\n"):
        if not line.strip():
            continue

        match = _PARSER.match(line)
        count = int(match.group("indentation").count(_EXPECTED_INDENTATION))
        word = match.group("words")
        output.append((count, word))

    return output


def _sort_all(all_groups):
    for group in all_groups:
        for inner in _iter_lists(group):
            inner.sort()


def flatten_sequence(sequence):
    if not sequence:
        return sequence

    if isinstance(sequence[0], list):
        return flatten_sequence(sequence[0]) + flatten_sequence(sequence[1:])

    return sequence[:1] + flatten_sequence(sequence[1:])


def main():
    """Run the main execution of the current script."""
    text = textwrap.dedent(
        """\
        foo
            bar
                inner 1
                inner 3
                inner 2
            another
                stuff c
                stuff b
                stuff a
            more
                items z
                items x
                items y
        """
    )

    names = _parse_by_depth(text)

    # `_parse_by_depth` should generate
    # names = [
    #     (0, 'foo'),
    #         (1, 'bar'),
    #             (2, 'inner 1'),
    #             (2, 'inner 3'),
    #             (2, 'inner 2'),
    #         (1, 'another'),
    #             (2, 'stuff c'),
    #             (2, 'stuff b'),
    #             (2, 'stuff a'),
    #         (1, 'more'),
    #             (2, 'items z'),
    #             (2, 'items x'),
    #             (2, 'items y'),
    # ]

    all_groups = _group_by_depth(names)
    _sort_all(all_groups)

    flattened = flatten_sequence(all_groups)

    for depth, name in flattened:
        print("{indentation}{name}".format(indentation=_EXPECTED_INDENTATION * depth, name=name))


if __name__ == "__main__":
    main()

But it doesn't work. It prints:

foo
    bar
        inner 1
        inner 2
        inner 3
    another
        stuff a
        stuff b
        stuff c
    more
        items x
        items y
        items z

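This happens because _sort_all only sorts contiguous blocks correctly. For example, "inner 1/2/3" and "stuff a/b/c" are sorted correctly, but the order of the parents bar, another, and more is still wrong. How can I modify _group_by_depth and/or _sort_all to get the expected order?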

Answer 1

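I would suggest this approach:

We can interpret the input as a table with several columns, where each level of indentation corresponds to moving to the next column. The skipped columns are assumed to hold the same values as in the "parent" row. The input can then be seen as this table with those "repeated values" removed (shown in parentheses here):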

column 1	column 2	column 3
foo
(foo)	bar
(foo)	(bar)	inner 1
(foo)	(bar)	inner 3
(foo)	(bar)	inner 2
(foo)	another
(foo)	(another)	stuff c
(foo)	(another)	stuff b
(foo)	(another)	stuff a
(foo)	more
(foo)	(more)	items z
(foo)	(more)	items x
(foo)	(more)	items y

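One idea is to build this two-dimensional list (including the repeated values), sort it, and then convert it back to the original format. Because Python compares lists element by element, and a list sorts before any longer list that it is a prefix of, each parent row ends up directly before its own children.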
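To make the intermediate table concrete, here is a minimal sketch that only builds and sorts the expanded rows, without converting them back to text. The helper name `build_rows`, its default `spacing=4`, and the choice to skip blank lines are assumptions made for this illustration.

```python
def build_rows(text, spacing=4):
    """Hypothetical helper: expand each line into its full list of ancestors."""
    rows = []
    row = []
    for line in text.splitlines():
        stripped = line.lstrip()
        if not stripped:
            continue  # assumption: blank lines carry no structure
        depth = (len(line) - len(stripped)) // spacing
        # Keep the first `depth` ancestors, then append this line's own name.
        row = row[:depth] + [stripped]
        rows.append(row)
    return rows


sample = "foo\n    bar\n        inner 3\n        inner 1\n    another\n        stuff a"

# Lists compare element by element, so every parent row sorts
# directly before the rows of its own children.
for r in sorted(build_rows(sample)):
    print(r)
```

The last element of each row is the name to print, and the length of the row minus one is its depth, which is all that is needed to rebuild the indented text.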

Here is the code that does all three steps:

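def sort_indented_text(text, spacing):
    """Sort indented lines at every depth; `spacing` is the number of spaces per level."""
    data = []
    row = []
    for line in text.splitlines():
        stripped = line.lstrip()
        # Keep as many ancestors as the indentation depth, then add this line.
        row = row[0:(len(line) - len(stripped)) // spacing] + [stripped]
        data.append(row)

    # Sorting the rows orders siblings while keeping children after their parent.
    # Each row is printed as its last element, indented by its depth.
    return "\n".join(
        " " * (spacing * (len(row) - 1)) + row[-1] for row in sorted(data)
    )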

You can use sort_indented_text as follows:

text = """foo
    bar
        inner 1
        inner 3
        inner 2
    another
        stuff c
        stuff b
        stuff a
    more
        items z
        items x
        items y"""

print(sort_indented_text(text, 4))
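As a sanity check on the requirement that the tree structure stays intact, the sketch below compares each line's parent before and after sorting. `parent_of_each_line` is a hypothetical helper written for this check, and it assumes that every name in the text is unique; the function is repeated only so that the snippet runs on its own.

```python
def sort_indented_text(text, spacing):
    # Same function as in the answer, repeated so this snippet is self-contained.
    data = []
    row = []
    for line in text.splitlines():
        stripped = line.lstrip()
        row = row[0:(len(line) - len(stripped)) // spacing] + [stripped]
        data.append(row)
    return "\n".join(
        " " * (spacing * (len(row) - 1)) + row[-1] for row in sorted(data)
    )


def parent_of_each_line(text, spacing):
    """Hypothetical helper: map each name to its parent's name (None at depth 0).

    Assumes every name appears only once in the text.
    """
    parents = {}
    stack = []  # names of the current chain of ancestors
    for line in text.splitlines():
        name = line.lstrip()
        depth = (len(line) - len(name)) // spacing
        stack = stack[:depth]  # drop ancestors deeper than this line
        parents[name] = stack[-1] if stack else None
        stack.append(name)
    return parents


text = """foo
    bar
        inner 3
        inner 1
    another
        stuff c
        stuff a"""

result = sort_indented_text(text, 4)

# Every line must keep the same parent it had before sorting.
assert parent_of_each_line(result, 4) == parent_of_each_line(text, 4)
print(result)
```

Note that the sort uses plain string comparison, so uppercase names sort before lowercase ones.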
