Sort an indented block of text while keeping order
I have a block of text that I need to reorder (using Python), like this:
foo
    bar
        inner 1
        inner 3
        inner 2
    another
        stuff c
        stuff b
        stuff a
    more
        items z
        items x
        items y
and the output of this sort function must look like this:
foo
    another
        stuff a
        stuff b
        stuff c
    bar
        inner 1
        inner 2
        inner 3
    more
        items x
        items y
        items z
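The important details are:
- As shown in the example above, each new "depth" is indicated by 4 spaces. This is consistent throughout the text.
- At each depth, items should be sorted alphabetically. However, the tree structure must stay intact after sorting: "stuff a/b/c" must always have "another" as their parent, and "items x/y/z" must always have "more" as their parent.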
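Those two rules are enough to turn every line into a `(depth, text)` pair, the same shape the attempt below builds in `_parse_by_depth`. As a minimal sketch, assuming the indentation uses spaces only, the hypothetical helper `parse_depths` does that conversion and enforces the "multiple of 4 spaces" rule by raising `ValueError` instead of silently rounding:

```python
def parse_depths(text, indent=4):
    """Turn each non-blank line into a (depth, text) pair.

    Hypothetical helper: assumes space-only indentation and raises
    ValueError when a line's indentation is not a multiple of `indent`.
    """
    pairs = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        stripped = line.lstrip(" ")
        width = len(line) - len(stripped)
        if width % indent:
            raise ValueError(f"line {number}: {width} spaces is not a multiple of {indent}")
        pairs.append((width // indent, stripped))
    return pairs


sample = "foo\n    bar\n        inner 1\n    another"
print(parse_depths(sample))
# [(0, 'foo'), (1, 'bar'), (2, 'inner 1'), (1, 'another')]
```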
Here is an attempt that comes close but doesn't quite work:
import re
import textwrap

_EXPECTED_INDENTATION = "    "
_PARSER = re.compile(r"(?P<indentation>\s*)(?P<words>.+)")


def _iter_lists(item):
    if not isinstance(item, list):
        return

    yield item

    for group in item:
        for inner in _iter_lists(group):
            yield inner


def _group_by_depth(names):
    previous_depth = -1
    all_groups = []
    inner_group = []

    for depth, name in names:
        if previous_depth != -1 and depth != previous_depth:
            all_groups.append(inner_group)
            inner_group = []

        inner_group.append((depth, name))
        previous_depth = depth

    if inner_group:
        # Add the last group, just in case it was missed
        all_groups.append(inner_group)

    return all_groups


def _parse_by_depth(text):
    output = []

    for line in text.split("\n"):
        if not line.strip():
            continue

        match = _PARSER.match(line)
        count = int(match.group("indentation").count(_EXPECTED_INDENTATION))
        word = match.group("words")
        output.append((count, word))

    return output


def _sort_all(all_groups):
    for group in all_groups:
        for inner in _iter_lists(group):
            inner.sort()


def flatten_sequence(sequence):
    if not sequence:
        return sequence

    if isinstance(sequence[0], list):
        return flatten_sequence(sequence[0]) + flatten_sequence(sequence[1:])

    return sequence[:1] + flatten_sequence(sequence[1:])


def main():
    """Run the main execution of the current script."""
    text = textwrap.dedent(
        """\
        foo
            bar
                inner 1
                inner 3
                inner 2
            another
                stuff c
                stuff b
                stuff a
            more
                items z
                items x
                items y
        """
    )

    names = _parse_by_depth(text)
    # `_parse_by_depth` should generate
    # names = [
    #     (0, 'foo'),
    #     (1, 'bar'),
    #     (2, 'inner 1'),
    #     (2, 'inner 3'),
    #     (2, 'inner 2'),
    #     (1, 'another'),
    #     (2, 'stuff c'),
    #     (2, 'stuff b'),
    #     (2, 'stuff a'),
    #     (1, 'more'),
    #     (2, 'items z'),
    #     (2, 'items x'),
    #     (2, 'items y'),
    # ]
    all_groups = _group_by_depth(names)
    _sort_all(all_groups)
    flattened = flatten_sequence(all_groups)

    for depth, name in flattened:
        print("{indentation}{name}".format(indentation=_EXPECTED_INDENTATION * depth, name=name))


if __name__ == "__main__":
    main()
It doesn't work, though. It prints:
foo
    bar
        inner 1
        inner 2
        inner 3
    another
        stuff a
        stuff b
        stuff c
    more
        items x
        items y
        items z
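because `_sort_all` only sorts each contiguous block of same-depth lines. For example, "inner 1/2/3" and "stuff a/b/c" end up sorted correctly, but the order of parent items such as bar, another and more is still wrong. How can I modify `_group_by_depth` and/or `_sort_all` to get the expected order?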
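One alternative to patching `_group_by_depth`/`_sort_all` is to swap in a different technique: build an explicit tree in which every line is attached to its parent, sort each node's children, and write the tree back out. Because children travel with their parent node, reordering siblings can never separate them. This is a sketch under stated assumptions (spaces only, each level exactly `indent` spaces deeper than its parent, blank lines ignored); `sort_tree_text` and its nested `render` are hypothetical names, not part of the code above:

```python
def sort_tree_text(text, indent=4):
    """Sort siblings alphabetically at every depth, keeping children under their parent.

    Sketch only: assumes space indentation in steps of `indent`
    and that no line is more than one level deeper than the previous one.
    """
    root = {"name": None, "children": []}
    stack = [root]  # stack[d] is the most recent node that lines at depth d attach to

    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name = line.lstrip(" ")
        depth = (len(line) - len(name)) // indent
        node = {"name": name, "children": []}
        stack[depth]["children"].append(node)  # attach to the current parent
        del stack[depth + 1:]                  # forget deeper, finished branches
        stack.append(node)                     # this node is now the parent for depth + 1

    def render(node, depth):
        # Sort this node's children by name, then emit each child followed by its subtree.
        for child in sorted(node["children"], key=lambda c: c["name"]):
            yield " " * (indent * depth) + child["name"]
            yield from render(child, depth + 1)

    return "\n".join(render(root, 0))


sample = "foo\n    bar\n        inner 2\n        inner 1\n    another"
print(sort_tree_text(sample))
```

The stack holds the most recent node at each depth, so a line's parent is always `stack[depth]`; a line that skips a level raises `IndexError` rather than guessing where it belongs.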
I'd suggest the following approach:
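We can interpret the input as a table with several columns, where each level of indentation means moving on to the next column. A column that a line skips is assumed to hold the same value as in its "parent" row. Here is that table, with those implied "repeated values" shown in parentheses: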
column 1 | column 2 | column 3 |
---|---|---|
foo | | |
(foo) | bar | |
(foo) | (bar) | inner 1 |
(foo) | (bar) | inner 3 |
(foo) | (bar) | inner 2 |
(foo) | another | |
(foo) | (another) | stuff c |
(foo) | (another) | stuff b |
(foo) | (another) | stuff a |
(foo) | more | |
(foo) | (more) | items z |
(foo) | (more) | items x |
(foo) | (more) | items y |
The idea is to build this two-dimensional list (including the repeated values), sort it, and then convert it back to the original format.
Here is the code:
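def sort_indented_text(text, spacing):
    data = []
    row = []
    for line in text.splitlines():
        stripped = line.lstrip()
        # Keep the ancestors down to this line's depth, then append the line itself,
        # so each row is the full path from the top-level item to this line.
        row = row[0:(len(line) - len(stripped)) // spacing] + [stripped]
        data.append(row)
    # Sorting the paths orders siblings while keeping every subtree after its parent;
    # each row is then re-indented according to its length.
    return "\n".join(
        " " * (spacing * (len(row) - 1)) + row[-1] for row in sorted(data)
    )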
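The trick works because Python compares lists lexicographically: element by element, and when one list is a prefix of the other, the shorter one sorts first. A small illustration with a few hand-written rows in the shape that `sort_indented_text` builds (these rows are only an excerpt of the example):

```python
# Each row is the path from the top-level item down to one line,
# i.e. a row of the table above with the repeated values filled in.
rows = [
    ["foo"],
    ["foo", "bar"],
    ["foo", "bar", "inner 3"],
    ["foo", "another"],
    ["foo", "another", "stuff b"],
]

# A parent's row is a prefix of its children's rows, so it sorts just ahead
# of them, and all rows sharing that prefix stay together as one run.
for row in sorted(rows):
    print(row)
```

One consequence worth knowing: if two siblings have identical text, their rows are indistinguishable, so both copies of the parent end up adjacent and their children are merged into a single sorted run after them.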
You can use it like this:
text = """foo
    bar
        inner 1
        inner 3
        inner 2
    another
        stuff c
        stuff b
        stuff a
    more
        items z
        items x
        items y"""

print(sort_indented_text(text, 4))