Combine multiple YAML files with indentation in Python

Question

I have the following directory and file structure. I want to create a single YAML file from all the YAML files in these folders.

[root@localhost test]# tree
.
├── group_vars
│   └── all.template
├── host_vars
│   └── host.template
└── vars
    ├── CASSANDRA
    ├── CQLSH
    ├── CSYNC2
    ├── DSE_OPSCENTER
    ├── DSE_OPSCENTER_AGENT
    ├── logging.template
    ├── packages_vars.template
    ├── UDM
    └── user_pub_keys

The result could look like this (example YAML file):

group_vars/all.template:
    <all the all.template data at this indentation>
host_vars/host.template:
    <all the host.template data at this indentation>
vars/CASSANDRA:
    <all the CASSANDRA data at this indentation>
vars/CQLSH:
    <all the CQLSH data at this indentation>
... and so on

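I can concatenate the files in the folders, but I don't know how to produce the YAML format described above.

What have I tried?

I thought of writing <folder_name>/<file_name> to the output file, then indenting by 4 spaces and writing the content as is.

Something like this: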

with open(actual_path) as i: # actual path is just the path to the file
    outfile.write('vars#'+fname) # vars is the folder name and fname is the file name. # is just any separator for the file
    outfile.write(i.read()) # here I can add 4 spaces 
    outfile.write('\n')

Is this a good way to create the YAML file the way I want? If so, I only need to know how to write each file's content (as is) indented by 4 spaces.

Answer 1

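A better option for getting a valid YAML file as output is probably to use PyYAML: read all the YAML files, merge them in memory, and then dump the resulting object to a new file.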
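A minimal sketch of that idea, assuming PyYAML is installed and every file below the subdirectories contains valid YAML. The function name merge_yaml_files is made up for illustration. Keep in mind that going through a loader drops comments and normalizes formatting.

```python
import os

import yaml  # PyYAML


def merge_yaml_files(root_dir):
    """Load every file in the subdirectories of root_dir, keyed by 'dir/file'."""
    merged = {}
    for root, _dirs, file_names in os.walk(root_dir):
        if os.path.abspath(root) == os.path.abspath(root_dir):
            # skip files that sit directly in root_dir
            continue
        rel_dir = os.path.relpath(root, root_dir)
        for file_name in sorted(file_names):
            with open(os.path.join(root, file_name), encoding='utf-8') as fp:
                # safe_load parses the file into plain Python objects;
                # comments and the original layout are not preserved
                content = yaml.safe_load(fp)
            # always use '/' in the key, whatever the local path separator is
            key = '/'.join(rel_dir.split(os.sep) + [file_name])
            merged[key] = content
    return merged
```

The merged object can then be written with something like yaml.safe_dump(merge_yaml_files('.'), out_file, default_flow_style=False), where out_file is an open text file outside the subdirectories being read.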

Answer 2

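You cannot just dump the contents of the files into a YAML document (or multiple documents), because that content will be parsed when the document is loaded. The parsed content might be incorrect YAML, resulting in a loader error, or it might be correct YAML, resulting in a data structure that is unlikely to convert back exactly to the (string) content of the original file. The latter is because YAML dumpers normalize indentation, and most dumpers discard end-of-line comments.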

The files may also contain binary data, which needs to be properly encoded or escaped, depending on the content of the file.

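Then, your approach of using # (as in your code) as the separator between path elements and the file name will not work if any path or file name contains that character, which is valid in file names. You should either use a reserved character (on Unix, the NUL character or /), or make things more portable by splitting the path into segments and putting those segments plus the file name in a sequence of string scalars.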
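The separator problem can be sketched with the standard library alone. The helper names key_with_hash and key_as_segments are hypothetical and only illustrate the argument: a # inside a name makes the joined key ambiguous, while a list of segments stays unambiguous.

```python
import os


def key_with_hash(folder, file_name):
    # the approach from the question: '#' glued between folder and file name
    return folder + '#' + file_name


def key_as_segments(path):
    # split a relative path into its segments, with the file name last
    return os.path.normpath(path).split(os.sep)


# '#' is a valid character in file names, so two different files collide
assert key_with_hash('vars', 'a#b') == key_with_hash('vars#a', 'b')

# as segments the two stay distinct
print(key_as_segments(os.path.join('vars', 'a#b')))    # ['vars', 'a#b']
print(key_as_segments(os.path.join('vars#a', 'b')))    # ['vars#a', 'b']
```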

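To make all of this proper YAML, make sure you load this information into a data structure and then dump that data structure with a YAML loader/dumper library, rather than trying to write the file yourself. For Python, your only real option is ruamel.yaml (disclaimer: I am the author of that package), as, for example, the older PyYAML cannot dump sequences as keys of a mapping, although that is perfectly valid according to the YAML specification.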

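There are multiple ways to create the data structure, and you also need to decide whether your file should contain one YAML document or several. If you want a single document, I would represent the path + file name, separated by /, as the keys of a mapping, with the file contents as literal block scalars as the values for those keys: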

import os
import sys
import ruamel.yaml

root_dir = '.'

data = ruamel.yaml.comments.CommentedMap()

for root, directory_names, file_names in os.walk(root_dir):
    if root == root_dir:
        # don't do the file in the current directory, only the ones in subdirs
        continue
    # this makes a list after removing the root_dir
    rsplit = root.replace(root_dir + os.sep, '', 1).split(os.sep)
    for file_name in file_names:
        # open as binary
        with open(os.path.join(root, file_name), 'rb') as fp:
            raw_content = fp.read()
        # then if conversion to unicode fails, keep as binary
        try:
            content = ruamel.yaml.scalarstring.PreservedScalarString(raw_content.decode('utf-8'))
        except UnicodeDecodeError:
            content = raw_content
        # in the next line join the segments using '/', don't use os.sep, as you might
        # not be on Unix/Linux
        data['/'.join(rsplit + [file_name])] = content

yaml = ruamel.yaml.YAML()
yaml.dump(data, sys.stdout)

which gives:

host_vars/host.template: |+
  this is the content of the file 
  host.template it has two empty lines at the end


group_vars/all.template: |
  this is the content of the
  file all.template
vars/CASSANDRA: !!binary |
  jA0EAwMCeujy0iby+oFgyUiXUDg2VWaphMZSwDxIIyo0h/aVkrmVaRJy7DFjLhfNrKZL9wRiztvL
  slM0cA/N1jDZ2DJCT5317mlTNuWZCoj/8EzvPegpi7w=

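Apart from the comments in the program, there are a few things to note:

The CASSANDRA file is deliberately binary (gpg-encrypted content of all.template, with the file name repeated several times as the key). Nothing special needs to be done to get the !!binary tag.
host.template has two empty lines at the end, so the YAML dumper automatically writes it as |+
When reading the file back, if portability matters, make sure to rebuild the paths by splitting the keys on / and then joining the segments with os.sep.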
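The last point could look roughly like this when restoring the files from the loaded mapping. This is a sketch using only the standard library; restore_files and target_dir are illustrative names, and data is assumed to be a mapping from /-separated keys to str or bytes values, such as a YAML loader would typically return for the document above.

```python
import os


def restore_files(data, target_dir):
    """Write every 'dir/file' key of data back to disk below target_dir."""
    written = []
    for key, content in data.items():
        # split on '/', the separator used in the YAML keys,
        # then let os.path.join apply the local separator (os.sep)
        segments = key.split('/')
        path = os.path.join(target_dir, *segments)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        if isinstance(content, bytes):
            # values tagged !!binary load as bytes in Python 3
            with open(path, 'wb') as fp:
                fp.write(content)
        else:
            # newline='' writes the line endings exactly as stored
            with open(path, 'w', encoding='utf-8', newline='') as fp:
                fp.write(content)
        written.append(path)
    return written
```

Keys are treated as trusted here; for untrusted input you would also want to reject segments such as .. before writing.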


python

yaml

ruamel.yaml