Why does PyYAML 5.1 raise YAMLLoadWarning when the default loader has already been made safer?

Here is my code:

import yaml
yaml.load('foo')

With PyYAML 5.1, this code produces the following warning.

$ pip install pyyaml
$ python3 foo.py
foo.py:2: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  yaml.load('foo')

So I visited https://msg.pyyaml.org/load to see what this is about, but I do not understand why this warning is needed.

First, the documentation says,

UnsafeLoader (also called Loader for backwards compatability)

The original Loader code that could be easily exploitable by untrusted data input.

Okay, that makes sense. In earlier versions, the original loader was unsafe. It goes on to say,

FullLoader

Loads the full YAML language. Avoids arbitrary code execution. This is currently (PyYAML 5.1) the default loader called by yaml.load(input) (after issuing the warning).

So the current version uses FullLoader, which is not unsafe. The documentation confirms this again:

The load function was also made much safer by disallowing the execution of arbitrary functions by the default loader (FullLoader).

If the current version, which uses FullLoader, is not unsafe, then why do we need the YAMLLoadWarning at all?

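I think this warning is more of a notice and a piece of guidance, letting users know what the PyYAML best practice will be going forward. Recall: explicit is better than implicit.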


Before version 5.1 (for example, 4.1), the yaml.load API used Loader=Loader by default:

def load(stream, Loader=Loader):
    """
    Parse the first YAML document in a stream
    and produce the corresponding Python object.
    """
    loader = Loader(stream)
    try:
        return loader.get_single_data()
    finally:
        loader.dispose()

def safe_load(stream):
    """
    Parse the first YAML document in a stream
    and produce the corresponding Python object.
    Resolve only basic YAML tags.
    """
    return load(stream, SafeLoader)

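At that time there were only three choices for the Loader class: the limited BaseLoader, SafeLoader, and the unsafe Loader. The default was the unsafe one, as the documentation states: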

PyYAML's load function has been unsafe since the first release in May 2006. It has always been documented that way in bold type: PyYAMLDocumentation. PyYAML has always provided a safe_load function that can load a subset of YAML without exploit.

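But many resources and tutorials still prefer to call yaml.load(f) directly, so users (especially new users) end up choosing the default Loader class implicitly.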
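To make the gap between the safe and unsafe loaders concrete, here is a minimal sketch. The helper name try_safe_load is made up for illustration, and os.getcwd is used only because it is harmless. It feeds yaml.safe_load a document carrying a Python-specific tag, which SafeLoader should reject, and then a plain document, which it should accept.

```python
import yaml

# A document whose tag asks the loader to call a Python function.
# os.getcwd is harmless; a malicious document would name something destructive.
tagged_doc = "!!python/object/apply:os.getcwd []"
plain_doc = "name: foo\nports: [80, 443]"


def try_safe_load(text):
    """Hypothetical helper: return (ok, value or error class name)."""
    try:
        return True, yaml.safe_load(text)
    except yaml.YAMLError as exc:
        # SafeLoader has no constructor for python/* tags, so it refuses them.
        return False, type(exc).__name__


tagged_ok, tagged_detail = try_safe_load(tagged_doc)
plain_ok, plain_value = try_safe_load(plain_doc)

print(tagged_ok, tagged_detail)
print(plain_ok, plain_value)
```

The same tagged document passed to the old default Loader would have been executed rather than rejected, which is why a bare yaml.load(f) copied from a tutorial was risky.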


And since PyYAML version 5.1, the yaml.load API has been changed to be more explicit:

def load(stream, Loader=None):
    """
    Parse the first YAML document in a stream
    and produce the corresponding Python object.
    """
    if Loader is None:
        load_warning('load')
        Loader = FullLoader

    loader = Loader(stream)
    try:
        return loader.get_single_data()
    finally:
        loader.dispose()

def safe_load(stream):
    """
    Parse the first YAML document in a stream
    and produce the corresponding Python object.
    Resolve only basic YAML tags. This is known
    to be safe for untrusted input.
    """
    return load(stream, SafeLoader)
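As the source above shows, the warning is issued only on the Loader is None branch. The following sketch, which assumes nothing beyond the standard warnings module and a hypothetical helper named load_and_collect, passes the Loader explicitly and checks that no YAMLLoadWarning is recorded.

```python
import warnings

import yaml


def load_and_collect(text, loader):
    """Hypothetical helper: load with an explicit Loader, return (data, warnings)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeated ones
        data = yaml.load(text, Loader=loader)
    # Keep only PyYAML's load warning; match by name so this also runs on
    # releases where the class is not exported.
    load_warnings = [w for w in caught if w.category.__name__ == "YAMLLoadWarning"]
    return data, load_warnings


data, load_warnings = load_and_collect("foo", yaml.SafeLoader)
print(data, len(load_warnings))
```

Because the Loader is named at the call site, the None branch is never taken, so the call is silent and the reader of the code can see which loader is in use.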

A new FullLoader was also added to the Loader classes. As users, we should be aware of these changes and call yaml.load more explicitly:

  • yaml.load(stream, yaml.SafeLoader)

    Recommended for untrusted input. Limitation: loads only a subset of the YAML language.

  • yaml.load(stream, yaml.FullLoader)

    For more trusted input. Still slightly limited: avoids arbitrary code execution.

  • yaml.load(stream, yaml.Loader) (UnsafeLoader is the same as Loader)

    Unsafe, but has the full power.
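One way to apply the list above is to make the trust decision a visible argument. This is only a sketch: load_yaml and its trusted flag are hypothetical names, not part of PyYAML, and it assumes PyYAML 5.1 or later so that yaml.FullLoader exists.

```python
import yaml


def load_yaml(text, trusted=False):
    """Hypothetical wrapper: choose the Loader from an explicit trust flag."""
    # SafeLoader for anything untrusted; FullLoader only when the caller
    # states that the input comes from a source they trust.
    loader = yaml.FullLoader if trusted else yaml.SafeLoader
    return yaml.load(text, Loader=loader)


untrusted_result = load_yaml("a: 1")
trusted_result = load_yaml("a: [1, 2]", trusted=True)

print(untrusted_result)
print(trusted_result)
```

Defaulting to SafeLoader keeps the conservative behaviour for callers who do not think about the flag, and yaml.Loader is deliberately left out so that the unsafe path must be written by hand.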