How to read a Python tuple using PyYAML?
I have the following YAML file named input.yaml:
cities:
  1: [0,0]
  2: [4,0]
  3: [0,4]
  4: [4,4]
  5: [2,2]
  6: [6,2]
highways:
  - [1,2]
  - [1,3]
  - [1,5]
  - [2,4]
  - [3,4]
  - [5,4]
start: 1
end: 4
I'm loading it with PyYAML and printing the result as follows:
import yaml
with open("input.yaml", "r") as f:
    # PyYAML 6.0 and later require an explicit Loader argument
    data = yaml.load(f, Loader=yaml.FullLoader)
print(data)
The result is the following data structure:
{ 'cities': { 1: [0, 0]
, 2: [4, 0]
, 3: [0, 4]
, 4: [4, 4]
, 5: [2, 2]
, 6: [6, 2]
}
, 'highways': [ [1, 2]
, [1, 3]
, [1, 5]
, [2, 4]
, [3, 4]
, [5, 4]
]
, 'start': 1
, 'end': 4
}
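As you can see, each city and highway is represented as a list. However, I want them to be represented as tuples, so I convert them manually using comprehensions: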
import yaml
with open("input.yaml", "r") as f:
    data = yaml.load(f, Loader=yaml.FullLoader)
data["cities"] = {k: tuple(v) for k, v in data["cities"].items()}
data["highways"] = [tuple(v) for v in data["highways"]]
print(data)
However, this seems like a hack. Is there some way to instruct PyYAML to read them directly as tuples instead of lists?
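Given what you're trying to do, I wouldn't call what you've done hacky. As I understand it, the alternative is to use Python-specific tags in your YAML file so that the data is represented appropriately when the file is loaded. However, that requires modifying your YAML file, which, if the file is large, can be quite annoying and not ideal.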
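If you keep the post-processing approach, the two comprehensions can be folded into one recursive pass. This is a minimal sketch, assuming only "leaf" lists (lists that contain no lists or dicts) should become tuples; leaf_lists_to_tuples is a hypothetical helper name, and the YAML is inlined to keep the example self-contained.

```python
import yaml


def leaf_lists_to_tuples(obj):
    """Hypothetical helper: turn lists that contain no lists/dicts into tuples."""
    if isinstance(obj, dict):
        return {key: leaf_lists_to_tuples(value) for key, value in obj.items()}
    if isinstance(obj, list):
        converted = [leaf_lists_to_tuples(item) for item in obj]
        # a "leaf" list holds only scalars (an empty list also counts as a leaf)
        if all(not isinstance(item, (list, dict)) for item in obj):
            return tuple(converted)
        # lists of lists (like 'highways') stay lists of tuples
        return converted
    return obj


DOC = """
cities:
  1: [0,0]
  2: [4,0]
highways:
  - [1,2]
  - [1,3]
start: 1
end: 4
"""

data = leaf_lists_to_tuples(yaml.safe_load(DOC))
print(data)
# {'cities': {1: (0, 0), 2: (4, 0)}, 'highways': [(1, 2), (1, 3)], 'start': 1, 'end': 4}
```

This leaves the loader untouched and works with yaml.safe_load, at the cost of a second pass over the loaded data.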
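See the PyYAML documentation, which illustrates Python-specific tags further. Ultimately, you want to put a !!python/tuple in front of each structure you want represented that way. For your sample data, that would be: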
YAML file:
cities:
  1: !!python/tuple [0,0]
  2: !!python/tuple [4,0]
  3: !!python/tuple [0,4]
  4: !!python/tuple [4,4]
  5: !!python/tuple [2,2]
  6: !!python/tuple [6,2]
highways:
  - !!python/tuple [1,2]
  - !!python/tuple [1,3]
  - !!python/tuple [1,5]
  - !!python/tuple [2,4]
  - !!python/tuple [3,4]
  - !!python/tuple [5,4]
start: 1
end: 4
Sample code:
import yaml
with open('y.yaml') as f:
    # yaml.safe_load() would reject !!python/tuple; FullLoader supports it
    d = yaml.load(f, Loader=yaml.FullLoader)
print(d)
This will output:
{'cities': {1: (0, 0), 2: (4, 0), 3: (0, 4), 4: (4, 4), 5: (2, 2), 6: (6, 2)}, 'start': 1, 'end': 4, 'highways': [(1, 2), (1, 3), (1, 5), (2, 4), (3, 4), (5, 4)]}
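If the tagged file should be loaded without enabling all of PyYAML's other Python-specific tags, one option (a sketch under my own assumptions, not something taken from the PyYAML docs) is to register only the python/tuple tag on a subclass of yaml.SafeLoader. TupleSafeLoader is a hypothetical name.

```python
import yaml


class TupleSafeLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass that also understands !!python/tuple."""


def construct_python_tuple(loader, node):
    # deep=True builds the nested items fully before the immutable tuple is created
    return tuple(loader.construct_sequence(node, deep=True))


# registering on the subclass leaves yaml.SafeLoader itself unchanged
TupleSafeLoader.add_constructor("tag:yaml.org,2002:python/tuple", construct_python_tuple)

DOC = """
cities:
  1: !!python/tuple [0,0]
  2: !!python/tuple [4,0]
highways:
  - !!python/tuple [1,2]
  - !!python/tuple [1,3]
start: 1
end: 4
"""

data = yaml.load(DOC, Loader=TupleSafeLoader)
print(data)
# {'cities': {1: (0, 0), 2: (4, 0)}, 'highways': [(1, 2), (1, 3)], 'start': 1, 'end': 4}
```

Plain yaml.safe_load still raises a ConstructorError on this document, because the tag is only known to the subclass.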
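Depending on where your YAML input comes from, your "hack" is a good solution, especially if you use yaml.safe_load() instead of the unsafe yaml.load(). If only the "leaf" sequences in your YAML file need to be tuples, you can do¹: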
import pprint
import ruamel.yaml
from ruamel.yaml.constructor import SafeConstructor
def construct_yaml_tuple(self, node):
    seq = self.construct_sequence(node)
    # only make "leaf sequences" into tuples, you can add dict
    # and other types as necessary
    if seq and isinstance(seq[0], (list, tuple)):
        return seq
    return tuple(seq)
SafeConstructor.add_constructor(
    u'tag:yaml.org,2002:seq',
    construct_yaml_tuple)
# ruamel.yaml.safe_load() was removed in ruamel.yaml 0.18; use a YAML(typ='safe') instance
yaml = ruamel.yaml.YAML(typ='safe')
with open('input.yaml') as fp:
    data = yaml.load(fp)
pprint.pprint(data, width=24)
which prints:
{'cities': {1: (0, 0),
            2: (4, 0),
            3: (0, 4),
            4: (4, 4),
            5: (2, 2),
            6: (6, 2)},
 'end': 4,
 'highways': [(1, 2),
              (1, 3),
              (1, 5),
              (2, 4),
              (3, 4),
              (5, 4)],
 'start': 1}
If you subsequently need to process more material in which sequences need to be "normal" lists again, use:
SafeConstructor.add_constructor(
    u'tag:yaml.org,2002:seq',
    SafeConstructor.construct_yaml_seq)
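¹ This was done using ruamel.yaml, a YAML 1.2 parser of which I am the author. You should be able to do the same with the older PyYAML if you only ever need to support YAML 1.1 and/or cannot upgrade for some reason.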
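A minimal PyYAML sketch of the same leaf-sequence constructor, assuming a reasonably recent PyYAML. Instead of patching the shared constructor class, it registers the constructor on a hypothetical yaml.SafeLoader subclass (LeafTupleLoader), so no reset step is needed afterwards.

```python
import yaml


class LeafTupleLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass that loads leaf sequences as tuples."""


def construct_leaf_tuple(loader, node):
    seq = loader.construct_sequence(node, deep=True)
    # same rule as the ruamel.yaml version: sequences of sequences stay lists
    if seq and isinstance(seq[0], (list, tuple)):
        return seq
    return tuple(seq)


# override the standard sequence tag for this loader class only
LeafTupleLoader.add_constructor("tag:yaml.org,2002:seq", construct_leaf_tuple)

DOC = """
cities:
  1: [0,0]
  6: [6,2]
highways:
  - [1,2]
  - [5,4]
start: 1
end: 4
"""

data = yaml.load(DOC, Loader=LeafTupleLoader)
print(data)
# {'cities': {1: (0, 0), 6: (6, 2)}, 'highways': [(1, 2), (5, 4)], 'start': 1, 'end': 4}
```

Because the override lives on the subclass, plain yaml.safe_load keeps returning lists, so there is no global state to restore.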
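I ran into the same problem as the question and wasn't quite satisfied with either answer. While browsing the PyYAML documentation, I found two interesting methods: yaml.add_constructor and yaml.add_implicit_resolver.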
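The implicit resolver solves the problem of having to tag every entry with !!python/tuple by matching strings against a regular expression. I also wanted to use tuple syntax, i.e. write tuple: (10,120) instead of writing a list tuple: [10,120] and then having it converted to a tuple, which I personally find very annoying. I also didn't want to install an external library. Here is the code: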
import yaml
import re
# this is to convert the string written as a tuple into a python tuple
def yml_tuple_constructor(loader, node):
    # this little parse is really just for what I needed, feel free to change it!
    def parse_tup_el(el):
        # try to convert into int or float else keep the string
        if el.isdigit():
            return int(el)
        try:
            return float(el)
        except ValueError:
            return el
    value = loader.construct_scalar(node)
    # remove the ( ) from the string
    tup_elements = value[1:-1].split(',')
    # remove the last element if the tuple was written as (x,b,)
    if tup_elements[-1] == '':
        tup_elements.pop(-1)
    tup = tuple(map(parse_tup_el, tup_elements))
    return tup
# !tuple is my own tag name, I think you could choose anything you want
yaml.add_constructor(u'!tuple', yml_tuple_constructor)
# this is to spot the strings written as tuple in the yaml
yaml.add_implicit_resolver(u'!tuple', re.compile(r"\(([^,\W]{,},){,}[^,\W]*\)"))
Finally, run this:
>>> yml = yaml.load("""
... cities:
...   1: (0,0)
...   2: (4,0)
...   3: (0,4)
...   4: (4,4)
...   5: (2,2)
...   6: (6,2)
... highways:
...   - (1,2)
...   - (1,3)
...   - (1,5)
...   - (2,4)
...   - (3,4)
...   - (5,4)
... start: 1
... end: 4""", Loader=yaml.FullLoader)
>>> yml['cities']
{1: (0, 0), 2: (4, 0), 3: (0, 4), 4: (4, 4), 5: (2, 2), 6: (6, 2)}
>>> yml['highways']
[(1, 2), (1, 3), (1, 5), (2, 4), (3, 4), (5, 4)]
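I haven't tested whether yaml.safe_load has drawbacks compared to yaml.load here. Note, though, that when called without a Loader argument, yaml.add_constructor and yaml.add_implicit_resolver register on Loader, FullLoader and UnsafeLoader in current PyYAML releases, but never on SafeLoader, so yaml.safe_load will not apply them.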
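Following up on the safe_load point: a minimal sketch that registers a resolver and a constructor on a hypothetical yaml.SafeLoader subclass (TupleSyntaxLoader), so the (a,b) syntax works with safe-loading semantics. Two assumptions differ from the answer's code: the regex is broader (it also accepts signs, decimal points and spaces), and, instead of parse_tup_el, it rewrites the parentheses into a YAML flow sequence and lets yaml.safe_load type each element.

```python
import re

import yaml


class TupleSyntaxLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader subclass that reads plain scalars like (1,2) as tuples."""


# plain scalars such as "(0,0)" or "(1.5, -2)"; resolvers use re.match, hence the "$"
TUPLE_RE = re.compile(r"^\((?:[-+.\w ]*,)*[-+.\w ]*\)$")


def construct_paren_tuple(loader, node):
    text = loader.construct_scalar(node)
    # "(0, 0)" -> "[0, 0]": YAML's own safe rules decide int/float/str per element
    return tuple(yaml.safe_load("[" + text[1:-1] + "]"))


TupleSyntaxLoader.add_constructor("!tuple", construct_paren_tuple)
# first=["("]: only plain scalars starting with "(" are tested against the regex
TupleSyntaxLoader.add_implicit_resolver("!tuple", TUPLE_RE, ["("])

DOC = """
cities:
  1: (0,0)
  2: (4,0)
highways:
  - (1,2)
  - (1,3)
start: 1
end: 4
"""

data = yaml.load(DOC, Loader=TupleSyntaxLoader)
print(data)
# {'cities': {1: (0, 0), 2: (4, 0)}, 'highways': [(1, 2), (1, 3)], 'start': 1, 'end': 4}
```

Quoted scalars such as '(1,2)' are not implicitly resolved, so they stay strings. Since elements follow YAML 1.1 typing, something like (yes,no) would come back as booleans; keep the answer's parse_tup_el if that is not wanted.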