Converting a list of dictionaries into RDF format
Goal: automation. When there are a large number of dictionaries, I want to generate data in a specific format.
This is the input:
    a = [{'et2': 'OBJ Type',
          'e2': 'OBJ',
          'rel': 'rel',
          'et1': 'SUJ Type',
          'e1': 'SUJ'},
         {'et2': 'OBJ Type 2',
          'e2': 'OBJ',
          'rel': 'rel',
          'et1': 'SUJ Type',
          'e1': 'SUJ'}
         ]
The expected output looks like this:
:Sub a :SubType.
:Sub :rel "Obj".
This is what I tried:
    Sub = 0
    for i in a:
        entity_type1 = i["EntityType1"]
        entity1 = i["Entity1"]
        entity_type2 = i["EntityType2"]
        entity2 = i["Entity2"]
        relation = i["Relation"]
        if 'Sub' in entity_type1 or entity_type2:
            if entity1 == Sub and Sub <= 0 :
                Sub +=1
            sd_line1 = ""
            sd_line2 = ""
            sd_line1 = ":" + entity1 + " a " + ":" + entity_type1 + "."
            relation = ":"+relation
            sd_line2 ="\n" ":" + entity1 + " " + relation + " \"" + entity2 + "\"."
            sd_line3 = sd_line1 + sd_line2
            print(sd_line3)
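A piece of advice: when building a conversion workflow like this, try to keep the main steps separate, for example: loading from a system, parsing data in one format, extracting, transforming, serializing to another format, and loading into another system.
In your code sample, the extraction, transformation and serialization steps are mixed together. Separating them makes the code easier to read, and therefore easier to maintain or reuse.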
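As a rough illustration of that separation, here is a minimal sketch in which one function only produces triples and another only formats them. The function names `extract` and `serialize`, and the single sample record, are assumptions made for this example; the key names follow the code in the question.

```python
# Hypothetical sample record; key names follow the code in the question.
records = [
    {"Entity1": "ers 3600", "EntityType1": "switch",
     "Relation": "hasFeatures", "Entity2": "ExtremeXOS"},
]


def extract(records):
    """Extraction/transformation: yield (subject, predicate, object) tuples."""
    for rec in records:
        subj = rec["Entity1"].replace(" ", "")
        yield subj, "a", rec["EntityType1"]
        yield subj, rec["Relation"], rec["Entity2"]


def serialize(triples):
    """Serialization: turn triples into lines of the requested text format."""
    lines = []
    for subj, pred, obj in triples:
        if pred == "a":
            lines.append(":{} a :{}.".format(subj, obj))
        else:
            lines.append(':{} :{} "{}".'.format(subj, pred, obj))
    return "\n".join(lines)


output = serialize(extract(records))
print(output)
```

Because `serialize` only sees triples, it could be swapped for another output format without touching `extract`.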
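Below I give you two solutions: the first extracts the data into a simple dict-based subject-predicate-object graph, the second builds a real RDF graph.
In both cases you will see that I separate the extraction/transformation step (which returns a graph) from the serialization step (which consumes the graph), so that each is easier to reuse:
The dict-based transformation is implemented with either a plain dict or a defaultdict. The serialization step is common to both.
The rdflib.Graph-based transformation is common to two serializations: one for your format, the other for any serialization that rdflib.Graph supports.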
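This builds a simple dict-based graph from your list of dictionaries a:

    graph = {}
    for e in a:
        subj = e["Entity1"]
        # Create the subject's entry only once, so that predicates
        # collected from earlier records are not overwritten.
        graph.setdefault(subj, {})
        # :Entity1 a :EntityType1.
        obj = e["EntityType1"]
        graph[subj]["a"] = obj
        # :Entity1 :Relation "Entity2".
        pred, obj = e["Relation"], e["Entity2"]
        graph[subj][pred] = obj
    print(graph)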
Like this:

    {'X450-G2': {'a': 'switch',
                 'hasFeatures': 'Role-Based Policy',
                 'hasLocation': 'WallJack'},
     'ers 3600': {'a': 'switch',
                  'hasFeatures': 'ExtremeXOS'},
     'slx 9540': {'a': 'router',
                  'hasFeatures': 'ExtremeXOS',
                  'hasLocation': 'Chasis'}}
Or, in a shorter form, using defaultdict:

    from collections import defaultdict

    graph = defaultdict(dict)
    for e in a:
        subj = e["Entity1"]
        # :Entity1 a :EntityType1.
        graph[subj]["a"] = e["EntityType1"]
        # :Entity1 :Relation "Entity2".
        graph[subj][e["Relation"]] = e["Entity2"]
    print(graph)
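One limitation of a subject-to-dict mapping is that it keeps a single object per predicate, so a second value for the same relation on the same subject replaces the first. If your data can contain such repeats, a nested defaultdict of lists is one possible variation. This is only a sketch: the names `records` and `multi_graph` and the sample values are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical records: the same subject has two values for "hasFeatures".
records = [
    {"Entity1": "X450-G2", "EntityType1": "switch",
     "Relation": "hasFeatures", "Entity2": "Role-Based Policy"},
    {"Entity1": "X450-G2", "EntityType1": "switch",
     "Relation": "hasFeatures", "Entity2": "ExtremeXOS"},
]

# subject -> predicate -> list of objects (insertion order, no duplicates)
multi_graph = defaultdict(lambda: defaultdict(list))
for rec in records:
    subj = rec["Entity1"]
    pairs = (("a", rec["EntityType1"]), (rec["Relation"], rec["Entity2"]))
    for pred, obj in pairs:
        # Skip values already recorded, e.g. the repeated type statement.
        if obj not in multi_graph[subj][pred]:
            multi_graph[subj][pred].append(obj)

print(dict(multi_graph["X450-G2"]))
```

A serializer for this structure would need one extra inner loop over each list of objects.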
This prints your subject-predicate-object triples from the graph:

    def normalize(text):
        # Remove spaces so the name can be used as an identifier.
        return text.replace(' ', '')

    for subj, po in graph.items():
        subj = normalize(subj)
        # :Entity1 a :EntityType1.
        # Note: pop() removes the "a" entry from the graph itself.
        print(':{} a :{}.'.format(subj, po.pop("a")))
        for pred, obj in po.items():
            # :Entity1 :Relation "Entity2".
            print(':{} :{} "{}".'.format(subj, pred, obj))
        print()
Like this:

    :X450-G2 a :switch.
    :X450-G2 :hasFeatures "Role-Based Policy".
    :X450-G2 :hasLocation "WallJack".

    :ers3600 a :switch.
    :ers3600 :hasFeatures "ExtremeXOS".

    :slx9540 a :router.
    :slx9540 :hasFeatures "ExtremeXOS".
    :slx9540 :hasLocation "Chasis".
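The format strings above wrap the object in double quotes as-is. If a value could itself contain a double quote, a backslash or a line break, the resulting line would no longer be a well-formed quoted literal in Turtle-style syntax. A small escaping helper is one way to guard against that. The helper name `escape_literal` and the `hasNote` predicate are made up for this sketch.

```python
def escape_literal(text):
    """Escape characters that cannot appear raw in a double-quoted literal."""
    return (text.replace("\\", "\\\\")   # backslash first, to avoid double escaping
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


# Hypothetical value containing double quotes.
line = ':{} :{} "{}".'.format(
    "X450-G2", "hasNote", escape_literal('the "core" switch'))
print(line)
```

The rdflib-based approach takes care of this on its own when it serializes a Literal.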
This builds a real RDF graph using the rdflib library:

    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import RDF

    A = RDF.type

    graph = Graph()
    for d in a:
        subj = URIRef(normalize(d["Entity1"]))
        # :Entity1 a :EntityType1.
        graph.add((
            subj,
            A,
            URIRef(normalize(d["EntityType1"]))
        ))
        # :Entity1 :Relation "Entity2".
        graph.add((
            subj,
            URIRef(normalize(d["Relation"])),
            Literal(d["Entity2"])
        ))
This:

    # rdflib 6 and later return a str here; on older versions
    # serialize() returns bytes, so append .decode("utf-8").
    print(graph.serialize(format="n3"))

will print the graph in the N3 serialization format:

    <X450-G2> a <switch> ;
        <hasFeatures> "Role-Based Policy" ;
        <hasLocation> "WallJack" .

    <ers3600> a <switch> ;
        <hasFeatures> "ExtremeXOS" .

    <slx9540> a <router> ;
        <hasFeatures> "ExtremeXOS" ;
        <hasLocation> "Chasis" .
This queries the graph and prints it in your format:

    for subj in set(graph.subjects()):
        po = dict(graph.predicate_objects(subj))
        # :Entity1 a :EntityType1.
        print(":{} a :{}.".format(subj, po.pop(A)))
        for pred, obj in po.items():
            # :Entity1 :Relation "Entity2".
            print(':{} :{} "{}".'.format(subj, pred, obj))
        print()