Converting a list of dictionaries into RDF format

Goal (automation: when there are a large number of dictionaries, I want to generate data in a specific format). This is the input:

a = [{'EntityType2': 'OBJ Type',
      'Entity2': 'OBJ',
      'Relation': 'rel',
      'EntityType1': 'SUJ Type',
      'Entity1': 'SUJ'},
     {'EntityType2': 'OBJ Type 2',
      'Entity2': 'OBJ',
      'Relation': 'rel',
      'EntityType1': 'SUJ Type',
      'Entity1': 'SUJ'}]

The expected output looks like this:

:Sub a :SubType.
:Sub :rel "Obj".

 

This is what I tried:

Sub = 0


for i in a:
    entity_type1 = i["EntityType1"]
    entity1 = i["Entity1"]
    entity_type2 = i["EntityType2"]
    entity2 = i["Entity2"]
    relation = i["Relation"]
    if 'Sub' in entity_type1 or entity_type2:
        if entity1 == Sub and Sub <= 0 :
            
            Sub +=1
            sd_line1 = ""
            sd_line2 = ""
            sd_line1 = ":" + entity1 + " a " + ":" + entity_type1 + "."
            relation = ":"+relation
            sd_line2 ="\n"  ":" + entity1 + " " + relation + " \"" + entity2 + "\"."
            sd_line3 = sd_line1 + sd_line2
            print(sd_line3)


        
      

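A suggestion: when building this kind of conversion workflow, try to keep the main steps separate, for example: loading from a system, parsing data in one format, extracting, transforming, serializing to another format, and loading into another system.

In your code example, you mix the extraction, transformation, and serialization steps. Separating them makes your code easier to read, and therefore easier to maintain or reuse.

Below, I give you two solutions: the first extracts the data into a simple dict-based subject-predicate-object graph, the second into a real RDF graph.

In both cases, you will see that I keep the extraction/transformation step (which returns a graph) separate from the serialization step (which uses the graph), so that each is easier to reuse: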

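  • The dict-based transformation is implemented twice, with a plain dict and with a defaultdict. The serialization step is common to both.

  • The rdflib.Graph-based transformation is common to two serializations: one to your format, and one to any serialization available for rdflib.Graph.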
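
To make the separation concrete, here is a minimal sketch of the three steps as independent functions. The function names (extract, transform, serialize) and the single sample record are my own choices for illustration, and stripping spaces from the type name is an assumption based on the ":Sub a :SubType." line in the question.

```python
def extract(records):
    """Extraction: yield (subject, predicate, object) tuples from the raw records."""
    for r in records:
        yield r["Entity1"], "a", r["EntityType1"]
        yield r["Entity1"], r["Relation"], r["Entity2"]


def transform(triples):
    """Transformation: group the triples into a {subject: {predicate: object}} graph."""
    graph = {}
    for subj, pred, obj in triples:
        graph.setdefault(subj, {})[pred] = obj
    return graph


def serialize(graph):
    """Serialization: render the graph as lines in the ':s :p "o".' format."""
    lines = []
    for subj, po in graph.items():
        s = subj.replace(" ", "")
        for pred, obj in po.items():
            if pred == "a":
                # Type statement: the object is a name, so spaces are removed.
                lines.append(":{} a :{}.".format(s, obj.replace(" ", "")))
            else:
                # Relation statement: the object is a quoted literal.
                lines.append(':{} :{} "{}".'.format(s, pred, obj))
    return lines


# Hypothetical record with the same keys as the entries of `a`.
records = [
    {"EntityType1": "SUJ Type", "Entity1": "SUJ", "Relation": "rel",
     "EntityType2": "OBJ Type", "Entity2": "OBJ"},
]

output = serialize(transform(extract(records)))
print("\n".join(output))
```

Because each function only depends on the value returned by the previous one, you can swap the serializer (or the graph representation) without touching the extraction.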


This builds a simple dict-based graph from your list of dictionaries a:

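graph = {}

for e in a:
    subj = e["Entity1"]
    # Create the inner dict only once per subject, so that predicates
    # already stored for the same subject are not discarded.
    graph.setdefault(subj, {})

    # :Entity1 a :EntityType1.
    obj = e["EntityType1"]
    graph[subj]["a"] = obj

    # :Entity1 :Relation "Entity2".
    pred, obj = e["Relation"], e["Entity2"]
    graph[subj][pred] = obj

print(graph)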

Like this:

{'X450-G2': {'a': 'switch',
             'hasFeatures': 'Role-Based Policy',
             'hasLocation': 'WallJack'},
 'ers 3600': {'a': 'switch',
              'hasFeatures': 'ExtremeXOS'},
 'slx 9540': {'a': 'router',
              'hasFeatures': 'ExtremeXOS',
              'hasLocation': 'Chasis'}}

Or, in a shorter form, using defaultdict:

from collections import defaultdict

graph = defaultdict(dict)

for e in a:
    subj = e["Entity1"]
    
    # :Entity1 a :EntityType1.
    graph[subj]["a"] = e["EntityType1"]  

    # :Entity1 :Relation "Entity2".    
    graph[subj][e["Relation"]] = e["Entity2"]  

print(graph)

This prints your subject-predicate-object triples from the graph:

def normalize(text):
    return text.replace(' ', '')

for subj, po in graph.items():
    subj = normalize(subj)
    po = dict(po)  # work on a copy so that pop() does not alter the graph

    # :Entity1 a :EntityType1.
    print(':{} a :{}.'.format(subj, po.pop("a")))

    for pred, obj in po.items():
        # :Entity1 :Relation "Entity2".    
        print(':{} :{} "{}".'.format(subj, pred, obj))

    print()

Like this:

:X450-G2 a :switch.
:X450-G2 :hasFeatures "Role-Based Policy".
:X450-G2 :hasLocation "WallJack".

:ers3600 a :switch.
:ers3600 :hasFeatures "ExtremeXOS".

:slx9540 a :router.
:slx9540 :hasFeatures "ExtremeXOS".
:slx9540 :hasLocation "Chasis".
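
One limitation of the dict-of-dicts graph is that each (subject, predicate) pair keeps a single object, so a second hasFeatures value for the same subject would overwrite the first. If that can happen in your data, a possible variation is to store a list of objects per predicate. This is only a sketch: the two sample records and the name multi_graph are made up for illustration.

```python
from collections import defaultdict

# Hypothetical records: the same subject has two values for one predicate.
records = [
    {"Entity1": "X450-G2", "EntityType1": "switch",
     "Relation": "hasFeatures", "Entity2": "Role-Based Policy"},
    {"Entity1": "X450-G2", "EntityType1": "switch",
     "Relation": "hasFeatures", "Entity2": "ExtremeXOS"},
]

# subject -> predicate -> list of objects
multi_graph = defaultdict(lambda: defaultdict(list))

for r in records:
    objects = multi_graph[r["Entity1"]]
    # Store the type once, and every distinct object for each predicate.
    if r["EntityType1"] not in objects["a"]:
        objects["a"].append(r["EntityType1"])
    if r["Entity2"] not in objects[r["Relation"]]:
        objects[r["Relation"]].append(r["Entity2"])

lines = []
for subj, po in multi_graph.items():
    s = subj.replace(" ", "")
    for pred, objs in po.items():
        for obj in objs:
            if pred == "a":
                lines.append(":{} a :{}.".format(s, obj))
            else:
                lines.append(':{} :{} "{}".'.format(s, pred, obj))

print("\n".join(lines))
```

The rdflib graph shown next does not have this limitation, since it stores each triple individually.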

This builds a real RDF graph using the rdflib library:

from rdflib import Graph, Literal, URIRef
from rdflib.namespace import RDF

A = RDF.type
graph = Graph()

for d in a:
    subj = URIRef(normalize(d["Entity1"]))

    # :Entity1 a :EntityType1.
    graph.add((
        subj,
        A, 
        URIRef(normalize(d["EntityType1"]))
    ))
    
    # :Entity1 :Relation "Entity2".    
    graph.add((
        subj,
        URIRef(normalize(d["Relation"])), 
        Literal(d["Entity2"])
    ))

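This:

# rdflib 6 and later return a str here; on older versions append .decode("utf-8")
print(graph.serialize(format="n3"))

will print the graph in the N3 serialization format: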

<X450-G2> a <switch> ;
    <hasFeatures> "Role-Based Policy" ;
    <hasLocation> "WallJack" .

<ers3600> a <switch> ;
    <hasFeatures> "ExtremeXOS" .

<slx9540> a <router> ;
    <hasFeatures> "ExtremeXOS" ;
    <hasLocation> "Chasis" .

This queries the graph to print it in your format:

for subj in set(graph.subjects()):
    po = dict(graph.predicate_objects(subj))
    
    # :Entity1 a :EntityType1.
    print(":{} a :{}.".format(subj, po.pop(A)))
    
    for pred, obj in po.items():
        # :Entity1 :Relation "Entity2".    
        print(':{} :{} "{}".'.format(subj, pred, obj))
    print()