Add and change nodes in an XML file with R

I have an XML (OSM) file that looks like this (small example):

<way id="86015" version="1" timestamp="2016-02-26T15:01:32Z">
    <nd ref="85642"/>
    <nd ref="85641"/>
    <nd ref="86016"/>
    <nd ref="85642"/>
  </way>
  <relation id="1" version="1" timestamp="2016-02-26T15:01:32Z">
    <member type="way" ref="2" role="outer"/>
    <member type="way" ref="12" role="outer"/>
    <member type="way" ref="17" role="outer"/>
    <member type="way" ref="22" role="outer"/>
    <member type="way" ref="27" role="outer"/>
    <member type="way" ref="60" role="outer"/>
    <member type="way" ref="65" role="outer"/>
    <member type="way" ref="71" role="outer"/>
    <member type="way" ref="75" role="outer"/>
    <member type="way" ref="79" role="outer"/>
    <member type="way" ref="84" role="outer"/>
    <member type="way" ref="92" role="outer"/>
    <member type="way" ref="108" role="outer"/>
    <member type="way" ref="112" role="outer"/>
    <member type="way" ref="132" role="outer"/>
    <member type="way" ref="150" role="outer"/>
    <member type="way" ref="166" role="outer"/>
    <member type="way" ref="173" role="outer"/>
    <member type="way" ref="178" role="outer"/>
    <tag k="type" v="multipolygon"/>
    <tag k="note" v="00000 ExampleCity"/>
    <tag k="plz" v="00000"/>
  </relation>

What I want to do is use the XML package in R to apply some changes to the file, especially in the <relation> section.

1) I want to change the v= attribute from

<tag k="type" v="multipolygon"/>

to

<tag k="type" v="boundary"/>

2) I want to add a new node to every <relation> parent node:

<tag k='boundary' v='postal_code' />

3) I want to change the k= attribute from

<tag k="note" v="00000 ExampleCity"/>

to

<tag k="city" v="00000 ExampleCity"/>

So far, I can find all <relation> elements with the following (doc is the variable holding the file):

getNodeSet(doc,"//relation")

I can also get all the tags of all <relation> elements, but I don't know how to actually overwrite and add the parts I need.

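As mentioned, consider XSLT, the special-purpose declarative programming language designed to manipulate XML documents for end-use needs. Although R does not maintain a comprehensive XSLT processor, it can interface with other languages and software such as Python and Excel. Even for the latter, R can replicate the Excel macro using the RDCOMClient library: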

XSLT script (save as an external .xsl or .xslt file for use below)

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>  

  <!-- IDENTITY TRANSFORM -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- CHANGE @v ATTRIBUTE -->
  <xsl:template match="tag[@k='type']">
    <xsl:copy>      
      <xsl:copy-of select="@k"/>
      <xsl:attribute name="v">boundary</xsl:attribute>
    </xsl:copy>
  </xsl:template>

  <!-- CHANGE @k ATTRIBUTE -->
  <xsl:template match="tag[@k='note']">
    <xsl:copy>      
      <xsl:attribute name="k">city</xsl:attribute>
      <xsl:copy-of select="@v"/>
    </xsl:copy>
  </xsl:template>

  <!-- ADD NODE -->
  <xsl:template match="relation">
    <xsl:copy>      
      <xsl:apply-templates select="@*"/>
      <xsl:apply-templates select="member"/>
      <xsl:apply-templates select="tag"/>
      <tag k='boundary' v='postal_code' />
    </xsl:copy>
  </xsl:template>      
</xsl:transform>
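
To make the effect of the three templates concrete, here is a minimal sketch that applies the same edits imperatively with Python's standard-library xml.etree.ElementTree instead of XSLT. The function name retag_relations and the trimmed sample string are assumptions made for illustration. Note one difference: the stylesheet rewrites matching tag elements anywhere in the document, while this sketch only touches tags that are direct children of a relation.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample (hypothetical; the real file has many more members)
SAMPLE = """<osm>
  <relation id="1">
    <member type="way" ref="2" role="outer"/>
    <tag k="type" v="multipolygon"/>
    <tag k="note" v="00000 ExampleCity"/>
    <tag k="plz" v="00000"/>
  </relation>
</osm>"""


def retag_relations(root):
    """Apply the three requested edits to every <relation> under root."""
    for relation in root.iter("relation"):
        for tag in relation.findall("tag"):
            if tag.get("k") == "type":
                # 1) change the v= attribute
                tag.set("v", "boundary")
            elif tag.get("k") == "note":
                # 3) change the k= attribute, keeping v= as it is
                tag.set("k", "city")
        # 2) append the new tag as the last child of the relation
        ET.SubElement(relation, "tag", {"k": "boundary", "v": "postal_code"})
    return root


root = retag_relations(ET.fromstring(SAMPLE))
result = [(t.get("k"), t.get("v")) for t in root.iter("tag")]
print(result)
```

The XSLT route keeps the rules declarative and reusable across tools; the imperative version is mostly useful for checking that the rules do what is intended on a small sample.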

Python script (using the lxml module)

import lxml.etree as ET

# LOAD ORIGINAL XML AND XSLT SCRIPT
dom = ET.parse('Input.xml')
xslt = ET.parse('XSLTScript.xsl')

# TRANSFORM XML INTO A NEW DOM OBJECT
transform = ET.XSLT(xslt)
newdom = transform(dom)

# CONVERT TO STRING
tree_out = ET.tostring(newdom, encoding='UTF-8', pretty_print=True,  xml_declaration=True)

# OUTPUT TO FILE
with open('Output.xml', 'wb') as xmlfile:
    xmlfile.write(tree_out)

R script (calls the .py script above, assuming python is on the system PATH)

system('python "C:/Path/To/Python/Script.py"')
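
If hard-coding the file names in the Python script is inconvenient, one option is to let the script read them from the command line so that the R system() call can pass them in. The sketch below is an assumption about how that could look, not part of the original script: parse_args, main, and the argument names xml_in, xslt, and xml_out are all hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Read the three file paths from the command line (hypothetical names)."""
    parser = argparse.ArgumentParser(
        description="Apply an XSLT stylesheet to an XML file."
    )
    parser.add_argument("xml_in", help="path to the original XML file")
    parser.add_argument("xslt", help="path to the .xsl/.xslt stylesheet")
    parser.add_argument("xml_out", help="path for the transformed XML")
    return parser.parse_args(argv)


def main(argv=None):
    # Imported here so that argument parsing alone does not require lxml
    import lxml.etree as ET

    args = parse_args(argv)
    transform = ET.XSLT(ET.parse(args.xslt))
    newdom = transform(ET.parse(args.xml_in))
    tree_out = ET.tostring(newdom, encoding="UTF-8", pretty_print=True,
                           xml_declaration=True)
    with open(args.xml_out, "wb") as xmlfile:
        xmlfile.write(tree_out)
```

With a call to main() added at the end of the file, the R side would then append the three paths after the script name in the string passed to system().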

Alternatively, Excel can run the XSLT, with R replicating the process.

Excel (using the MSXML object, late-bound here)

Public Sub RunXSLT()
    Dim xmlDoc As Object, xslDoc As Object, newDoc As Object

    Set xmlDoc = CreateObject("MSXML2.DOMDocument")
    Set xslDoc = CreateObject("MSXML2.DOMDocument")
    Set newDoc = CreateObject("MSXML2.DOMDocument")

    ' Set async before loading so Load finishes before the transform runs
    xmlDoc.async = False
    xmlDoc.Load "C:\Path\To\Input.xml"

    xslDoc.async = False
    xslDoc.Load "C:\Path\To\XSLTScript.xsl"
    xmlDoc.transformNodeToObject xslDoc, newDoc
    newDoc.Save "C:\Path\To\Output.xml"

    Set newDoc = Nothing
    Set xslDoc = Nothing
    Set xmlDoc = Nothing

End Sub

R script (replicating the above, using RDCOMClient)

library(RDCOMClient)

xmlfile = COMCreate("MSXML2.DOMDocument")
xslfile = COMCreate("MSXML2.DOMDocument")
newxmlfile = COMCreate("MSXML2.DOMDocument")

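# Backslashes must be doubled inside R string literals
xmlstr = 'C:\\Path\\To\\Input.xml'
xslstr = 'C:\\Path\\To\\XSLTScript.xsl'
newxmlstr = 'C:\\Path\\To\\Output.xml'

# LOADING XML & XSLT FILES
# Set the COM property with [[ ]]; `xmlfile.async = FALSE` would only create a new R variable
xmlfile[["async"]] = FALSE
xmlfile$Load(xmlstr)

xslfile[["async"]] = FALSE
xslfile$Load(xslstr)

# TRANSFORMING XML FILE USING XSLT INTO NEW FILE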
xmlfile$transformNodeToObject(xslfile, newxmlfile)
newxmlfile$Save(newxmlstr)

Final XML output

<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6" generator="CGImap 0.0.2">
  <node id="298884272" lat="54.0901447" lon="12.2516513" user="SvenHRO" 
        uid="46882" visible="true" version="1" changeset="676636" 
        timestamp="2008-09-21T21:37:45Z"/>
  <way id="86015" version="1" timestamp="2016-02-26T15:01:32Z">
    <nd ref="85642"/>
    <nd ref="85641"/>
    <nd ref="86016"/>
    <nd ref="85642"/>
  </way>
  <relation id="1" version="1" timestamp="2016-02-26T15:01:32Z">
    <member type="way" ref="2" role="outer"/>
    <member type="way" ref="12" role="outer"/>
    <member type="way" ref="17" role="outer"/>
    <member type="way" ref="22" role="outer"/>
    <member type="way" ref="27" role="outer"/>
    <member type="way" ref="60" role="outer"/>
    <member type="way" ref="65" role="outer"/>
    <member type="way" ref="71" role="outer"/>
    <member type="way" ref="75" role="outer"/>
    <member type="way" ref="79" role="outer"/>
    <member type="way" ref="84" role="outer"/>
    <member type="way" ref="92" role="outer"/>
    <member type="way" ref="108" role="outer"/>
    <member type="way" ref="112" role="outer"/>
    <member type="way" ref="132" role="outer"/>
    <member type="way" ref="150" role="outer"/>
    <member type="way" ref="166" role="outer"/>
    <member type="way" ref="173" role="outer"/>
    <member type="way" ref="178" role="outer"/>
    <tag k="type" v="boundary"/>
    <tag k="city" v="00000 ExampleCity"/>
    <tag k="plz" v="00000"/>
    <tag k="boundary" v="postal_code"/>
  </relation>
</osm>
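
As a quick sanity check on the output, the sketch below lists any relation that did not receive the new boundary tag. It uses Python's standard-library xml.etree.ElementTree; the helper name relations_missing_postal_tag and the second relation (id="2") in the sample are hypothetical, added only to show a failing case.

```python
import xml.etree.ElementTree as ET


def relations_missing_postal_tag(root):
    """Return ids of <relation> elements lacking <tag k="boundary" v="postal_code"/>."""
    missing = []
    for relation in root.iter("relation"):
        tags = {(t.get("k"), t.get("v")) for t in relation.findall("tag")}
        if ("boundary", "postal_code") not in tags:
            missing.append(relation.get("id"))
    return missing


# Relation 1 mirrors the transformed output; relation 2 is a made-up
# untransformed relation used to exercise the check.
OUTPUT_SAMPLE = """<osm version="0.6">
  <relation id="1">
    <tag k="type" v="boundary"/>
    <tag k="city" v="00000 ExampleCity"/>
    <tag k="plz" v="00000"/>
    <tag k="boundary" v="postal_code"/>
  </relation>
  <relation id="2">
    <tag k="type" v="multipolygon"/>
  </relation>
</osm>"""

missing = relations_missing_postal_tag(ET.fromstring(OUTPUT_SAMPLE))
print(missing)
```

For a real file, ET.parse('Output.xml').getroot() would replace the inline sample; an empty list means every relation carries the new tag.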