Add and change nodes in an XML file with R
I have an XML (OSM) file that looks like this (small example):
<way id="86015" version="1" timestamp="2016-02-26T15:01:32Z">
<nd ref="85642"/>
<nd ref="85641"/>
<nd ref="86016"/>
<nd ref="85642"/>
</way>
<relation id="1" version="1" timestamp="2016-02-26T15:01:32Z">
<member type="way" ref="2" role="outer"/>
<member type="way" ref="12" role="outer"/>
<member type="way" ref="17" role="outer"/>
<member type="way" ref="22" role="outer"/>
<member type="way" ref="27" role="outer"/>
<member type="way" ref="60" role="outer"/>
<member type="way" ref="65" role="outer"/>
<member type="way" ref="71" role="outer"/>
<member type="way" ref="75" role="outer"/>
<member type="way" ref="79" role="outer"/>
<member type="way" ref="84" role="outer"/>
<member type="way" ref="92" role="outer"/>
<member type="way" ref="108" role="outer"/>
<member type="way" ref="112" role="outer"/>
<member type="way" ref="132" role="outer"/>
<member type="way" ref="150" role="outer"/>
<member type="way" ref="166" role="outer"/>
<member type="way" ref="173" role="outer"/>
<member type="way" ref="178" role="outer"/>
<tag k="type" v="multipolygon"/>
<tag k="note" v="00000 ExampleCity"/>
<tag k="plz" v="00000"/>
</relation>
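What I want to do is use the XML package in R to apply some changes to the file, especially to the <relation> section.
1) I want to change the v= attribute from
<tag k="type" v="multipolygon"/>
to
<tag k="type" v="boundary"/>
2) I want to add a new node to every <relation> parent node:
<tag k='boundary' v='postal_code' />
3) I want to change the k= attribute from
<tag k="note" v="00000 ExampleCity"/>
to
<tag k="city" v="00000 ExampleCity"/>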
I can find all <relation> nodes using the following (doc being the file):
getNodeSet(doc,"//relation")
or get all the tags of all <relation> nodes, but I don't know how to actually overwrite and add the parts I need.
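As mentioned, consider XSLT, the special-purpose declarative language designed to transform XML documents for end-use needs. While R does not maintain a comprehensive XSLT processor, it can interface with other languages and software that do, such as Python and Excel. Even for the latter, R can mimic the Excel macro using the RDCOMClient library:
XSLT script (save as an external .xsl or .xslt file for use below)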
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>
<!-- IDENTITY TRANSFORM -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<!-- CHANGE @v ATTRIBUTE -->
<xsl:template match="tag[@k='type']">
<xsl:copy>
<xsl:copy-of select="@k"/>
<xsl:attribute name="v">boundary</xsl:attribute>
</xsl:copy>
</xsl:template>
<!-- CHANGE @k ATTRIBUTE -->
<xsl:template match="tag[@k='note']">
<xsl:copy>
<xsl:attribute name="k">city</xsl:attribute>
<xsl:copy-of select="@v"/>
</xsl:copy>
</xsl:template>
<!-- ADD NODE -->
<xsl:template match="relation">
<xsl:copy>
<xsl:apply-templates select="@*"/>
<xsl:apply-templates select="member"/>
<xsl:apply-templates select="tag"/>
<tag k='boundary' v='postal_code' />
</xsl:copy>
</xsl:template>
</xsl:transform>
Python script (using the lxml module)
import lxml.etree as ET
# LOAD ORIGINAL XML AND XSLT SCRIPT
dom = ET.parse('Input.xml')
xslt = ET.parse('XSLTScript.xsl')
# TRANSFORM XML INTO A NEW DOM OBJECT
transform = ET.XSLT(xslt)
newdom = transform(dom)
# CONVERT TO STRING
tree_out = ET.tostring(newdom, encoding='UTF-8', pretty_print=True, xml_declaration=True)
# OUTPUT TO FILE
with open('Output.xml', 'wb') as xmlfile:
    xmlfile.write(tree_out)
R script (calls the .py script above, assuming python is on the system PATH)
system('python "C:/Path/To/Python/Script.py"')
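For files where a full XSLT step feels heavy, the same three edits can also be made directly on the parsed tree. The following is only a minimal sketch using Python's standard-library xml.etree.ElementTree rather than lxml; the SAMPLE string and the update_relations helper are hypothetical names introduced for illustration, and a real file would be loaded with ET.parse instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature input; a real OSM file would be loaded with ET.parse(path).
SAMPLE = """<osm version="0.6">
  <relation id="1" version="1">
    <member type="way" ref="2" role="outer"/>
    <tag k="type" v="multipolygon"/>
    <tag k="note" v="00000 ExampleCity"/>
    <tag k="plz" v="00000"/>
  </relation>
</osm>"""


def update_relations(root):
    """Apply the three edits to every <relation> under root, in place."""
    # Materialize the list first so the tree is not mutated while it is being walked.
    for relation in list(root.iter("relation")):
        for tag in relation.findall("tag"):
            if tag.get("k") == "type":
                tag.set("v", "boundary")   # 1) change the v attribute
            elif tag.get("k") == "note":
                tag.set("k", "city")       # 3) change the k attribute
        # 2) append the new tag as the last child of the relation
        ET.SubElement(relation, "tag", {"k": "boundary", "v": "postal_code"})
    return root


root = update_relations(ET.fromstring(SAMPLE))
result = ET.tostring(root, encoding="unicode")
print(result)
```

Unlike the XSLT script, which rewrites every tag with k='type' or k='note' wherever it occurs, this sketch only touches tags that are direct children of a <relation>.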
Alternatively, Excel can run the XSLT, with R replicating the process.
Excel macro (using the MSXML object, late-bound here)
Public Sub RunXSLT()
Dim xmlDoc As Object, xslDoc As Object, newDoc As Object
Set xmlDoc = CreateObject("MSXML2.DOMDocument")
Set xslDoc = CreateObject("MSXML2.DOMDocument")
Set newDoc = CreateObject("MSXML2.DOMDocument")
' Set async before loading so each document is fully parsed before the transform
xmlDoc.async = False
xmlDoc.Load "C:\Path\To\Input.xml"
xslDoc.async = False
xslDoc.Load "C:\Path\To\XSLTScript.xsl"
xmlDoc.transformNodeToObject xslDoc, newDoc
newDoc.Save "C:\Path\To\Output.xml"
Set newDoc = Nothing
Set xslDoc = Nothing
Set xmlDoc = Nothing
End Sub
R script (replicates the above, using RDCOMClient)
library(RDCOMClient)
xmlfile = COMCreate("MSXML2.DOMDocument")
xslfile = COMCreate("MSXML2.DOMDocument")
newxmlfile = COMCreate("MSXML2.DOMDocument")
# Backslashes must be doubled inside R string literals
xmlstr = 'C:\\Path\\To\\Input.xml'
xslstr = 'C:\\Path\\To\\XSLTScript.xsl'
newxmlstr = 'C:\\Path\\To\\Output.xml'
# LOADING XML & XSLT FILES (async is a COM property, so it is set with [[ ]])
xmlfile[["async"]] = FALSE
xmlfile$Load(xmlstr)
xslfile[["async"]] = FALSE
xslfile$Load(xslstr)
# TRANSFORMING XML FILE USING XSLT INTO NEW FILE
xmlfile$transformNodeToObject(xslfile, newxmlfile)
newxmlfile$Save(newxmlstr)
Final XML output
<?xml version='1.0' encoding='UTF-8'?>
<osm version="0.6" generator="CGImap 0.0.2">
<node id="298884272" lat="54.0901447" lon="12.2516513" user="SvenHRO"
uid="46882" visible="true" version="1" changeset="676636"
timestamp="2008-09-21T21:37:45Z"/>
<way id="86015" version="1" timestamp="2016-02-26T15:01:32Z">
<nd ref="85642"/>
<nd ref="85641"/>
<nd ref="86016"/>
<nd ref="85642"/>
</way>
<relation id="1" version="1" timestamp="2016-02-26T15:01:32Z">
<member type="way" ref="2" role="outer"/>
<member type="way" ref="12" role="outer"/>
<member type="way" ref="17" role="outer"/>
<member type="way" ref="22" role="outer"/>
<member type="way" ref="27" role="outer"/>
<member type="way" ref="60" role="outer"/>
<member type="way" ref="65" role="outer"/>
<member type="way" ref="71" role="outer"/>
<member type="way" ref="75" role="outer"/>
<member type="way" ref="79" role="outer"/>
<member type="way" ref="84" role="outer"/>
<member type="way" ref="92" role="outer"/>
<member type="way" ref="108" role="outer"/>
<member type="way" ref="112" role="outer"/>
<member type="way" ref="132" role="outer"/>
<member type="way" ref="150" role="outer"/>
<member type="way" ref="166" role="outer"/>
<member type="way" ref="173" role="outer"/>
<member type="way" ref="178" role="outer"/>
<tag k="type" v="boundary"/>
<tag k="city" v="00000 ExampleCity"/>
<tag k="plz" v="00000"/>
<tag k="boundary" v="postal_code"/>
</relation>
</osm>
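To confirm that a transformed file actually carries the requested changes, a small check along these lines may help. It is a hedged sketch using Python's standard-library xml.etree.ElementTree; the OUTPUT string is an abbreviated copy of the result above (members trimmed), and relation_problems is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the transformed <relation> shown above (members trimmed).
OUTPUT = """<osm version="0.6" generator="CGImap 0.0.2">
  <relation id="1" version="1" timestamp="2016-02-26T15:01:32Z">
    <member type="way" ref="2" role="outer"/>
    <tag k="type" v="boundary"/>
    <tag k="city" v="00000 ExampleCity"/>
    <tag k="plz" v="00000"/>
    <tag k="boundary" v="postal_code"/>
  </relation>
</osm>"""


def relation_problems(root):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for relation in root.iter("relation"):
        # Map each tag's k attribute to its v attribute for easy lookup.
        tags = {t.get("k"): t.get("v") for t in relation.findall("tag")}
        rid = relation.get("id")
        if tags.get("type") != "boundary":
            problems.append(f"relation {rid}: type is {tags.get('type')!r}")
        if tags.get("boundary") != "postal_code":
            problems.append(f"relation {rid}: boundary tag missing")
        if "note" in tags:
            problems.append(f"relation {rid}: note tag was not renamed")
    return problems


problems = relation_problems(ET.fromstring(OUTPUT))
print(problems or "all relations look as expected")
```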