Round trip XML through Unmarshal and MarshalIndent

I would like to quickly create a utility that formats any XML data using Go's xml.MarshalIndent().

However, this code

package main

import (
    "encoding/xml"
    "fmt"
)

func main() {

    type node struct {
        XMLName  xml.Name
        Attrs    []xml.Attr `xml:",attr"`
        Text     string     `xml:",chardata"`
        Children []node     `xml:",any"`
    }

    x := node{}
    _ = xml.Unmarshal([]byte(doc), &x)
    buf, _ := xml.MarshalIndent(x, "", "  ") // prefix, indent

    fmt.Println(string(buf))
}

const doc string = `<book lang="en">
     <title>The old man and the sea</title>
       <author>Hemingway</author>
</book>`

produces

<book>&#xA;     &#xA;       &#xA;
  <title>The old man and the sea</title>
  <author>Hemingway</author>
</book>

Note the extraneous content after the opening <book> element, and that the lang attribute is missing.

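First, you are not using the attribute struct tag correctly, so that part is a simple fix: tag the Attrs field with xml:",any,attr" instead of xml:",attr".

From https://godoc.org/encoding/xml#Unmarshal: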

  • If the XML element has an attribute not handled by the previous rule and the struct has a field with an associated tag containing ",any,attr", Unmarshal records the attribute value in the first such field.
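As a minimal sketch of that rule, the snippet below reuses the struct from the question with only the attribute tag changed. The helper name attrsOf and the id attribute in the sample input are made up for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// node mirrors the struct from the question, with the attribute tag
// changed from ",attr" to ",any,attr" so unmatched attributes are kept.
type node struct {
	XMLName  xml.Name
	Attrs    []xml.Attr `xml:",any,attr"`
	Text     string     `xml:",chardata"`
	Children []node     `xml:",any"`
}

// attrsOf is a hypothetical helper: it decodes src and returns the
// attributes recorded on the root element.
func attrsOf(src string) ([]xml.Attr, error) {
	var n node
	if err := xml.Unmarshal([]byte(src), &n); err != nil {
		return nil, err
	}
	return n.Attrs, nil
}

func main() {
	// The id attribute is illustrative only; it is not in the original doc.
	attrs, err := attrsOf(`<book lang="en" id="b1"/>`)
	if err != nil {
		fmt.Println("unmarshal failed:", err)
		return
	}
	for _, a := range attrs {
		fmt.Printf("%s=%q\n", a.Name.Local, a.Value) // lang="en", then id="b1"
	}
}
```

Because the field is a slice of xml.Attr, each otherwise unhandled attribute is appended to it, and MarshalIndent writes them back out on the element.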

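Second, because a field tagged xml:",chardata" is never passed through the UnmarshalXML method of the xml.Unmarshaler interface, you cannot simply create a new type for Text and implement that interface for it, as noted in the same docs. (Note that any type other than []byte or string will cause an error.)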

  • If the XML element contains character data, that data is accumulated in the first struct field that has tag ",chardata". The struct field may have type []byte or string. If there is no such field, the character data is discarded.
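To make the source of the stray &#xA; sequences concrete, here is a small sketch that prints what ends up in the ",chardata" field of the root element. The helper rootText and the shortened sample document are assumptions for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// node is the struct from the question with the attribute tag corrected.
type node struct {
	XMLName  xml.Name
	Attrs    []xml.Attr `xml:",any,attr"`
	Text     string     `xml:",chardata"`
	Children []node     `xml:",any"`
}

// rootText is a hypothetical helper: it decodes src and returns the
// character data accumulated directly under the root element.
func rootText(src string) (string, error) {
	var n node
	if err := xml.Unmarshal([]byte(src), &n); err != nil {
		return "", err
	}
	return n.Text, nil
}

func main() {
	text, err := rootText("<book>\n  <title>T</title>\n</book>")
	if err != nil {
		fmt.Println("unmarshal failed:", err)
		return
	}
	// The whitespace between child elements is collected into Text.
	fmt.Printf("%q\n", text) // "\n  \n"
}
```

When that field is marshaled again, the newlines are escaped as &#xA;, which is the extraneous content seen after <book>.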

So the simplest way to deal with the unwanted characters is to replace them after the fact.

A complete code example is here: https://play.golang.org/p/VSDskgfcLng

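// Replacer strips newline and tab characters, in both escaped and raw form.
var Replacer = strings.NewReplacer("&#xA;", "", "&#x9;", "", "\n", "", "\t", "")

// recursiveReplace cleans the text of n and of every node below it.
func recursiveReplace(n *Node) {
    n.Text = Replacer.Replace(n.Text)
    for i := range n.Children {
        recursiveReplace(&n.Children[i])
    }
}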
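For a self-contained version that can be run outside the playground, the sketch below puts the pieces together. It is a variation rather than the exact code from the link: it swaps the Replacer for strings.TrimSpace, on the assumption that only leading and trailing whitespace should be dropped (this also removes the indentation spaces, which the Replacer leaves in place). The names trimText and reformat are hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Node is a generic XML element: name, attributes, text, and children.
type Node struct {
	XMLName  xml.Name
	Attrs    []xml.Attr `xml:",any,attr"`
	Text     string     `xml:",chardata"`
	Children []Node     `xml:",any"`
}

// trimText strips leading and trailing whitespace from the text of n
// and of every node below it.
func trimText(n *Node) {
	n.Text = strings.TrimSpace(n.Text)
	for i := range n.Children {
		trimText(&n.Children[i])
	}
}

// reformat is a hypothetical helper: unmarshal, clean up, re-indent.
func reformat(src string) (string, error) {
	var root Node
	if err := xml.Unmarshal([]byte(src), &root); err != nil {
		return "", err
	}
	trimText(&root)
	buf, err := xml.MarshalIndent(root, "", "  ") // prefix, indent
	if err != nil {
		return "", err
	}
	return string(buf), nil
}

const doc = `<book lang="en">
     <title>The old man and the sea</title>
       <author>Hemingway</author>
</book>`

func main() {
	out, err := reformat(doc)
	if err != nil {
		fmt.Println("reformat failed:", err)
		return
	}
	fmt.Println(out)
}
```

Keep in mind that this struct only records elements, attributes, and text. Comments and processing instructions are dropped, and in mixed content the position of text relative to child elements is lost.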

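In theory you could implement the xml.Unmarshaler interface for Node, but then you would have to handle not only manual XML parsing but also the fact that it is a recursive structure. Removing the unwanted characters after the fact is the simplest approach.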
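For comparison, a rough sketch of that more involved route might look like the following. It is only one possible way to write UnmarshalXML for Node, assuming that trimming surrounding whitespace is the desired behavior; the reformat helper is hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Node is a generic XML element. The struct tags are still used by
// MarshalIndent; decoding goes through UnmarshalXML below instead.
type Node struct {
	XMLName  xml.Name
	Attrs    []xml.Attr `xml:",any,attr"`
	Text     string     `xml:",chardata"`
	Children []Node     `xml:",any"`
}

// UnmarshalXML implements xml.Unmarshaler by walking tokens by hand.
// It recurses for child elements and trims the collected text.
func (n *Node) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	n.XMLName = start.Name
	n.Attrs = start.Copy().Attr
	var text strings.Builder
	for {
		tok, err := d.Token()
		if err != nil {
			return err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			var child Node
			if err := child.UnmarshalXML(d, t); err != nil {
				return err
			}
			n.Children = append(n.Children, child)
		case xml.CharData:
			text.Write(t) // copies the bytes before the next Token call
		case xml.EndElement:
			// The decoder guarantees nesting, so this closes start.
			n.Text = strings.TrimSpace(text.String())
			return nil
		}
	}
}

// reformat is a hypothetical helper: decode into Node, then re-indent.
func reformat(src string) (string, error) {
	var root Node
	if err := xml.Unmarshal([]byte(src), &root); err != nil {
		return "", err
	}
	buf, err := xml.MarshalIndent(root, "", "  ")
	if err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	out, err := reformat("<book lang=\"en\">\n   <title>The old man and the sea</title>\n</book>")
	if err != nil {
		fmt.Println("reformat failed:", err)
		return
	}
	fmt.Println(out)
}
```

This avoids the separate cleanup pass, at the cost of owning the token loop, which is the trade-off described above.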