Unmarshal flat XML to Go data structure

Question

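I have a flat XML structure that I am trying to unmarshal into a Go data structure. I am trying to find a way to get the list of items (item names) in each bucket from the XML below, i.e. bucket1 = [apple, orange, grapes], bucket2 = [apple, mangoes].

When I unmarshal the XML into the Go data structure below, I get the list of bucket names and the list of items, but I cannot map the items to their respective buckets, since each bucket can hold many items. Is there a way to achieve this by changing the Go data structure? I have no control over the structure of the XML, so I cannot change it to suit my needs. I am new to Go and would appreciate any input.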

type buckets struct {
    XMLName    xml.Name `xml:"buckets"`
    BucketName []string `xml:"bucket-name"`
    ItemName   []string `xml:"item-name"`
    Weight     []string `xml:"weight"`
    Quantity   []string `xml:"quantity"`
}
        
    
    <?xml version="1.0" encoding="UTF-8"?>
    <buckets>
       <bucket-name>bucket1</bucket-name>
       <item-name>apple</item-name>
       <weight>500</weight>
       <quantity>3</quantity>
       <item-name>orange</item-name>
       <weight>500</weight>
       <quantity>2</quantity>
       <item-name>grapes</item-name>
       <weight>800</weight>
       <quantity>1</quantity>
       <bucket-name>bucket2</bucket-name>
       <item-name>apple</item-name>
       <weight>500</weight>
       <quantity>3</quantity>
       <item-name>mangoes</item-name>
       <weight>400</weight>
       <quantity>2</quantity>
    </buckets>

Answer 1

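I agree with mkopriva. Go's XML struct tags are optimized for identically-structured data records. Using them for mixed content is like putting a saddle on a cow. Plug: I have written code for handling mixed content, available on GitHub, and feedback is welcome.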

Answer 2

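What you are trying to do can be achieved by implementing a custom UnmarshalXML method (the xml.Unmarshaler interface) and mapping the buckets to Go structs manually.

The code below assumes that the XML elements appear in the same order as in the example provided.

First, we have the structs that model the data described in the question: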

type Buckets struct {
    XMLName xml.Name `xml:"buckets"`
    Buckets []*Bucket
}

type Bucket struct {
    BucketName string `xml:"bucket-name"`
    Items      []*Item
}

type Item struct {
    Name     string `xml:"item-name"`
    Weight   int    `xml:"weight"`
    Quantity int    `xml:"quantity"`
}

Next, we implement the xml.Unmarshaler interface by defining an UnmarshalXML method on the Buckets struct. This method is called when we call xml.Unmarshal and pass a Buckets struct as the target.

func (b *Buckets) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
    b.XMLName = start.Name

    var currentBucket *Bucket
    var currentItem *Item
    for {
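        t, err := d.Token()
        if err == io.EOF { // requires the "io" import
            // end of input: append the last bucket, if any, before exiting
            if currentBucket != nil {
                b.Buckets = append(b.Buckets, currentBucket)
            }
            break
        }
        if err != nil {
            // any other error (e.g. malformed XML) is reported to the caller
            return err
        }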
        switch se := t.(type) {
        case xml.StartElement:
            switch se.Name.Local {
            case "bucket-name":
                // On the first bucket currentBucket is still nil. Otherwise, append
                // the bucket processed so far to the slice before starting a new one.
                if currentBucket != nil {
                    b.Buckets = append(b.Buckets, currentBucket)
                }
                currentBucket = &Bucket{}

                if err := d.DecodeElement(&currentBucket.BucketName, &se); err != nil {
                    return err
                }
            case "item-name":
                currentItem = &Item{}
                if err := d.DecodeElement(&currentItem.Name, &se); err != nil {
                    return err
                }
            case "weight":
                if err := d.DecodeElement(&currentItem.Weight, &se); err != nil {
                    return err
                }
            case "quantity":
                if err := d.DecodeElement(&currentItem.Quantity, &se); err != nil {
                    return err
                }

                // since quantity comes last append the item to the bucket,  and reset it
                currentBucket.Items = append(currentBucket.Items, currentItem)
                currentItem = &Item{}
            }
        }
    }

    return nil
}

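What we are essentially doing is looping over the XML elements and mapping them to our structs with custom logic. I won't go into detail about d.Token() and xml.StartElement; you can always read the docs for more.

Let's break down the method above:

When we encounter an element named bucket-name, we know a new bucket follows, so we append the bucket already processed to the slice (we must check for nil, because the first time there is no processed bucket yet) and set currentBucket to a new Bucket (the one we are about to process).
When we encounter an element named item-name, we know a new item follows, so we set currentItem to a new Item.
When we encounter an element named quantity, we know it is the last element belonging to currentItem, so we append currentItem to currentBucket.Items.
When the input stream ends (Token returns a nil token together with io.EOF), the last bucket has not been appended yet, because a bucket is only appended when the next one is encountered. This also applies when there is only one bucket. So, before we break, we need to append the last processed bucket.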
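
To make the walkthrough concrete, here is a minimal, self-contained sketch of the same idea together with a call to xml.Unmarshal. It is not the exact code from above: it appends each bucket as soon as it is created (so no final flush is needed), the field tags are dropped because the custom method never reads them, and sampleXML and itemNames are hypothetical names introduced only for this illustration. It still assumes that quantity is the last element of every item.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
)

type Item struct {
	Name     string
	Weight   int
	Quantity int
}

type Bucket struct {
	BucketName string
	Items      []*Item
}

type Buckets struct {
	Buckets []*Bucket
}

// UnmarshalXML groups the flat children of <buckets> into buckets and items.
// Assumption: <quantity> is the last element of every item.
func (b *Buckets) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	var bucket *Bucket
	item := &Item{}
	for {
		t, err := d.Token()
		if err == io.EOF {
			return nil // reported once </buckets> has been consumed
		}
		if err != nil {
			return err
		}
		se, ok := t.(xml.StartElement)
		if !ok {
			continue // whitespace, end tags, comments
		}
		switch se.Name.Local {
		case "bucket-name":
			// Appended right away, so nothing is left to flush at the end.
			bucket = &Bucket{}
			b.Buckets = append(b.Buckets, bucket)
			err = d.DecodeElement(&bucket.BucketName, &se)
		case "item-name":
			err = d.DecodeElement(&item.Name, &se)
		case "weight":
			err = d.DecodeElement(&item.Weight, &se)
		case "quantity":
			if bucket == nil {
				return fmt.Errorf("item %q appears before any bucket-name", item.Name)
			}
			if err = d.DecodeElement(&item.Quantity, &se); err == nil {
				bucket.Items = append(bucket.Items, item)
				item = &Item{}
			}
		}
		if err != nil {
			return err
		}
	}
}

// sampleXML is the XML from the question, condensed onto fewer lines.
const sampleXML = `<?xml version="1.0" encoding="UTF-8"?>
<buckets>
  <bucket-name>bucket1</bucket-name>
  <item-name>apple</item-name><weight>500</weight><quantity>3</quantity>
  <item-name>orange</item-name><weight>500</weight><quantity>2</quantity>
  <item-name>grapes</item-name><weight>800</weight><quantity>1</quantity>
  <bucket-name>bucket2</bucket-name>
  <item-name>apple</item-name><weight>500</weight><quantity>3</quantity>
  <item-name>mangoes</item-name><weight>400</weight><quantity>2</quantity>
</buckets>`

// itemNames (hypothetical helper) maps each bucket name to its item names.
func itemNames(data string) (map[string][]string, error) {
	var b Buckets
	if err := xml.Unmarshal([]byte(data), &b); err != nil {
		return nil, err
	}
	out := make(map[string][]string)
	for _, bk := range b.Buckets {
		for _, it := range bk.Items {
			out[bk.BucketName] = append(out[bk.BucketName], it.Name)
		}
	}
	return out, nil
}

func main() {
	names, err := itemNames(sampleXML)
	if err != nil {
		fmt.Println("unmarshal failed:", err)
		return
	}
	fmt.Println("bucket1 =", names["bucket1"]) // bucket1 = [apple orange grapes]
	fmt.Println("bucket2 =", names["bucket2"]) // bucket2 = [apple mangoes]
}
```

Appending the bucket pointer immediately is only a design choice for this sketch; it removes the "append the last bucket" step at the cost of exposing a bucket before its items are complete.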

Notes:

You can avoid the Buckets struct entirely and handle the unmarshalling in a function that uses an xml.Decoder:

func UnmarshalBuckets(rawXML []byte) []*Bucket {
    // or any io.Reader that points to the xml data
    d := xml.NewDecoder(bytes.NewReader(rawXML))
    ...
}
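
For illustration only, one possible shape of such a function is sketched below. The name decodeBuckets, the extra error return value, and the sample constant are assumptions of this sketch rather than part of the snippet above, and it keeps the assumption that quantity is the last element of every item.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
)

type Item struct {
	Name     string
	Weight   int
	Quantity int
}

type Bucket struct {
	BucketName string
	Items      []*Item
}

// decodeBuckets is a hypothetical stand-alone variant that reads tokens
// directly from an xml.Decoder instead of implementing xml.Unmarshaler.
func decodeBuckets(rawXML []byte) ([]*Bucket, error) {
	d := xml.NewDecoder(bytes.NewReader(rawXML))
	var buckets []*Bucket
	item := &Item{}
	for {
		t, err := d.Token()
		if err == io.EOF {
			return buckets, nil // clean end of input
		}
		if err != nil {
			return nil, err // malformed or truncated XML
		}
		se, ok := t.(xml.StartElement)
		if !ok {
			continue
		}
		// The root <buckets> start tag matches no case and is simply passed over.
		switch se.Name.Local {
		case "bucket-name":
			bk := &Bucket{}
			buckets = append(buckets, bk)
			err = d.DecodeElement(&bk.BucketName, &se)
		case "item-name":
			err = d.DecodeElement(&item.Name, &se)
		case "weight":
			err = d.DecodeElement(&item.Weight, &se)
		case "quantity":
			if len(buckets) == 0 {
				return nil, fmt.Errorf("item %q has no bucket", item.Name)
			}
			if err = d.DecodeElement(&item.Quantity, &se); err == nil {
				last := buckets[len(buckets)-1]
				last.Items = append(last.Items, item)
				item = &Item{}
			}
		}
		if err != nil {
			return nil, err
		}
	}
}

// sample is a shortened, hypothetical input in the same flat layout.
const sample = `<buckets>
  <bucket-name>bucket1</bucket-name>
  <item-name>apple</item-name><weight>500</weight><quantity>3</quantity>
  <bucket-name>bucket2</bucket-name>
  <item-name>mangoes</item-name><weight>400</weight><quantity>2</quantity>
</buckets>`

func main() {
	buckets, err := decodeBuckets([]byte(sample))
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, bk := range buckets {
		for _, it := range bk.Items {
			fmt.Println(bk.BucketName, it.Name, it.Weight, it.Quantity)
		}
	}
}
```

Returning an error alongside the slice lets the caller distinguish an empty document from a malformed one, which the single-value signature cannot do.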

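Disclaimer:

I know the code above feels a bit sketchy, and I am sure you can improve on it. Feel free to play with it and implement the custom logic in a more readable way.
There are probably edge cases that I did not cover or that do not appear in the example provided. You should analyze your XML and try to cover them, if possible.
As mentioned earlier, the code depends heavily on the order of the XML elements.

A working example is available on the Go Playground.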
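
If the order dependence is a concern, one way to loosen it is to close an item when the next item-name or bucket-name starts (or when the input ends), instead of when quantity is seen. The sketch below is an assumption-laden variant, not the code from this answer: parseBuckets is a hypothetical helper, value slices replace the pointer slices, and anything other than the four known elements is treated as an error. It still requires item-name to open each item and bucket-name to open each bucket.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
)

type Item struct {
	Name     string
	Weight   int
	Quantity int
}

type Bucket struct {
	BucketName string
	Items      []Item
}

type Buckets struct {
	Buckets []Bucket
}

// UnmarshalXML closes the open item when the next <item-name> or
// <bucket-name> starts, or when the input ends, so <weight> and <quantity>
// may appear in either order or be missing.
func (b *Buckets) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	var item *Item // item being filled; nil when none is open
	flush := func() {
		if item != nil {
			// item is only opened once a bucket exists, so this index is safe.
			last := &b.Buckets[len(b.Buckets)-1]
			last.Items = append(last.Items, *item)
			item = nil
		}
	}
	for {
		t, err := d.Token()
		if err == io.EOF {
			flush()
			return nil
		}
		if err != nil {
			return err
		}
		se, ok := t.(xml.StartElement)
		if !ok {
			continue
		}
		switch name := se.Name.Local; {
		case name == "bucket-name":
			flush()
			b.Buckets = append(b.Buckets, Bucket{})
			err = d.DecodeElement(&b.Buckets[len(b.Buckets)-1].BucketName, &se)
		case name == "item-name" && len(b.Buckets) > 0:
			flush()
			item = &Item{}
			err = d.DecodeElement(&item.Name, &se)
		case name == "weight" && item != nil:
			err = d.DecodeElement(&item.Weight, &se)
		case name == "quantity" && item != nil:
			err = d.DecodeElement(&item.Quantity, &se)
		default:
			// Unknown element, or a known one with no bucket/item open.
			err = fmt.Errorf("unexpected element <%s>", name)
		}
		if err != nil {
			return err
		}
	}
}

// parseBuckets (hypothetical helper) wraps xml.Unmarshal for convenience.
func parseBuckets(data string) (Buckets, error) {
	var b Buckets
	err := xml.Unmarshal([]byte(data), &b)
	return b, err
}

func main() {
	// quantity precedes weight for apple, and pear has neither.
	b, err := parseBuckets(`<buckets><bucket-name>b1</bucket-name>` +
		`<item-name>apple</item-name><quantity>3</quantity><weight>500</weight>` +
		`<item-name>pear</item-name></buckets>`)
	if err != nil {
		fmt.Println("unmarshal failed:", err)
		return
	}
	fmt.Printf("%+v\n", b.Buckets)
}
```

Whether a misplaced element should be an error or silently skipped depends on how much you trust the producer of the XML; rejecting it here is just the stricter of the two options.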
