Difficulties deserializing XML using Rust and Serde where document has optional subelements

I'm new to Rust and still getting the hang of it. It's pretty cool, but I'm clearly missing something in an exercise I set for myself. For reference, I'm using rustc 1.39.0.

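I wanted to write a simple program that reads the XML produced by MSBuild's code analysis, which is fairly simple XML. I think the problem is that one element (PATH) is usually empty but can sometimes contain child elements. The bigger problem is that I'm not fluent in Rust (and I don't usually work with XML), so I'm not sure how to set up the structs that deserialization needs. I'm using Serde and quick_xml. When I declared PATH as a String and used XML with no SFA elements under PATH, my test succeeded. But as soon as I figured out how that tag is supposed to be used and updated my structs accordingly, I keep getting this error: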

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom("missing field `FILEPATH`")', src\libcore\result.rs:1165:5

...even though all of the defects in the test XML file have SFA elements under PATH.

The XML files I'm working with all look like this:

<?xml version="1.0" encoding="utf-8"?>
<DEFECTS>
  <DEFECT>
    <SFA>
      <FILEPATH>c:\projects\source\repos\defecttest\defecttest</FILEPATH>
      <FILENAME>source.cpp</FILENAME>
      <LINE>8</LINE>
      <COLUMN>5</COLUMN>
    </SFA>
    <DEFECTCODE>26496</DEFECTCODE>
    <DESCRIPTION>The variable 'y' is assigned only once, mark it as const (con.4).</DESCRIPTION>
    <FUNCTION>main</FUNCTION>
    <DECORATED>main</DECORATED>
    <FUNCLINE>6</FUNCLINE>
    <PATH></PATH>
  </DEFECT>
  <DEFECT>
    <SFA>
      <FILEPATH>c:\projects\source\repos\defecttest\defecttest</FILEPATH>
      <FILENAME>source.cpp</FILENAME>
      <LINE>9</LINE>
      <COLUMN>5</COLUMN>
    </SFA>
    <DEFECTCODE>26496</DEFECTCODE>
    <DESCRIPTION>The variable 'z' is assigned only once, mark it as const (con.4).</DESCRIPTION>
    <FUNCTION>main</FUNCTION>
    <DECORATED>main</DECORATED>
    <FUNCLINE>6</FUNCLINE>
    <PATH></PATH>
  </DEFECT>
</DEFECTS>

In many cases PATH is empty, but in some cases it contains its own SFA element:

  <DEFECT>
    <SFA>
      <FILEPATH>c:\projects\source\repos\defecttest\defecttest</FILEPATH>
      <FILENAME>source.cpp</FILENAME>
      <LINE>9</LINE>
      <COLUMN>5</COLUMN>
    </SFA>
    <DEFECTCODE>26496</DEFECTCODE>
    <DESCRIPTION>The variable 'z' is assigned only once, mark it as const (con.4).</DESCRIPTION>
    <FUNCTION>main</FUNCTION>
    <DECORATED>main</DECORATED>
    <FUNCLINE>6</FUNCLINE>
    <PATH>
      <SFA>
        <FILEPATH>c:\projects\source\repos\defecttest\defecttest</FILEPATH>
        <FILENAME>source.cpp</FILENAME>
        <LINE>12</LINE>
        <COLUMN>3</COLUMN>
      </SFA>
    </PATH>
  </DEFECT>

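Before I realized this, every field in the DEFECT struct was a String. That works fine as long as none of the defects in the XML file have child elements under PATH. When I changed PATH from String to SFA (declared as Vec<SFA> in the code below), I get the missing-field error shown above. Here is the code I'm testing with: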

main.rs

extern crate quick_xml;
extern crate serde;

use std::default::Default;
use std::env;
use std::vec::Vec;

use quick_xml::de::from_str;
use serde::{Serialize, Deserialize};

/*
 * Structs for the defect XML
 */

#[derive(Serialize, Deserialize, Debug)]
#[allow(non_snake_case)]
pub struct DEFECTS {
    #[serde(rename = "DEFECT", default)]
    pub defects: Vec<DEFECT>,
}

#[derive(Default, Serialize, Deserialize, Debug)]
#[allow(non_snake_case)]
pub struct DEFECT {
    #[serde(default)]
    pub SFA: SFA,
    pub DEFECTCODE: String,
    pub DESCRIPTION: String,
    pub FUNCTION: String,
    pub DECORATED: String,
    pub FUNCLINE: String,
    #[serde(default)]
    pub PATH: Vec<SFA>,
}

#[derive(Default, Serialize, Deserialize, Debug)]
#[allow(non_snake_case)]
pub struct SFA {
    pub FILEPATH: String,
    pub FILENAME: String,
    pub LINE: String,
    pub COLUMN: String,
}

/*
 * Main app code
 */

fn main() {
    // Expect the path to the XML file to be passed as the first and only argument
    let args: Vec<String> = env::args().collect();
    if args.len() != 2 {
        panic!("Invalid argument count. Specify a single file to process.");
    }

    let processing_file = &args[1];
    println!("Will attempt to process file: '{}'", &processing_file);

    // Try to load the contents of the file
    let file_content : String = match std::fs::read_to_string(&processing_file) {
        Ok(file_content) => file_content,
        Err(e) => {
            panic!("Failed to read file: '{}' -- {}", &processing_file, e);
        }
    };

    // Now, try to deserialize the XML we have in file_content
    let defect_list : DEFECTS = from_str(&file_content).unwrap();

    // Assuming the unwrap above didn't blow up, we should get a count here
    println!("Retrieved {} defects from file '{}'", defect_list.defects.len(), &processing_file);
}

Cargo.toml

[package]
name = "rust_xml_test"
version = "0.1.0"
authors = ["fny82"]
edition = "2018"

[dependencies]
quick-xml = { version = "0.17", features = [ "serialize" ] }
serde = { version = "1.0", features = [ "derive" ] }

Sample output

C:\Development\RustXmlTest>cargo run -- "c:\development\rustxmltest\test3.xml"
   Compiling rust_xml_test v0.1.0 (C:\Development\RustXmlTest)
    Finished dev [unoptimized + debuginfo] target(s) in 1.56s
     Running `target\debug\rust_xml_test.exe c:\development\rustxmltest\test3.xml`
Will attempt to process file: 'c:\development\rustxmltest\test3.xml'
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom("missing field `FILEPATH`")', src\libcore\result.rs:1165:5
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace.
error: process didn't exit successfully: `target\debug\rust_xml_test.exe c:\development\rustxmltest\test3.xml` (exit code: 101)
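The error appears to be consistent with how the DEFECTS struct is already set up: there, `#[serde(rename = "DEFECT")] defects: Vec<DEFECT>` is filled from repeated <DEFECT> elements, each deserialized as one DEFECT. If the same rule applies to `PATH: Vec<SFA>` (an assumption about quick_xml's mapping, not something stated above), each <PATH> element is deserialized as one SFA, so the deserializer looks for <FILEPATH> directly inside <PATH> and reports it missing. The std-only sketch below uses hypothetical helpers (`sfa_fields`, `expected_by_vec_of_sfa`, `expected_by_path_struct`; they are not quick_xml or Serde APIs) that print the nesting each layout implies:

```rust
// Hypothetical, std-only illustration: builds the element nesting that each
// struct layout implies. This is NOT quick_xml or Serde code.

// The four leaf elements of an SFA, each on its own line, indented by `indent`.
fn sfa_fields(indent: &str) -> String {
    ["FILEPATH", "FILENAME", "LINE", "COLUMN"]
        .iter()
        .map(|tag| format!("{}<{}>...</{}>\n", indent, tag, tag))
        .collect()
}

// `PATH: Vec<SFA>`: each <PATH> element is itself one SFA, so the
// SFA fields have to sit directly inside <PATH>.
fn expected_by_vec_of_sfa() -> String {
    format!("<PATH>\n{}</PATH>\n", sfa_fields("  "))
}

// `PATH: PATH` with `struct PATH { SFA: Option<SFA> }`: <PATH> may hold
// one <SFA> child, and the fields live inside that child.
fn expected_by_path_struct() -> String {
    format!("<PATH>\n  <SFA>\n{}  </SFA>\n</PATH>\n", sfa_fields("    "))
}

fn main() {
    println!("Vec<SFA> expects:\n{}", expected_by_vec_of_sfa());
    println!("PATH struct expects:\n{}", expected_by_path_struct());
}
```

The sample files have the second shape, which would explain why FILEPATH is reported missing even when every PATH contains an SFA.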

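I'm sure I'm doing something silly, and part of the problem is probably that I've bitten off more than I can chew given my current understanding of Rust. Can anyone help me figure out what I'm missing and what I'm doing wrong?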

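Somewhat related: I've since learned that I can use the rename attribute to make my structs follow Rust's naming conventions, but I don't want to start messing with that until the underlying functionality works.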

---- EDIT ----

For reference, here is the working code as corrected by @edwardw:

extern crate quick_xml;
extern crate serde;

use std::default::Default;
use std::env;
use std::vec::Vec;

use quick_xml::de::from_str;
use serde::{Serialize, Deserialize};

/*
 * Structs for the defect XML
 */

#[derive(Serialize, Deserialize, Debug)]
#[allow(non_snake_case)]
pub struct DEFECTS {
    #[serde(rename = "DEFECT", default)]
    pub defects: Vec<DEFECT>,
}

#[derive(Default, Serialize, Deserialize, Debug)]
#[allow(non_snake_case)]
pub struct DEFECT {
    #[serde(default)]
    pub SFA: SFA,
    pub DEFECTCODE: String,
    pub DESCRIPTION: String,
    pub FUNCTION: String,
    pub DECORATED: String,
    pub FUNCLINE: String,
    pub PATH: PATH,
}

#[derive(Default, Serialize, Deserialize, Debug)]
#[allow(non_snake_case)]
pub struct SFA {
    pub FILEPATH: String,
    pub FILENAME: String,
    pub LINE: String,
    pub COLUMN: String,
}

#[derive(Default, Serialize, Deserialize, Debug)]
#[allow(non_snake_case)]
pub struct PATH {
    pub SFA: Option<SFA>,
}

/*
 * Main app code
 */

fn main() {
    // Expect the path to the XML file to be passed as the first and only argument
    let args: Vec<String> = env::args().collect();
    if args.len() != 2 {
        panic!("Invalid argument count. Specify a single file to process.");
    }

    let processing_file = &args[1];
    println!("Will attempt to process file: '{}'", &processing_file);

    // Try to load the contents of the file
    let file_content : String = match std::fs::read_to_string(&processing_file) {
        Ok(file_content) => file_content,
        Err(e) => {
            panic!("Failed to read file: '{}' -- {}", &processing_file, e);
        }
    };

    // Now, try to deserialize the XML we have in file_content
    let defect_list : DEFECTS = from_str(&file_content).unwrap();

    // Assuming the unwrap above didn't blow up, we should get a count here
    println!("Retrieved {} defects from file '{}'", defect_list.defects.len(), &processing_file);
}

Sample run:

C:\Development\RustXmlTest>cargo run -- "c:\development\rustxmltest\test1.xml"
   Compiling rust_xml_test v0.1.0 (C:\Development\RustXmlTest)
    Finished dev [unoptimized + debuginfo] target(s) in 1.66s
     Running `target\debug\rust_xml_test.exe c:\development\rustxmltest\test1.xml`
Will attempt to process file: 'c:\development\rustxmltest\test1.xml'
Retrieved 2 defects from file 'c:\development\rustxmltest\test1.xml'

where test1.xml contains:

<?xml version="1.0" encoding="utf-8"?>
<DEFECTS>
  <DEFECT>
    <SFA>
      <FILEPATH>c:\projects\source\repos\defecttest\defecttest</FILEPATH>
      <FILENAME>source.cpp</FILENAME>
      <LINE>8</LINE>
      <COLUMN>5</COLUMN>
    </SFA>
    <DEFECTCODE>26496</DEFECTCODE>
    <DESCRIPTION>The variable 'y' is assigned only once, mark it as const (con.4).</DESCRIPTION>
    <FUNCTION>main</FUNCTION>
    <DECORATED>main</DECORATED>
    <FUNCLINE>6</FUNCLINE>
    <PATH></PATH>
  </DEFECT>
  <DEFECT>
    <SFA>
      <FILEPATH>c:\projects\source\repos\defecttest\defecttest</FILEPATH>
      <FILENAME>source.cpp</FILENAME>
      <LINE>9</LINE>
      <COLUMN>5</COLUMN>
    </SFA>
    <DEFECTCODE>26496</DEFECTCODE>
    <DESCRIPTION>The variable 'z' is assigned only once, mark it as const (con.4).</DESCRIPTION>
    <FUNCTION>main</FUNCTION>
    <DECORATED>main</DECORATED>
    <FUNCLINE>6</FUNCLINE>
    <PATH>
      <SFA>
        <FILEPATH>c:\projects\source\repos\defecttest\defecttest</FILEPATH>
        <FILENAME>source.cpp</FILENAME>
        <LINE>12</LINE>
        <COLUMN>3</COLUMN>
      </SFA>
    </PATH>
  </DEFECT>
</DEFECTS>

PATH itself should be modeled as a struct with a single optional field. This works:

#[derive(Default, Serialize, Deserialize, Debug)]
#[allow(non_snake_case)]
pub struct DEFECT {
    #[serde(default)]
    pub SFA: SFA,
    pub DEFECTCODE: String,
    pub DESCRIPTION: String,
    pub FUNCTION: String,
    pub DECORATED: String,
    pub FUNCLINE: String,
    pub PATH: PATH,
}

#[derive(Default, Serialize, Deserialize, Debug)]
#[allow(non_snake_case)]
pub struct PATH {
    SFA: Option<SFA>,
}
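With PATH modeled this way, code that reads the result has to handle both the empty and the nested case explicitly. The sketch below assumes the same SFA and PATH types as above, with the Serialize/Deserialize derives dropped so it compiles with the standard library alone; `path_location` is a hypothetical helper, not part of the answer. It also relies on a Serde fact: if <PATH> could ever be missing entirely (an assumption; the sample files always include it), adding `#[serde(default)]` to the PATH field of DEFECT would make Serde fill it with `PATH::default()`, which is `SFA: None`.

```rust
// Same shape as the answer's types; Serialize/Deserialize omitted so this
// compiles without external crates.
#[derive(Default, Debug)]
#[allow(non_snake_case)]
pub struct SFA {
    pub FILEPATH: String,
    pub FILENAME: String,
    pub LINE: String,
    pub COLUMN: String,
}

#[derive(Default, Debug)]
#[allow(non_snake_case)]
pub struct PATH {
    pub SFA: Option<SFA>,
}

// Hypothetical helper: "FILENAME:LINE:COLUMN" when PATH holds an SFA, else None.
fn path_location(path: &PATH) -> Option<String> {
    path.SFA
        .as_ref()
        .map(|sfa| format!("{}:{}:{}", sfa.FILENAME, sfa.LINE, sfa.COLUMN))
}

fn main() {
    // <PATH></PATH>; also what #[serde(default)] would fall back to.
    let empty = PATH::default();
    // <PATH><SFA>...</SFA></PATH> from the second defect in test1.xml.
    let nested = PATH {
        SFA: Some(SFA {
            FILEPATH: r"c:\projects\source\repos\defecttest\defecttest".to_string(),
            FILENAME: "source.cpp".to_string(),
            LINE: "12".to_string(),
            COLUMN: "3".to_string(),
        }),
    };
    println!("{:?}", path_location(&empty)); // None
    println!("{:?}", path_location(&nested)); // Some("source.cpp:12:3")
}
```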