Why isn't Nokogiri's xpath working as expected?

I'm parsing a SOAP response with Nokogiri, but for some reason the xpath and css methods can't find any tags nested below the <soap:Body> tag.

The XML I'm parsing is:

<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <soap:Body>
        <AuthenticationResponse xmlns="http://tempuri.org/">
            <AuthenticationResult>
                <SessionID>clinTQYART6qxeQ%k^Am&amp;Sd5Co3</SessionID>
                <RequestStatus>1</RequestStatus>
                <RequestMessage>Success</RequestMessage>
            </AuthenticationResult>
        </AuthenticationResponse>
    </soap:Body>
</soap:Envelope>

If I inspect the parsed XML in a debugger, I see:

=> #(Document:0x3fce3c4dd95c {
  name = "document",
  children = [
    #(Element:0x3fce385b04dc {
      name = "Envelope",
      namespace = #(Namespace:0x3fce385b04b4 { prefix = "soap", href = "http://schemas.xmlsoap.org/soap/envelope/" }),
      children = [
        #(Element:0x3fce385e509c {
          name = "Body",
          namespace = #(Namespace:0x3fce385b04b4 { prefix = "soap", href = "http://schemas.xmlsoap.org/soap/envelope/" }),
          children = [
            #(Element:0x3fce385e4c64 {
              name = "AuthenticationResponse",
              namespace = #(Namespace:0x3fce385e4c14 { href = "http://tempuri.org/" }),
              children = [
                #(Element:0x3fce385e48a4 {
                  name = "AuthenticationResult",
                  namespace = #(Namespace:0x3fce385e4c14 { href = "http://tempuri.org/" }),
                  children = [
                    #(Element:0x3fce385e44f8 { name = "SessionID", namespace = #(Namespace:0x3fce385e4c14 { href = "http://tempuri.org/" }), children = [ #(Text "clinTQYART6qxeQ%k^Am&Sd5Co3")] }),
                    #(Element:0x3fce39dcff7c { name = "RequestStatus", namespace = #(Namespace:0x3fce385e4c14 { href = "http://tempuri.org/" }), children = [ #(Text "1")] }),
                    #(Element:0x3fce39dcfa2c { name = "RequestMessage", namespace = #(Namespace:0x3fce385e4c14 { href = "http://tempuri.org/" }), children = [ #(Text "Success")] })]
                  })]
              })]
          })]
      })]
  })

So far, so good.

But xml.xpath("//SessionID") gives []

whereas xml.xpath("//soap:Body")[0] gives

=> #(Element:0x3fce385e509c {
  name = "Body",
  namespace = #(Namespace:0x3fce385b04b4 { prefix = "soap", href = "http://schemas.xmlsoap.org/soap/envelope/" }),
  children = [
    #(Element:0x3fce385e4c64 {
      name = "AuthenticationResponse",
      namespace = #(Namespace:0x3fce385e4c14 { href = "http://tempuri.org/" }),
      children = [
        #(Element:0x3fce385e48a4 {
          name = "AuthenticationResult",
          namespace = #(Namespace:0x3fce385e4c14 { href = "http://tempuri.org/" }),
          children = [
            #(Element:0x3fce385e44f8 { name = "SessionID", namespace = #(Namespace:0x3fce385e4c14 { href = "http://tempuri.org/" }), children = [ #(Text "clinTQYART6qxeQ%k^Am&Sd5Co3")] }),
            #(Element:0x3fce39dcff7c { name = "RequestStatus", namespace = #(Namespace:0x3fce385e4c14 { href = "http://tempuri.org/" }), children = [ #(Text "1")] }),
            #(Element:0x3fce39dcfa2c { name = "RequestMessage", namespace = #(Namespace:0x3fce385e4c14 { href = "http://tempuri.org/" }), children = [ #(Text "Success")] })]
          })]
      })]
  })

and xml.xpath("//soap:Body")[0].children[0].children[0].children[0] gives

=> #(Element:0x3fce385e44f8 { name = "SessionID", namespace = #(Namespace:0x3fce385e4c14 { href = "http://tempuri.org/" }), children = [ #(Text "clinTQYART6qxeQ%k^Am&Sd5Co3")] })

So xml.xpath("//soap:Body")[0].children[0].children[0].children[0].content gives me the correct ID string.

Why, then, doesn't xml.xpath("//SessionID") work?

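It's because SessionID is in the http://tempuri.org/ namespace. In XPath, an unprefixed name such as SessionID only matches elements that are in no namespace, so the default namespace declared on <AuthenticationResponse> has to be bound to a prefix in the query.

Try something like this (untested):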

xml.xpath("//x:SessionID", {"x" => "http://tempuri.org/"})

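Not a direct answer to your question, but if you want to parse SOAP, you're better off using the savon gem rather than nokogiri. It is designed to handle all the intricacies of SOAP.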