Check/Resolve cross-references in separate XML files

Question

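Starting point

Suppose we have a book in XML format. The book consists of many assets, which can reference each other through a ref-asset tag with a path attribute [path mask: {id}|{version} of the target asset].

Important: the assets are individual files; there is no merged, complete file.

Example XML (merged for better readability)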

<book>
    <!-- file a.xml -->
    <asset id="1" version="1.0">
        <name>Prolog</name>
    </asset>
    <!-- file b.xml -->
    <asset id="2" version="2">
        <name>Table of content</name>
        <list>
            <item><ref-asset path="1|1.0">Prolog</ref-asset></item>
            <item><ref-asset path="2|2.0">Table of content</ref-asset></item>
            <item><ref-asset path="3|1.1">FooBar</ref-asset></item>
        </list>
    </asset>
    <!-- file c.xml -->
    <asset id="3" version="1.1">
        <name>FooBar</name>
    </asset>
</book>

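Requirements

Check every ref-asset to see whether its link target exists in the book.
Create a report of the results [exists, does not exist, asset exists but in the wrong version, ...].
[Additionally: replace the reference with the target content.]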
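
The check and report requested above could be prototyped outside XSLT before committing to a stylesheet design. The following is only a minimal Python sketch of the classification logic, not the asker's code: the function name `classify_ref`, the status labels, and the extra path `4|1.0` are illustrative assumptions, and versions are compared as plain strings (so `2` and `2.0` count as different).

```python
def classify_ref(path, versions_by_id):
    """Classify a ref-asset path of the form '{id}|{version}'."""
    asset_id, _, version = path.partition("|")
    if asset_id not in versions_by_id:
        return "missing"
    if versions_by_id[asset_id] != version:
        # Assumption: versions are compared as strings, not as numbers.
        return "wrong version"
    return "exists"


# Assets from the sample book above: id -> version
sample_assets = {"1": "1.0", "2": "2", "3": "1.1"}

# "4|1.0" is a made-up path that points at no asset at all.
paths = ["1|1.0", "2|2.0", "3|1.1", "4|1.0"]
report = {path: classify_ref(path, sample_assets) for path in paths}
```

With the sample book, `2|2.0` is reported as a version mismatch because asset 2 is declared with version `2`.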

Setup

Saxon 9.6.x EE, XSLT 2.0
Java
From 100 up to several thousand individual documents (combined file size: upper three-digit MB range)

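Possible approaches

First attempt, function collection() + function document():

Find all individual asset files on the file system via collection(), load them into the process via document(), and search for matching hits.

Second attempt, a merged, complete file:

Merge all individual assets into one book and match via xsl:key or a similar technique.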
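
To make the second approach concrete, here is a rough Python sketch of merging individual asset documents under one book element and then looking up a target by id and version. It is an illustration under assumptions, not an XSLT solution: `merge_assets` and `find_asset` are hypothetical helper names, the asset sources are inline strings rather than files, and the lookup is a linear search rather than a keyed index.

```python
import xml.etree.ElementTree as ET


def merge_assets(asset_sources):
    """Wrap individually parsed <asset> documents in a single <book> element."""
    book = ET.Element("book")
    for source in asset_sources:
        book.append(ET.fromstring(source))
    return book


def find_asset(book, path):
    """Return the <asset> matching an '{id}|{version}' path, or None."""
    asset_id, _, version = path.partition("|")
    # Assumption: ids and versions contain no quote characters.
    return book.find(f"asset[@id='{asset_id}'][@version='{version}']")


# Inline stand-ins for a.xml, b.xml and c.xml from the sample book.
sources = [
    '<asset id="1" version="1.0"><name>Prolog</name></asset>',
    '<asset id="2" version="2"><name>Table of content</name></asset>',
    '<asset id="3" version="1.1"><name>FooBar</name></asset>',
]
book = merge_assets(sources)
target = find_asset(book, "3|1.1")    # matches the FooBar asset
missing = find_asset(book, "2|2.0")   # no asset 2 with version 2.0
```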

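Questions

Is collection() capable of loading thousands of documents and still performing well when the assets are subsequently processed with document()?
How can documents loaded at run time be "indexed" [or is xsl:key the way?] so that they can be searched efficiently?

Further hints are very welcome. No specific stylesheet is required [I will write it myself once I know how to approach it].

Edit: collection() already returns a sequence of document nodes, so document() is probably unnecessary.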

Answer 1

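Questions about performance are always product-dependent, so it is easier to answer when the question is specific to Saxon.

I frequently use the collection() function in Saxon to process thousands of input documents, and yes, it is perfectly capable of doing that. In Saxon-EE, collection() is multi-threaded, so multiple documents are parsed in parallel on a multi-core machine.

Indexing is a bit trickier, because the key() function searches only a single document. A few weeks ago we looked at a very similar problem in the performance workshop at the Oxford XML Summer School, and solved it (with a ten-fold speed-up) using the new XSLT 3.0 maps feature. Like this: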

<xsl:variable name="index" as="map(xs:string, element(asset))">
  <xsl:map>
    <xsl:for-each select="collection('....')/asset">
      <xsl:map-entry key="@id || '|' || @version"
                     select="."/>
    </xsl:for-each>
  </xsl:map>
</xsl:variable>

<xsl:template match="ref-asset">
  <xsl:variable name="asset" select="$index(@path)"/>
  ....
</xsl:template>
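
For readers who want to prototype the same idea outside XSLT, the sketch below mirrors the map-based index in Python: every asset file is parsed once, stored under the key `id|version`, and each ref-asset path is then a single dictionary lookup. This is an analogy under assumptions, not a translation of the stylesheet: `build_index` and `check_refs` are hypothetical names, each file is assumed to have `asset` as its root element, and the two sample files are written to a temporary directory only to keep the example self-contained.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def build_index(asset_dir):
    """Parse every *.xml file once and map 'id|version' to its <asset> element."""
    index = {}
    for xml_file in sorted(Path(asset_dir).glob("*.xml")):
        asset = ET.parse(xml_file).getroot()  # assumes <asset> is the root
        index[f"{asset.get('id')}|{asset.get('version')}"] = asset
    return index


def check_refs(index):
    """Return (source key, ref path, found) for every ref-asset in the index."""
    results = []
    for key, asset in index.items():
        for ref in asset.iter("ref-asset"):
            path = ref.get("path")
            results.append((key, path, path in index))
    return results


# Hypothetical stand-ins for two of the single asset files.
samples = {
    "a.xml": '<asset id="1" version="1.0"><name>Prolog</name></asset>',
    "b.xml": '<asset id="2" version="2"><list>'
             '<item><ref-asset path="1|1.0">Prolog</ref-asset></item>'
             '<item><ref-asset path="2|2.0">Table of content</ref-asset></item>'
             '</list></asset>',
}

with tempfile.TemporaryDirectory() as tmp:
    for name, content in samples.items():
        Path(tmp, name).write_text(content, encoding="utf-8")
    index = build_index(tmp)
    results = check_refs(index)
```

A failed lookup on the full path alone cannot tell "asset missing" from "wrong version"; a second index keyed by id only would be one way to separate the two cases for the report.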

Answer 2

I have written an npm package that resolves references in XML. I hope it serves your purpose: https://www.npmjs.com/package/xml-path-resolver. The package takes XML and returns JSON with the paths resolved.

Usage

const xmlPathResolver = require("xml-path-resolver");
const xmlString = `
<?xml version="1.0" encoding="utf-8"?>  
<note id="1212"  importance="high" logged="true" x_note="23">
    <title>Happy</title>
     <todo>Work</todo>
     <todo>Play</todo>
</note>
<note id="23" importance="high" logged="true">
</note>
<note importance="high" logged="true">
</note>
<person x_note="1212">
</person>
`;
const resolvedJSON = xmlPathResolver(xmlString,{ crossReference: /x_(.*)/ });

Example:

<?xml version="1.0" encoding="utf-8"?>  
<note id="1212"  importance="high" logged="true" x_note="23">
    <title>Happy</title>
     <todo>Work</todo>
     <todo>Play</todo>
</note>
<note id="23" importance="high" logged="true">
</note>
<note importance="high" logged="true">
</note>
<person x_note="1212">
</person>

The XML above contains cross-reference paths; the resolved JSON output is

{
  "_declaration": {
    "_attributes": {
      "version": "1.0",
      "encoding": "utf-8"
    }
  },
  "note": [
    {
      "_attributes": {
        "id": "1212",
        "importance": "high",
        "logged": "true",
        "x_note": {
          "_attributes": {
            "id": "23",
            "importance": "high",
            "logged": "true"
          }
        }
      },
      "title": {
        "_text": "Happy"
      },
      "todo": [
        {
          "_text": "Work"
        },
        {
          "_text": "Play"
        }
      ]
    },
    {
      "_attributes": {
        "id": "23",
        "importance": "high",
        "logged": "true"
      }
    },
    {
      "_attributes": {
        "importance": "high",
        "logged": "true"
      }
    }
  ],
  "person": {
    "_attributes": {
      "x_note": {
        "_attributes": {
          "id": "1212",
          "importance": "high",
          "logged": "true",
          "x_note": {
            "_attributes": {
              "id": "23",
              "importance": "high",
              "logged": "true"
            }
          }
        },
        "title": {
          "_text": "Happy"
        },
        "todo": [
          {
            "_text": "Work"
          },
          {
            "_text": "Play"
          }
        ]
      }
    }
  }
}
