Removing duplicate elements using distinct-values and XSLT 2.0

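I am trying to solve a problem where I want to remove duplicate values from a series of elements.

I have been at this for a while. The code below looks like something I thought would work, but I am getting an error:

XPTY0020: Leading '/' cannot select the root node of the tree containing the context item: the context item is not a node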

XSLT:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0">
    <xsl:strip-space elements="*"/>
    <xsl:output method="xml" indent="yes"/>

    <xsl:template match="/">

        <xsl:for-each select="distinct-values(/tobject/tobject.subject/@tobject.subject.refnum)">
            <xsl:copy-of select="."/>
        </xsl:for-each>

    </xsl:template>
</xsl:stylesheet>

XML:

<?xml version="1.0" encoding="UTF-8"?>
<tobject tobject.type="Utenriks">
    <tobject.property tobject.property.type="Nyheter"/>
    <tobject.subject tobject.subject.code="OKO" tobject.subject.refnum="04000000" tobject.subject.type="økonomi og næringsliv"/>
    <tobject.subject tobject.subject.code="OKO" tobject.subject.refnum="04000000" tobject.subject.type="økonomi og næringsliv"/>
    <tobject.subject tobject.subject.code="OKO" tobject.subject.refnum="04000000" tobject.subject.type="økonomi og næringsliv"/>
    <tobject.subject tobject.subject.code="OKO" tobject.subject.refnum="04005000" tobject.subject.matter="olje og energi"/>
    <tobject.subject tobject.subject.code="OKO" tobject.subject.refnum="04000000" tobject.subject.type="økonomi og næringsliv"/>
    <tobject.subject tobject.subject.code="POL" tobject.subject.refnum="11000000" tobject.subject.type="politikk"/>
    <tobject.subject tobject.subject.code="OKO" tobject.subject.refnum="04000000" tobject.subject.type="økonomi og næringsliv"/>
    <tobject.subject tobject.subject.code="POL" tobject.subject.refnum="11000000" tobject.subject.type="politikk"/>
    <tobject.subject tobject.subject.code="POL" tobject.subject.refnum="11003000" tobject.subject.matter="valg"/>
    <tobject.subject tobject.subject.code="KRE" tobject.subject.refnum="02000000" tobject.subject.type="kriminalitet og rettsvesen"/>
    <tobject.subject tobject.subject.code="FRI" tobject.subject.refnum="10000000" tobject.subject.type="fritid"/>
</tobject>

Desired result:

<?xml version="1.0" encoding="UTF-8"?>
<tobject tobject.type="Utenriks">
    <tobject.property tobject.property.type="Nyheter"/>
    <tobject.subject tobject.subject.code="OKO" tobject.subject.refnum="04000000" tobject.subject.type="økonomi og næringsliv"/>
    <tobject.subject tobject.subject.code="OKO" tobject.subject.refnum="04005000" tobject.subject.matter="olje og energi"/>
    <tobject.subject tobject.subject.code="POL" tobject.subject.refnum="11000000" tobject.subject.type="politikk"/>
    <tobject.subject tobject.subject.code="POL" tobject.subject.refnum="11003000" tobject.subject.matter="valg"/>
    <tobject.subject tobject.subject.code="KRE" tobject.subject.refnum="02000000" tobject.subject.type="kriminalitet og rettsvesen"/>
    <tobject.subject tobject.subject.code="FRI" tobject.subject.refnum="10000000" tobject.subject.type="fritid"/>
</tobject>

the code below sort of looks like something I thought would work, but I am getting an error:

XPTY0020: Leading '/' cannot select the root node of the tree containing the context item: the context item is not a node

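This error cannot be reproduced by running your code - see: http://xsltransform.net/gWvjQfa

However, the result of distinct-values() is a sequence of values, not of nodes. Your expected result - removing the duplicate elements - is easier to achieve using grouping: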

XSLT 2.0

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template match="/tobject">
    <xsl:copy>
        <xsl:copy-of select="@* | tobject.property"/>
        <xsl:for-each-group select="tobject.subject" group-by="@tobject.subject.refnum">
            <xsl:copy-of select="current-group()[1]"/>
        </xsl:for-each-group>
    </xsl:copy>
</xsl:template>

</xsl:stylesheet>
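
If you would rather keep distinct-values(), the following is a minimal sketch of how the values can be mapped back to elements. It assumes the same input document; the variable name subjects is made up for this illustration. Inside a for-each over atomic values the context item is a string, so a path beginning with "/" evaluated there is the kind of expression that raises XPTY0020, which may be what happened in a variant of the posted code. Holding the elements in a variable avoids that.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:template match="/tobject">
    <!-- Hypothetical variable: keeps the elements reachable while the
         context item is an atomic value rather than a node. -->
    <xsl:variable name="subjects" select="tobject.subject"/>
    <xsl:copy>
      <xsl:copy-of select="@* | tobject.property"/>
      <xsl:for-each select="distinct-values($subjects/@tobject.subject.refnum)">
        <!-- "." and current() are atomic values here, not nodes, so a path
             starting with "/" would fail; go through $subjects instead. -->
        <xsl:copy-of select="$subjects[@tobject.subject.refnum = current()][1]"/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Note that the order in which distinct-values() returns its values is implementation-dependent, so this variant does not guarantee document order; xsl:for-each-group does, which is one more reason to prefer grouping.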

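I. A shorter solution that is pure XSLT 1.0 and does not hard-code any unnecessary element names.

It is also no less efficient than the XSLT 2.0 solution that uses <xsl:for-each-group>, because here we use the Muenchian method for grouping: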

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>
 <xsl:key name="kOS" match="tobject.subject" use="@tobject.subject.refnum"/>

  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match=
  "tobject.subject[generate-id() != generate-id(key('kOS', @tobject.subject.refnum)[1])]"/>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<tobject tobject.type="Utenriks">
    <tobject.property tobject.property.type="Nyheter"/>
    <tobject.subject tobject.subject.code="OKO" tobject.subject.refnum="04000000" tobject.subject.type="økonomi og næringsliv"/>
    <tobject.subject tobject.subject.code="OKO" tobject.subject.refnum="04000000" tobject.subject.type="økonomi og næringsliv"/>
    <tobject.subject tobject.subject.code="OKO" tobject.subject.refnum="04000000" tobject.subject.type="økonomi og næringsliv"/>
    <tobject.subject tobject.subject.code="OKO" tobject.subject.refnum="04005000" tobject.subject.matter="olje og energi"/>
    <tobject.subject tobject.subject.code="OKO" tobject.subject.refnum="04000000" tobject.subject.type="økonomi og næringsliv"/>
    <tobject.subject tobject.subject.code="POL" tobject.subject.refnum="11000000" tobject.subject.type="politikk"/>
    <tobject.subject tobject.subject.code="OKO" tobject.subject.refnum="04000000" tobject.subject.type="økonomi og næringsliv"/>
    <tobject.subject tobject.subject.code="POL" tobject.subject.refnum="11000000" tobject.subject.type="politikk"/>
    <tobject.subject tobject.subject.code="POL" tobject.subject.refnum="11003000" tobject.subject.matter="valg"/>
    <tobject.subject tobject.subject.code="KRE" tobject.subject.refnum="02000000" tobject.subject.type="kriminalitet og rettsvesen"/>
    <tobject.subject tobject.subject.code="FRI" tobject.subject.refnum="10000000" tobject.subject.type="fritid"/>
</tobject>

the wanted, correct result is produced:

<tobject tobject.type="Utenriks">
   <tobject.property tobject.property.type="Nyheter"/>
   <tobject.subject tobject.subject.code="OKO" tobject.subject.refnum="04000000" tobject.subject.type="økonomi og næringsliv"/>
   <tobject.subject tobject.subject.code="OKO" tobject.subject.refnum="04005000" tobject.subject.matter="olje og energi"/>
   <tobject.subject tobject.subject.code="POL" tobject.subject.refnum="11000000" tobject.subject.type="politikk"/>
   <tobject.subject tobject.subject.code="POL" tobject.subject.refnum="11003000" tobject.subject.matter="valg"/>
   <tobject.subject tobject.subject.code="KRE" tobject.subject.refnum="02000000" tobject.subject.type="kriminalitet og rettsvesen"/>
   <tobject.subject tobject.subject.code="FRI" tobject.subject.refnum="10000000" tobject.subject.type="fritid"/>
</tobject>

II. A single-line XPath 2.0 expression that selects the wanted unique elements (one from each group):

$vElems[index-of($vElems/@tobject.subject.refnum, @tobject.subject.refnum)[1]]

Here $vElems must be defined as:

/*/tobject.subject

When this XPath 2.0 expression is evaluated on the provided XML document, the wanted sequence of elements is selected:

<tobject.subject tobject.subject.code="OKO" tobject.subject.refnum="04000000"
             tobject.subject.type="økonomi og næringsliv"/>
<tobject.subject tobject.subject.code="OKO" tobject.subject.refnum="04005000"
             tobject.subject.matter="olje og energi"/>
<tobject.subject tobject.subject.code="POL" tobject.subject.refnum="11000000"
             tobject.subject.type="politikk"/>
<tobject.subject tobject.subject.code="POL" tobject.subject.refnum="11003000"
             tobject.subject.matter="valg"/>
<tobject.subject tobject.subject.code="KRE" tobject.subject.refnum="02000000"
             tobject.subject.type="kriminalitet og rettsvesen"/>
<tobject.subject tobject.subject.code="FRI" tobject.subject.refnum="10000000"
             tobject.subject.type="fritid"/>
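
To try the expression from an XSLT 2.0 stylesheet, something like the sketch below should work. The wrapping template is an assumption added for illustration; only $vElems and the expression itself come from the answer above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:template match="/">
    <!-- $vElems as defined in the answer: all tobject.subject children
         of the top element. -->
    <xsl:variable name="vElems" select="/*/tobject.subject"/>
    <!-- The numeric predicate keeps an element only when its own position
         equals the first index at which its refnum value occurs. -->
    <xsl:copy-of select="$vElems[index-of($vElems/@tobject.subject.refnum,
                                          @tobject.subject.refnum)[1]]"/>
  </xsl:template>
</xsl:stylesheet>
```

This relies on every element in $vElems carrying a tobject.subject.refnum attribute, as is the case in the provided document. If some elements lacked it, the positions in the attribute sequence would no longer line up with the positions in $vElems.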