Split a P element into several P elements based on BR elements

Question

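I am trying to split a single P element that contains multiple SPAN and BR elements into separate P elements, using the BR elements as the break points.

Here is a sample of the input XML structure: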

  <P>
     <SPAN CLASS="BYLINE">by john doe</SPAN>
     <SPAN CLASS="TEXT">
        <BR/>
     </SPAN>
     <SPAN CLASS="EMAIL">john.doe@email.com</SPAN>
     <SPAN CLASS="TEXT">
        <BR/>
     </SPAN>
     <SPAN CLASS="TEXT">Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. </SPAN>
     <SPAN CLASS="BOLD">This sentence is bold. </SPAN>
     <SPAN CLASS="TEXT">It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. </SPAN>
     <SPAN CLASS="ITALIC">This sentence is in italics. </SPAN>
     <SPAN CLASS="TEXT">It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
        <BR/>
     </SPAN>
     <SPAN CLASS="BOLD">BOLD SUBTITLE HERE</SPAN>
     <SPAN CLASS="TEXT">
        <BR/>Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old.</SPAN>
     <SPAN CLASS="ITALIC">
        <BR/>ITALIC SUB-TITLE</SPAN>
     <SPAN CLASS="TEXT">
        <BR/>Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source.<BR/>
     </SPAN>
  </P>

The output XML I would like to see is:

  <P>
    <SPAN CLASS="BYLINE">by john doe</SPAN>
  </P>
  <P>
    <SPAN CLASS="EMAIL">john.doe@email.com</SPAN>
  </P>
  <P>
     <SPAN CLASS="TEXT">Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. </SPAN>
     <SPAN CLASS="BOLD">This sentence is bold. </SPAN>
     <SPAN CLASS="TEXT">It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. </SPAN>
     <SPAN CLASS="ITALIC">This sentence is in italics. </SPAN>
     <SPAN CLASS="TEXT">It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</SPAN>
  </P>
  <P>
     <SPAN CLASS="BOLD">BOLD SUBTITLE HERE</SPAN>
  </P>
  <P>
     <SPAN CLASS="TEXT">Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old.</SPAN>
  </P>
  <P>
     <SPAN CLASS="ITALIC">ITALIC SUB-TITLE</SPAN>     
  </P>
  <P>
     <SPAN CLASS="TEXT">Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source.</SPAN>
  </P>
  <P>
     <SPAN CLASS="TEXT"></SPAN>
  </P>

Is this possible? I tried using xsl:key and grouping, but could not get it to work.

Any suggestions are greatly appreciated. Thanks.

Answer 1

If you are using XSLT 2.0, it looks like you could use xsl:for-each-group with the group-ending-with attribute:

<xsl:for-each-group select="SPAN" group-ending-with="*[BR]">

You would then use the current-group() function to get all the SPAN elements you want to group into a P:

<P>
    <xsl:apply-templates select="current-group()" />
</P>

You would also need templates to stop the BR tags, and the SPAN tags that contain only a BR tag, from being output.

Try this XSLT:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="xml" indent="yes" />

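    <!-- Split each P: a group ends at each SPAN that contains a BR -->
    <xsl:template match="P">
        <xsl:for-each-group select="SPAN" group-ending-with="*[BR]">
            <P>
                <xsl:apply-templates select="current-group()" />
            </P>
        </xsl:for-each-group>
    </xsl:template>

    <!-- Drop SPANs that contain a BR and no text -->
    <xsl:template match="SPAN[BR][not(normalize-space())]" />

    <!-- Drop the BR elements themselves -->
    <xsl:template match="BR" />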

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

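This doesn't give you exactly the output you need, because <SPAN CLASS="BOLD">BOLD SUBTITLE HERE</SPAN> is combined with the span that follows it rather than being placed in its own P tag, but I do not know why the logic is different there.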
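
A possible explanation for the mismatch, with a sketch of a workaround: in the sample input, the BR that separates BOLD SUBTITLE HERE from the next paragraph sits at the start of the following SPAN, so group-ending-with="*[BR]" only closes the group after that following SPAN. One variation that might get closer to the requested output is to group the child nodes of the SPAN elements rather than the SPAN elements themselves, start a new group at each BR, and rebuild one SPAN per text node using the CLASS of its original parent. This is untested against the real data and rests on several assumptions: every SPAN contains only text and BR nodes, groups without any text are dropped (so the trailing empty P in the requested output is not produced), and normalize-space() is used to trim the text, which also removes the trailing spaces inside the SPAN elements.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="xml" indent="yes" />

    <xsl:template match="P">
        <!-- Group the children of all SPANs; every BR starts a new group -->
        <xsl:for-each-group select="SPAN/node()" group-starting-with="BR">
            <!-- Keep only nodes that carry text (assumption: no nested elements besides BR) -->
            <xsl:variable name="content"
                          select="current-group()[not(self::BR)][normalize-space()]" />
            <!-- Skip groups that hold nothing but a BR and whitespace -->
            <xsl:if test="exists($content)">
                <P>
                    <xsl:for-each select="$content">
                        <!-- Rebuild the SPAN with the CLASS of the node's original parent -->
                        <SPAN CLASS="{../@CLASS}">
                            <xsl:value-of select="normalize-space(.)" />
                        </SPAN>
                    </xsl:for-each>
                </P>
            </xsl:if>
        </xsl:for-each-group>
    </xsl:template>
</xsl:stylesheet>
```

With the sample input this should put BOLD SUBTITLE HERE and ITALIC SUB-TITLE in P elements of their own, because the grouping no longer depends on which SPAN happens to hold the BR. If the inline spacing matters, normalize-space(.) could be swapped for a narrower trim.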

For more interesting ways of using xsl:for-each-group in XSLT 2.0, see http://www.xml.com/pub/a/2003/11/05/tr.html.
