Split a P element into several P elements based on BR elements

I am trying to split a single P element, which contains multiple SPAN and BR elements, into separate P elements at each BR element.

Here is a sample of the input XML structure:

  <P>
     <SPAN CLASS="BYLINE">by john doe</SPAN>
     <SPAN CLASS="TEXT">
        <BR/>
     </SPAN>
     <SPAN CLASS="EMAIL">john.doe@email.com</SPAN>
     <SPAN CLASS="TEXT">
        <BR/>
     </SPAN>
     <SPAN CLASS="TEXT">Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. </SPAN>
     <SPAN CLASS="BOLD">This sentence is bold. </SPAN>
     <SPAN CLASS="TEXT">It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. </SPAN>
     <SPAN CLASS="ITALIC">This sentence is in italics. </SPAN>
     <SPAN CLASS="TEXT">It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
        <BR/>
     </SPAN>
     <SPAN CLASS="BOLD">BOLD SUBTITLE HERE</SPAN>
     <SPAN CLASS="TEXT">
        <BR/>Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old.</SPAN>
     <SPAN CLASS="ITALIC">
        <BR/>ITALIC SUB-TITLE</SPAN>
     <SPAN CLASS="TEXT">
        <BR/>Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source.<BR/>
     </SPAN>
  </P>

The output XML I would like to see is:

  <P>
    <SPAN CLASS="BYLINE">by john doe</SPAN>
  </P>
  <P>
    <SPAN CLASS="EMAIL">john.doe@email.com</SPAN>
  </P>
  <P>
     <SPAN CLASS="TEXT">Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. </SPAN>
     <SPAN CLASS="BOLD">This sentence is bold. </SPAN>
     <SPAN CLASS="TEXT">It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. </SPAN>
     <SPAN CLASS="ITALIC">This sentence is in italics. </SPAN>
     <SPAN CLASS="TEXT">It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</SPAN>
  </P>
  <P>
     <SPAN CLASS="BOLD">BOLD SUBTITLE HERE</SPAN>
  </P>
  <P>
     <SPAN CLASS="TEXT">Contrary to popular belief, Lorem Ipsum is not simply random text. It has roots in a piece of classical Latin literature from 45 BC, making it over 2000 years old.</SPAN>
  </P>
  <P>
     <SPAN CLASS="ITALIC">ITALIC SUB-TITLE</SPAN>     
  </P>
  <P>
     <SPAN CLASS="TEXT">Richard McClintock, a Latin professor at Hampden-Sydney College in Virginia, looked up one of the more obscure Latin words, consectetur, from a Lorem Ipsum passage, and going through the cites of the word in classical literature, discovered the undoubtable source.</SPAN>
  </P>
  <P>
     <SPAN CLASS="TEXT"></SPAN>
  </P>    

Is this possible? I tried using xsl:key and grouping, but could not get it to work.

Any suggestions are much appreciated. Thanks.

If you are using XSLT 2.0, it looks like you could use xsl:for-each-group together with the group-ending-with attribute:
<xsl:for-each-group select="SPAN" group-ending-with="*[BR]">

You would then use the current-group() function to get all the SPAN elements that you want to group into a P:
<P>
    <xsl:apply-templates select="current-group()" />
</P>  

You would also need templates to stop the BR tags, and the SPAN tags that contain only BR tags, from being output.

Try this XSLT:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="xml" indent="yes" />

    <xsl:template match="P">
      <xsl:for-each-group select="SPAN" group-ending-with="*[BR]">
            <P>
                <xsl:apply-templates select="current-group()" />
            </P>           
        </xsl:for-each-group>
    </xsl:template>

    <xsl:template match="SPAN[BR][not(normalize-space())]" />

    <xsl:template match="BR" />

    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

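This does not quite give you the output you need, because <SPAN CLASS="BOLD">BOLD SUBTITLE HERE</SPAN> is combined with the span that follows it rather than being placed in its own P tag, but I am not sure why the logic is different in that case.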
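
One possible reading of the difference: in that case the BR sits at the start of the following SPAN, in front of its text, rather than at the end of a SPAN or in a SPAN of its own, so group-ending-with="*[BR]" closes the group only after that SPAN's text. A rough sketch (not part of the stylesheet above, untested, and assuming an XSLT 2.0 processor such as Saxon) is to group the child nodes of the SPAN elements instead, start a new group at each BR, and rebuild the SPAN wrappers from each text node's parent. It assumes every SPAN contains only text and BR children, as in the sample. Note that normalize-space() also trims leading and trailing whitespace in each SPAN, and that this sketch does not emit the final empty P shown in the desired output.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
    <xsl:output method="xml" indent="yes" />

    <xsl:template match="P">
        <!-- Walk the text and BR children of every SPAN in document order;
             each BR starts a new group, wherever it sits inside its SPAN. -->
        <xsl:for-each-group select="SPAN/node()" group-starting-with="BR">
            <!-- Keep only the nodes that carry visible text. -->
            <xsl:variable name="content"
                          select="current-group()[not(self::BR)][normalize-space()]" />
            <!-- Skip groups that hold nothing but BR and whitespace. -->
            <xsl:if test="$content">
                <P>
                    <!-- Rebuild one SPAN per original parent SPAN. -->
                    <xsl:for-each-group select="$content" group-adjacent="generate-id(..)">
                        <SPAN CLASS="{../@CLASS}">
                            <xsl:value-of select="normalize-space(string-join(current-group(), ''))" />
                        </SPAN>
                    </xsl:for-each-group>
                </P>
            </xsl:if>
        </xsl:for-each-group>
    </xsl:template>
</xsl:stylesheet>
```

With the sample input, this should place BOLD SUBTITLE HERE in a P of its own, because the BR at the start of the next SPAN opens a new group before that SPAN's text.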

For more interesting ways of using xsl:for-each-group in XSLT 2.0, see http://www.xml.com/pub/a/2003/11/05/tr.html