Capturing a URL within text by using regex in XSLT code

Question

This is my test input:

<license>
     <p>some text (http://creativecommons.org/licenses/by/3.0/) some text.</p>
</license>

Desired output:

<license xlink:href="http://creativecommons.org/licenses/by/3.0/">
     <p>some text (http://creativecommons.org/licenses/by/3.0/) some text.</p> 
</license>

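Basically, for any license element that does not already have an xlink:href="http://******" attribute, I am trying to copy the URL out of its text: look at the child <license-p> and move any URL into the xlink:href attribute of the parent (license).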

This is my XSLT:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xlink="http://www.w3.org/1999/xlink"

exclude-result-prefixes="xs"
version="3.0"> 
    <xsl:output method="html" encoding="UTF-8" indent="yes" />
    <xsl:strip-space elements="*"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="license">
          <xsl:copy>
            <xsl:attribute name="xlink:href">                    
                <xsl:value-of select='replace(p,"[\s\S]*" ,"(\b(?:(?:https?|ftp):\/\/|www\.|ftp\.)(?:\([-A-Z0-9+&amp;@#\/%=~_|$?!:,.]*\)|[-A-Z0-9+&amp;@#\/%=~_|$?!:,.])*(?:\([-A-Z0-9+&amp;@#\/%=~_|$?!:,.]*\)|[A-Z0-9+&amp;@#\/%=~_|$]))")'/>
            </xsl:attribute> 
            <xsl:apply-templates/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="p/@xlink:href"/>   
</xsl:stylesheet>

The regex I am using does not work in Saxon, owing to characters such as the question mark (?).

Answer 1

OK, I know the regex is far from perfect, but the following worked for me:

<xsl:analyze-string 
    select="$elValue"
    regex="((https?|ftp|gopher|telnet|file):(()|(\\))+[\w\d:#@%/;$()~_?\+-=\\\.&amp;]*\w*.\w*\W\w*\W\w*\W\d.\d\W)">                    
        <xsl:matching-substring>
            <xsl:value-of select="regex-group(1)"/>                       
        </xsl:matching-substring>
</xsl:analyze-string>
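
To show how an xsl:analyze-string call like the one above might be wired into the license template, here is a minimal sketch of a complete XSLT 2.0 stylesheet. It is not the original poster's final code. The simplified URL pattern, the $urls variable name, the choice of method="xml", and the decision to keep only the first URL found are all assumptions made for illustration.

```xslt
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xlink="http://www.w3.org/1999/xlink"
    exclude-result-prefixes="xs"
    version="2.0">

    <xsl:output method="xml" encoding="UTF-8" indent="yes"/>

    <!-- Identity template: copy everything unchanged by default. -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- Only handle license elements that do not already carry xlink:href. -->
    <xsl:template match="license[not(@xlink:href)]">
        <!-- Collect every URL-like substring of the element's text.
             Assumed, simplified pattern: a scheme, "://", then a run of
             characters that are not whitespace, parentheses, angle
             brackets or double quotes. -->
        <xsl:variable name="urls" as="xs:string*">
            <xsl:analyze-string select="string(.)"
                regex="(https?|ftp)://[^\s()&lt;&gt;&quot;]+">
                <xsl:matching-substring>
                    <xsl:sequence select="."/>
                </xsl:matching-substring>
            </xsl:analyze-string>
        </xsl:variable>
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <!-- Assumption: if several URLs occur, the first one wins. -->
            <xsl:if test="exists($urls)">
                <xsl:attribute name="xlink:href" select="$urls[1]"/>
            </xsl:if>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
```

Two design notes. In replace(), the second argument is the pattern and the third is the replacement, so the call in the question appears to have them swapped. The XPath 2.0 regex dialect also has no \b word boundary and no (?: ) non-capturing groups, which is probably why Saxon rejects the original pattern. The sketch uses only plain capturing groups and a negated character class, and it stops the match at the closing parenthesis that follows the URL in the sample text.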

regex

url

xslt-2.0