Replace template tags with Velocity using regex in Java

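I need to replace some custom template tags in an HTML document with Velocity code so that the output can be used in another application.

The code I receive looks like this: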

<BG SWITCH>param
    <BG CASE>value1:
        text 1
    <BG /CASE>
    <BG CASE>value2:
        text 2
    <BG /CASE>
    <BG CASE>value3:
        text 3
    <BG /CASE>
    <BG CASE>value4:
        text 4
    <BG /CASE>
<BG /SWITCH>

The result should look like this:

#set ($param = $!user.data.param)
#if($!param == "value1")
    text 1
#end
#if($!param == "value2")
    text 2
#end
#if($!param == "value3")
    text 3
#end
#if($!param == "value4")
    text 4
#end

My current Java code looks like this:

Pattern pattern = Pattern.compile("<BG SWITCH>(.+?)<BG CASE>(.+?):(.+?)<BG /CASE>", Pattern.DOTALL);
Matcher matcher = pattern.matcher(newHtml);
newHtml = matcher.replaceAll("#set (\\$$1 = \\$!user.data.$1) #if(\\$$1 == \"$2\")$3#end");

The result I get is this:

#set ($param = $!user.data.param)
    #if($param == "value1")
        text 1
    #end
    <BG CASE>value2:
        text 2
    <BG /CASE>
    <BG CASE>value3:
        text 3
    <BG /CASE>
    <BG CASE>value4:
        text 4
    <BG /CASE>
<BG /SWITCH>

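So the "switch" and the first "case" are replaced correctly, but I don't know how to replace the following cases.

Any ideas?

Edit after svasa's answer:

The suggested solution works well for the sample code I provided, but I should have mentioned that it is only a fragment of the actual file to be processed. The actual file may contain several such switch-case blocks, and in that case the suggested code does not fully cover it.

When the source looks like this: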

<div>
    Some other content before
</div>

<div>
    <BG SWITCH>param1
        <BG CASE>value1:
            text 1
        <BG /CASE>
        <BG CASE>value2:
            text 2
        <BG /CASE>
        <BG CASE>value3:
            text 3
        <BG /CASE>
        <BG CASE>value4:
            text 4
        <BG /CASE>
    <BG /SWITCH>
</div>

<div>
    Some other content between
</div>

<div>
    <BG SWITCH>param2
        <BG CASE>value1:
            text 1
        <BG /CASE>
        <BG CASE>value2:
            text 2
        <BG /CASE>
        <BG CASE>value3:
            text 3
        <BG /CASE>
        <BG CASE>value4:
            text 4
        <BG /CASE>
    <BG /SWITCH>
</div>

<div>
    Some other content after
</div>

Then I get the following result:

#set ($param1 = $!user.data.param1)
#if($!param1== "value1")
    text 1
#end
#if($!param1== "value2")
    text 2
#end
#if($!param1== "value3")
    text 3
#end
#if($!param1== "value4")
    text 4
#end
#if($!param1== "value1")
    text 1
#end
#if($!param1== "value2")
    text 2
#end
#if($!param1== "value3")
    text 3
#end
#if($!param1== "value4")
    text 4
#end

So any other content before, between, and after the switch-case blocks is lost, and the first "param" is used for the second switch-case block. How would I solve this?

Before looking at the solution below: your input looks like XML, but it is not, because tags such as <BG /SWITCH> are not valid XML. Are you sure this is the correct input? All the other elements and the structure of the string look like XML, so confirm with the source whether it is meant to be XML. If it is, extracting whatever you want with Java's built-in XPath support is a breeze. Otherwise, extracting the desired strings with regex is painful, as you can see from what I did below.

Steps:

  1. First use a regex to get the desired <BG SWITCH> blocks inside the <div>s:

    (<BG SWITCH>(.*)\n\s*(\s*<BG CASE>(.*):\n\s*(.*)\n\s*<BG \/CASE>)+\n\s*<BG \/SWITCH>)

  2. Pass each string found to the regexes I used in my previous answer:

I would first get the top line, i.e. param, and then the remaining value1: text 1 strings.

For param, the regex is: <BG SWITCH>(.*)

For the value/text strings, you can use the regex:

<BG CASE>(.*):\n?(\s+(.*))\n?\s+<BG /CASE>

The code looks like this:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Matches one complete <BG SWITCH> ... <BG /SWITCH> block.
// Backslashes are doubled because the regex is written as a Java string literal.
String mainRegex = "(<BG SWITCH>(.*)\\n\\s*(\\s*<BG CASE>(.*):\\n\\s*(.*)\\n\\s*<BG /CASE>)+\\n\\s*<BG /SWITCH>)";
String result = "";
Pattern p = Pattern.compile( mainRegex );
Matcher m = p.matcher( sb.toString() ); // sb holds the source document
while ( m.find() )
{
    // Convert each block on its own so that every block uses its own param.
    String toSearch = m.group();
    result += updateResult( toSearch );
}

System.out.println(result);


private static String updateResult( String searchString )
{
     String replaced = "";
     Pattern firstPattern = Pattern.compile("<BG SWITCH>(.*)");
     Matcher matcher = firstPattern.matcher(searchString);
     if ( matcher.find() )
     {
          // The text after <BG SWITCH> on the first line is the parameter name.
          String paramtext = matcher.group(1);

          replaced = "#set ($" + paramtext + " = $!user.data." + paramtext + ")";
          replaced = replaced + "\n";

          // Group 1 is the case value, group 2 the text line including its indentation.
          Pattern pattern = Pattern.compile("<BG CASE>(.*):\\n?(\\s+(.*))\\n?\\s+<BG /CASE>");
          matcher = pattern.matcher(searchString);
          while ( matcher.find() )
          {
             String valueString = matcher.group(1);
             String textString = matcher.group(2);
             replaced = replaced + "#if($!" + paramtext  + "== \"" + valueString + "\")" + "\n" + textString + "\n#end" + "\n";
          }
     }

     return replaced;
}

Output of the final println statement:

#set ($param1 = $!user.data.param1)
#if($!param1== "value1")
            text 1
#end
#if($!param1== "value2")
            text 2
#end
#if($!param1== "value3")
            text 3
#end
#if($!param1== "value4")
            text 4
#end
#set ($param2 = $!user.data.param2)
#if($!param2== "value1")
            text 1
#end
#if($!param2== "value2")
            text 2
#end
#if($!param2== "value3")
            text 3
#end
#if($!param2== "value4")
            text 4
#end
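
The output above contains only the converted blocks, while the question also asks that the content before, between, and after the blocks be kept. One possible way to do that is to replace each block in place with Matcher.appendReplacement and Matcher.appendTail instead of collecting the converted blocks into a separate string. The following is only a minimal sketch under some assumptions: the class name SwitchToVelocity, the method convert, and the two simplified patterns are illustrative and not taken from the code above, the parameter name is assumed to be a single word, and each case text is re-indented with four spaces.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SwitchToVelocity {

    // One whole switch block; group 1 = parameter name (assumed to be one word),
    // group 2 = everything between the parameter name and <BG /SWITCH>.
    private static final Pattern SWITCH_BLOCK = Pattern.compile(
            "<BG SWITCH>\\s*(\\w+)(.*?)<BG /SWITCH>", Pattern.DOTALL);

    // One case inside a block; group 1 = value, group 2 = text without outer whitespace.
    private static final Pattern CASE_BLOCK = Pattern.compile(
            "<BG CASE>\\s*(.*?):\\s*(.*?)\\s*<BG /CASE>", Pattern.DOTALL);

    public static String convert(String html) {
        Matcher switchMatcher = SWITCH_BLOCK.matcher(html);
        StringBuffer out = new StringBuffer();
        while (switchMatcher.find()) {
            String param = switchMatcher.group(1);
            StringBuilder velocity = new StringBuilder();
            velocity.append("#set ($").append(param)
                    .append(" = $!user.data.").append(param).append(")\n");

            // Only search for cases inside the current block, so each block keeps its own param.
            Matcher caseMatcher = CASE_BLOCK.matcher(switchMatcher.group(2));
            while (caseMatcher.find()) {
                velocity.append("#if($!").append(param).append(" == \"")
                        .append(caseMatcher.group(1)).append("\")\n")
                        .append("    ").append(caseMatcher.group(2)).append("\n")
                        .append("#end\n");
            }

            // appendReplacement copies the untouched text before the match, then the replacement.
            // quoteReplacement keeps the literal '$' signs from being read as group references.
            switchMatcher.appendReplacement(out, Matcher.quoteReplacement(velocity.toString()));
        }
        // Copy whatever follows the last switch block.
        switchMatcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<div>before</div>\n"
                + "<BG SWITCH>param1\n  <BG CASE>value1:\n    text 1\n  <BG /CASE>\n<BG /SWITCH>\n"
                + "<div>between</div>\n"
                + "<BG SWITCH>param2\n  <BG CASE>value2:\n    text 2\n  <BG /CASE>\n<BG /SWITCH>\n"
                + "<div>after</div>\n";
        System.out.println(convert(html));
    }
}
```

Because the case pattern is only run against the text of the block currently being replaced, the second block cannot pick up the first block's param, and everything outside the blocks is copied through unchanged.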