Java trim excessive symbols

How can I trim excessive non-digit, non-letter characters, as in the following:

String test = "Hey this is a string with lots of symbols!!!!!@@@@@#####";

The output should be:

Hey this is a string with lots of symbols!@#

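This is what I have at the moment, but it has some strange side effects and it is far too bulky:

(The first goal is just to trim them; the second goal is to get it down to a 2-3 liner.)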

    String precheck = message.replaceAll("[a-zA-Z]", "");

    precheck = precheck.replaceAll("[0-9]+/*\\.*[0-9]*", "");
    precheck = precheck.trim();

    String[] allowed = {
            "!","\"","'","-",">","<","+","_"+"^","@","#","=","/","\\"
    };

    for(char c : precheck.toString().toCharArray())
    {
        boolean contains = false;
        for(String symbol : allowed)
        {
            if(c == symbol.toCharArray()[0]){
                contains = true;
            }
        }

        if(!contains){
            message = message.replace(String.valueOf(c), "");
            message = message.trim();
        }
    }

    for(String symbol : allowed)
    {
        if (message.contains(symbol)){
            int count = 0;

            for (int i = 0; i < message.length(); i++){
                if (message.charAt(i) == symbol.toCharArray()[0]){
                    count++;
                }
            }

            if(count > 2){
                for(int i = 0;i < (count-2);i++){
                    message = message.replaceFirst(symbol, "");
                }
            }
        }
    }

    return message;

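You can do this with a single regex replacement:

str = str.replaceAll("([^\\p{L}\\p{N}])\\1+", "$1");

Explanation: This regex matches any non-digit, non-letter character and captures it as group #1. The backreference \1+ then matches one or more further instances of that same captured character, and the whole run is replaced by the first part, i.e. $1.

PS: This lookahead-based regex also works:

str = str.replaceAll("([^\\p{L}\\p{N}])(?=\\1+)", "");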
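To compare the two patterns side by side, here is a minimal runnable sketch. The class and method names (`CollapseRepeats`, `collapseWithBackreference`, `collapseWithLookahead`) are made up for illustration; only the regexes come from the answer.

```java
public class CollapseRepeats {
    // Backreference form: match a non-letter, non-digit character followed by
    // one or more copies of itself, then put back a single copy via $1.
    public static String collapseWithBackreference(String s) {
        return s.replaceAll("([^\\p{L}\\p{N}])\\1+", "$1");
    }

    // Lookahead form: delete every such character that is immediately followed
    // by another copy of itself, so only the last character of each run stays.
    public static String collapseWithLookahead(String s) {
        return s.replaceAll("([^\\p{L}\\p{N}])(?=\\1+)", "");
    }

    public static void main(String[] args) {
        String test = "Hey this is a string with lots of symbols!!!!!@@@@@#####";
        System.out.println(collapseWithBackreference(test));
        System.out.println(collapseWithLookahead(test));
        // Both print: Hey this is a string with lots of symbols!@#
    }
}
```

Keep in mind that whitespace is also a non-letter, non-digit character, so a run of spaces is collapsed to a single space by either pattern.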

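Since you have already defined a whitelist, I would recommend this approach: match every run of a repeated allowed symbol and keep only the first character.

([!"'><+_^@#=/\-])\1+

In Java:

String test = "Hey this is a string with lots of symbols!!!!!@@@@@#####";

test = test.replaceAll("([!\"'><+_^@#=/\\-])\\1+", "$1");

Result: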

"Hey this is a string with lots of symbols!@#"
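If the whitelist is likely to change, the character class could be generated rather than typed by hand. The following is only a sketch under assumptions: `WhitelistCollapse`, `ALLOWED`, `buildRegex` and `collapse` are hypothetical names, and the symbol list is taken from the question's `allowed` array (including the backslash). It relies on the fact that Java's `Pattern` accepts a backslash before any non-alphabetic character, so every symbol is simply escaped.

```java
import java.util.regex.Pattern;

public class WhitelistCollapse {
    // Assumed whitelist, copied from the question's `allowed` array.
    // Every entry must be a non-alphanumeric character for the escaping below.
    private static final String ALLOWED = "!\"'-><+_^@#=/\\";

    private static final Pattern REPEATED = Pattern.compile(buildRegex(ALLOWED));

    // Builds "([\!\"\'...])\1+" by backslash-escaping each allowed symbol.
    static String buildRegex(String symbols) {
        StringBuilder regex = new StringBuilder("([");
        for (char c : symbols.toCharArray()) {
            regex.append('\\').append(c);
        }
        return regex.append("])\\1+").toString();
    }

    // Collapses each run of a repeated allowed symbol to a single character.
    public static String collapse(String s) {
        return REPEATED.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args) {
        String test = "Hey this is a string with lots of symbols!!!!!@@@@@#####";
        System.out.println(collapse(test));
        // Prints: Hey this is a string with lots of symbols!@#
    }
}
```

Unlike the `[^\p{L}\p{N}]` version, this leaves characters outside the whitelist alone, so repeated spaces or a run such as `$$$` pass through unchanged.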