Java trim excessive symbols
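How can I trim excessive non-digit, non-letter characters, as in the following:
String test = "Hey this is a string with lots of symbols!!!!!@@@@@#####";
The output should be:
Hey this is a string with lots of symbols!@#
This is what I currently have, but it has some odd side effects and it is far too clunky:
(The first goal is just to trim them; the second goal is to get it down to a 2-3 liner.)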
// Strip letters and numbers so that only the symbols remain in precheck
String precheck = message.replaceAll("[a-zA-Z]", "");
precheck = precheck.replaceAll("[0-9]+/*\\.*[0-9]*", "");
precheck = precheck.trim();
String[] allowed = {
    "!", "\"", "'", "-", ">", "<", "+", "_", "^", "@", "#", "=", "/", "\\"
};
// Remove every symbol that is not on the allowed list
for (char c : precheck.toCharArray())
{
    boolean contains = false;
    for (String symbol : allowed)
    {
        if (c == symbol.charAt(0)) {
            contains = true;
        }
    }
    if (!contains) {
        message = message.replace(String.valueOf(c), "");
        message = message.trim();
    }
}
// Cut each allowed symbol down to at most two occurrences
for (String symbol : allowed)
{
    if (message.contains(symbol)) {
        int count = 0;
        for (int i = 0; i < message.length(); i++) {
            if (message.charAt(i) == symbol.charAt(0)) {
                count++;
            }
        }
        if (count > 2) {
            for (int i = 0; i < (count - 2); i++) {
                // Note: replaceFirst treats symbol as a regex, so "+", "^" and "\\" are not matched literally
                message = message.replaceFirst(symbol, "");
            }
        }
    }
}
return message;
You can just use this regex replacement:
str = str.replaceAll("([^\\p{L}\\p{N}])\\1+", "$1");
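Explanation: This regex matches any non-letter, non-digit character and captures it as group #1. The \1+ that follows matches one or more further instances of the same captured character, and the whole run is replaced with the captured group, i.e. $1.
PS: This lookahead-based regex also works:
str = str.replaceAll("([^\\p{L}\\p{N}])(?=\\1+)", "");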
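For anyone who wants to try both variants side by side, here is a minimal sketch. The class name `CollapseRepeats` and the two method names are made up for illustration, and precompiling the patterns is just one reasonable choice, not something the approach requires.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class CollapseRepeats {
    // Backreference form: a non-letter, non-digit character followed by
    // one or more copies of itself. The run is replaced by one copy ($1).
    private static final Pattern RUN =
            Pattern.compile("([^\\p{L}\\p{N}])\\1+");

    // Lookahead form: a non-letter, non-digit character that is immediately
    // followed by itself. Each such character is deleted, so only the last
    // character of every run survives.
    private static final Pattern FOLLOWED_BY_SAME =
            Pattern.compile("([^\\p{L}\\p{N}])(?=\\1+)");

    static String collapseWithBackreference(String s) {
        return RUN.matcher(s).replaceAll("$1");
    }

    static String collapseWithLookahead(String s) {
        return FOLLOWED_BY_SAME.matcher(s).replaceAll("");
    }

    public static void main(String[] args) {
        String test = "Hey this is a string with lots of symbols!!!!!@@@@@#####";
        System.out.println(collapseWithBackreference(test));
        System.out.println(collapseWithLookahead(test));
    }
}
```

Both calls should give the same result for the sample string. Keep in mind that whitespace is also a non-letter, non-digit character, so runs of spaces get collapsed as well.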
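Since you have already defined a whitelist, I would recommend this approach: match every run of a repeated allowed symbol and keep only the first character.
([!"'><+_^@#=/\-])\1+
In Java:
String test = "Hey this is a string with lots of symbols!!!!!@@@@@#####";
test = test.replaceAll("([!\"'><+_^@#=/\\-])\\1+", "$1");
Result: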
"Hey this is a string with lots of symbols!@#"
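To show how the whitelist version differs from a blanket "any symbol" pattern, here is a small sketch. The class name `WhitelistCollapse` and the extra sample inputs are my own assumptions, not part of the question.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the name is illustrative only.
final class WhitelistCollapse {
    // One whitelisted symbol followed by one or more copies of itself.
    // Inside the character class, ^ is literal because it is not first,
    // and \- is an escaped hyphen.
    private static final Pattern REPEATED_ALLOWED =
            Pattern.compile("([!\"'><+_^@#=/\\-])\\1+");

    static String collapse(String s) {
        // Replace every run with a single copy of the repeated symbol.
        return REPEATED_ALLOWED.matcher(s).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("symbols!!!!!@@@@@#####")); // whitelisted runs collapse
        System.out.println(collapse("a--->b"));                 // only the hyphen run collapses
        System.out.println(collapse("wait..."));                // '.' is not whitelisted, left alone
    }
}
```

Unlike the generic pattern, this one leaves runs of characters outside the whitelist (dots, spaces and so on) untouched, which may or may not be what you want.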