Regex seems to be off for special characters (e.g. +-.,!@#$%^&*;)

Question

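I am using a regex to print a string and insert a new line after a character limit. I don't want to split a word when it hits the limit (the word should start on the next line), unless a run of connected characters is itself longer than the limit, in which case I just continue the rest of the word on the next line. However, when I hit special characters (e.g. +-.,!@#$%^&*;), as you'll see when you test my code below, it adds one extra character beyond the limit for some reason. Why is this?

My function is: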

public static String limiter(String str, int lim) {
    str = str.trim().replaceAll(" +", " ");
    str = str.replaceAll("\n +", "\n");
    Matcher mtr = Pattern.compile("(.{1," + lim + "}(\\W|$))|(.{0," + lim + "})").matcher(str);
    String newStr = "";
    int ctr = 0;
    while (mtr.find()) {
        if (ctr == 0) {
            newStr += (mtr.group());
            ctr++;
        } else {
            newStr += ("\n") + (mtr.group());
        }
    }
    return newStr ;
}

So my input is: String str = " The 123456789 456789 +-.,!@#$%^&*();\\/|<>\"\' fox jumpeded over the uf\n 2 3456 green fence ";

The per-line character limit is 7.

It outputs:

456789 +
-.,!@#$%
^&*();\/
|<>"

when the correct output should be:

456789
+-.,!@#
$%^&*()
;\/|<>"

Here is a link to my code in an online compiler where you can run it: https://ideone.com/9gckP1

Answer 1

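In your pattern, the \W is part of the first capturing group. It adds this one (non-word) character on top of what .{1,limit} has already matched.

Try: "(.{1," + lim + "})(\\W|$)|(.{0," + lim + "})"

(I can't use your online regex compiler at the moment.)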
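To make the extra character visible, here is a minimal sketch that runs both patterns against a fragment of the question's input. The class name ExtraCharDemo and the helper firstMatch are made up for this illustration. It suggests, as far as I can tell, that moving the parenthesis only changes what group(1) holds, while group() (which the question's loop uses) still includes the \W character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtraCharDemo {
    // Hypothetical helper: returns the requested group of the first match, or null.
    static String firstMatch(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        int lim = 7;
        String input = "456789 +-.,!@#";
        // Pattern from the question: \W sits inside the first group.
        String original = "(.{1," + lim + "}(\\W|$))|(.{0," + lim + "})";
        // Pattern suggested in this answer: \W moved out of the first group.
        String regrouped = "(.{1," + lim + "})(\\W|$)|(.{0," + lim + "})";

        System.out.println("[" + firstMatch(original, input, 0) + "]");  // [456789 +]  (8 chars)
        System.out.println("[" + firstMatch(regrouped, input, 0) + "]"); // [456789 +]  (still 8 chars)
        System.out.println("[" + firstMatch(regrouped, input, 1) + "]"); // [456789 ]   (7 chars)
    }
}
```

If you go this way, the loop would have to read group(1) (falling back to group(3) when the second alternative matched), and the character consumed by (\W|$) would need to be handled separately so it is not dropped.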

Answer 2

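You need to replace (\W|$) with \b, since your intention is to match whole words (and \b provides exactly that). Also, since you do not want the whitespace after a word to carry over to the start of the newly created line, you need to consume it with \s* as well.

So, use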

Matcher mtr = Pattern.compile("(?U)(.{1," + lim + "}\\b\\s*)|(.{0," + lim + "})").matcher(str);

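See the demo.

Note that (?U) is used here to "fix" the word boundary behavior so that it stays in sync with \w (so that diacritics are not treated as word characters).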
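For anyone who wants to try this without the demo link, here is a small self-contained sketch using the pattern above. The class name LimiterSketch and the method chunks are invented for the example, and skipping empty matches is my own addition (the .{0,lim} alternative can match the empty string at the end of the input), not part of the pattern itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LimiterSketch {
    // Splits str into chunks with the \b-based pattern from this answer.
    static List<String> chunks(String str, int lim) {
        String regex = "(?U)(.{1," + lim + "}\\b\\s*)|(.{0," + lim + "})";
        Matcher m = Pattern.compile(regex).matcher(str);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            // Assumption: empty matches are not wanted as separate lines.
            if (!m.group().isEmpty()) {
                out.add(m.group());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Brackets make trailing whitespace visible.
        for (String line : chunks("456789 +-.,!@#$%^&*()", 7)) {
            System.out.println("[" + line + "]");
        }
        // prints:
        // [456789 ]
        // [+-.,!@#]
        // [$%^&*()]
    }
}
```

Note that the whitespace matched by \s* stays at the end of each chunk, so you may still want to trim it before joining the chunks with "\n".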

java

regex

debugging

limit