Finding and replacing a URL using regex in Java

I am trying to replace a URL matched by a regex using String.replace. The code is below:

public class Test {
    public static void main(String[] args) {
        String test = "https://google.com";
        //String regex = "\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
        String regex = "(http?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]"; // does not match <http://google.com>

        String newText = test.replace(regex, "");
        System.out.println(newText);
    }
}

I have looked at several questions about this on SO, but the pattern is not being replaced. Can someone tell me how to achieve this?

String.replace() does not accept a regex; it treats its first argument as a literal character sequence. Use String.replaceAll instead:

String newText = test.replaceAll(regex, "");

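As for the regex itself, "http?" only makes the "p" optional, so it cannot match "https://". You should use "https?" to match both http and https: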

String regex = "(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
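To make the difference concrete, here is a minimal sketch that runs both calls on the string from the question. The class name ReplaceVsReplaceAll and the two helper methods are made up for illustration; only String.replace and String.replaceAll come from the JDK.

```java
public class ReplaceVsReplaceAll {
    // Same pattern as above; "https?" makes the trailing "s" optional.
    static final String URL_REGEX =
        "(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";

    // String.replace searches for the regex text itself as a literal substring.
    static String literalReplace(String text) {
        return text.replace(URL_REGEX, "");
    }

    // String.replaceAll compiles its first argument as a regular expression.
    static String regexReplace(String text) {
        return text.replaceAll(URL_REGEX, "");
    }

    public static void main(String[] args) {
        String test = "https://google.com";
        // The literal pattern text never occurs in the input, so nothing changes.
        System.out.println("[" + literalReplace(test) + "]"); // [https://google.com]
        // The regex matches the whole input, so the result is empty.
        System.out.println("[" + regexReplace(test) + "]");   // []
    }
}
```

If only the first match should be removed, String.replaceFirst takes a regex in the same way.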

You cannot use a regex with replace; use replaceAll instead, i.e.:

    // Requires: import java.util.regex.PatternSyntaxException;
    String test = "something https://google.com something";
    try {
        String newText = test.replaceAll("(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]", "");
        System.out.println(newText);
    } catch (PatternSyntaxException ex) {
        // Syntax error in the regular expression
    } catch (IllegalArgumentException ex) {
        // Syntax error in the replacement text (unescaped $ signs?)
    } catch (IndexOutOfBoundsException ex) {
        // Non-existent backreference used in the replacement text
    }

Output:

something  something

Live demo:

http://ideone.com/Yi2hrb
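The catch blocks above hint that "$" is special in the replacement text. The sketch below shows one way to handle that, assuming the replacement may come from outside the program: compile the pattern once and pass the replacement through Matcher.quoteReplacement. The class name UrlReplacer and its two methods are hypothetical names chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlReplacer {
    // Compiled once and reused; same pattern as in the answer above.
    private static final Pattern URL = Pattern.compile(
        "(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]");

    // Treats the replacement as plain text: "$" and "\" lose their special meaning.
    static String replaceLiteral(String text, String replacement) {
        return URL.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    // Passes the replacement through unchanged, so "$1" refers to group 1 (the scheme).
    static String replaceWithGroups(String text, String replacement) {
        return URL.matcher(text).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String test = "something https://google.com something";
        System.out.println(replaceLiteral(test, "[$1]"));    // something [$1] something
        System.out.println(replaceWithGroups(test, "[$1]")); // something [https] something
    }
}
```

Quoting is the safer default when the replacement is not a template you wrote yourself, since a stray "$" would otherwise raise one of the exceptions caught above.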


Regex explanation:

(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]

Options: Case sensitive; Exact spacing; Dot doesn’t match line breaks; ^$ don’t match at line breaks; Default line breaks; Regex syntax only

Match the regex below and capture its match into backreference number 1 «(https?|ftp|file)»
   Match this alternative «https?»
      Match the character string “http” literally «http»
      Match the character “s” literally «s?»
         Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
   Or match this alternative «ftp»
      Match the character string “ftp” literally «ftp»
   Or match this alternative «file»
      Match the character string “file” literally «file»
Match the character string “://” literally «://»
Match a single character present in the list below «[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*»
   Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
   The literal character “-” «-»
   A character in the range between “a” and “z” «a-z»
   A character in the range between “A” and “Z” «A-Z»
   A character in the range between “0” and “9” «0-9»
   A single character from the list “+&@#/%?=~_|!:,.;” «+&@#/%?=~_|!:,.;»
Match a single character present in the list below «[-a-zA-Z0-9+&@#/%=~_|]»
   The literal character “-” «-»
   A character in the range between “a” and “z” «a-z»
   A character in the range between “A” and “Z” «A-Z»
   A character in the range between “0” and “9” «0-9»
   A single character from the list “+&@#/%=~_|” «+&@#/%=~_|»
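As a rough illustration of the breakdown above, the following sketch finds each URL in a sentence and prints capture group 1 next to the full match. It also shows the effect of the final character class, which leaves out "?!:,.;" so that trailing punctuation is not swallowed. The class name UrlFinder, the method findUrls, and the sample sentence are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UrlFinder {
    private static final Pattern URL = Pattern.compile(
        "(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]");

    // Returns one "scheme -> full match" entry per URL found in the text.
    static List<String> findUrls(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = URL.matcher(text);
        while (m.find()) {
            // group(1) is the scheme captured by (https?|ftp|file); group() is the whole match.
            found.add(m.group(1) + " -> " + m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "See https://google.com/a?b=1. Also ftp://example.org/file.txt, thanks.";
        for (String entry : findUrls(text)) {
            System.out.println(entry);
        }
        // https -> https://google.com/a?b=1
        // ftp -> ftp://example.org/file.txt
    }
}
```

The greedy first class initially consumes the trailing "." and ",", then backtracks until the last character satisfies the stricter second class.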