在 java 中使用正则表达式查找和替换 url
Finding and replacing an url using regex in java
我正在尝试使用 String.replace 将 url 替换为正则表达式,代码如下
public class Test {
public static void main(String[] args) {
String test = "https://google.com";
//String regex = "\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
String regex = "(http?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]"; // does not match <http://google.com>
String newText = test.replace(regex, "");
System.out.println(newText);
}
}
我已经在 SO 中研究了几个关于它的问题,但它并没有取代模式。有人可以告诉我如何实现吗?
String.replace()
不接受正则表达式。使用 String.replaceAll
代替:
String newText = test.replaceAll(regex, "");
就正则表达式而言,您还应该匹配 https
:
String regex = "(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
您不能将正则表达式与 replace
一起使用,请改用 replaceAll
,即:
String test = "something https://google.com something";
try {
String newText = test.replaceAll("(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]", "");
System.out.println(newText);
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
} catch (IllegalArgumentException ex) {
// Syntax error in the replacement text (unescaped $ signs?)
} catch (IndexOutOfBoundsException ex) {
// Non-existent backreference used the replacement text
}
输出:
something something
现场演示:
正则表达式解释:
(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]
Options: Case sensitive; Exact spacing; Dot doesn’t match line breaks; ^$ don’t match at line breaks; Default line breaks; Regex syntax only
Match the regex below and capture its match into backreference number 1 «(https?|ftp|file)»
Match this alternative «https?»
Match the character string “http” literally «http»
Match the character “s” literally «s?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Or match this alternative «ftp»
Match the character string “ftp” literally «ftp»
Or match this alternative «file»
Match the character string “file” literally «file»
Match the character string “://” literally «://»
Match a single character present in the list below «[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
The literal character “-” «-»
A character in the range between “a” and “z” «a-z»
A character in the range between “A” and “Z” «A-Z»
A character in the range between “0” and “9” «0-9»
A single character from the list “+&@#/%?=~_|!:,.;” «+&@#/%?=~_|!:,.;»
Match a single character present in the list below «[-a-zA-Z0-9+&@#/%=~_|]»
The literal character “-” «-»
A character in the range between “a” and “z” «a-z»
A character in the range between “A” and “Z” «A-Z»
A character in the range between “0” and “9” «0-9»
A single character from the list “+&@#/%=~_|” «+&@#/%=~_|»
我正在尝试使用 String.replace 将 url 替换为正则表达式,代码如下
public class Test {
public static void main(String[] args) {
String test = "https://google.com";
//String regex = "\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
String regex = "(http?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]"; // does not match <http://google.com>
String newText = test.replace(regex, "");
System.out.println(newText);
}
}
我已经在 SO 中研究了几个关于它的问题,但它并没有取代模式。有人可以告诉我如何实现吗?
String.replace()
不接受正则表达式。使用 String.replaceAll
代替:
String newText = test.replaceAll(regex, "");
就正则表达式而言,您还应该匹配 https
:
String regex = "(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
您不能将正则表达式与 replace
一起使用,请改用 replaceAll
,即:
String test = "something https://google.com something";
try {
String newText = test.replaceAll("(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]", "");
System.out.println(newText);
} catch (PatternSyntaxException ex) {
// Syntax error in the regular expression
} catch (IllegalArgumentException ex) {
// Syntax error in the replacement text (unescaped $ signs?)
} catch (IndexOutOfBoundsException ex) {
// Non-existent backreference used the replacement text
}
输出:
something something
现场演示:
正则表达式解释:
(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]
Options: Case sensitive; Exact spacing; Dot doesn’t match line breaks; ^$ don’t match at line breaks; Default line breaks; Regex syntax only
Match the regex below and capture its match into backreference number 1 «(https?|ftp|file)»
Match this alternative «https?»
Match the character string “http” literally «http»
Match the character “s” literally «s?»
Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Or match this alternative «ftp»
Match the character string “ftp” literally «ftp»
Or match this alternative «file»
Match the character string “file” literally «file»
Match the character string “://” literally «://»
Match a single character present in the list below «[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*»
Between zero and unlimited times, as many times as possible, giving back as needed (greedy) «*»
The literal character “-” «-»
A character in the range between “a” and “z” «a-z»
A character in the range between “A” and “Z” «A-Z»
A character in the range between “0” and “9” «0-9»
A single character from the list “+&@#/%?=~_|!:,.;” «+&@#/%?=~_|!:,.;»
Match a single character present in the list below «[-a-zA-Z0-9+&@#/%=~_|]»
The literal character “-” «-»
A character in the range between “a” and “z” «a-z»
A character in the range between “A” and “Z” «A-Z»
A character in the range between “0” and “9” «0-9»
A single character from the list “+&@#/%=~_|” «+&@#/%=~_|»