Remove other domains with Regex

I have a preg_replace that replaces every link in a string with "[link removed]":

/((https?:\/\/)?(\w+\.)+[a-zA-Z]{2,}(:\d+)?((\/\w+)+(\.\w+)?)?\/?)/

Simplified:
http/https, subdomain, domain, tld, port, folder/file, extension, "/"
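
For reference, a minimal sketch of how this replacement behaves today. The wrapper name removeLinks is hypothetical and only there for illustration; the pattern is the one shown above.

```php
<?php
// Hypothetical wrapper name; the pattern is the one above, with the TLD class
// written as [a-zA-Z] (a "|" inside a character class matches a literal pipe).
function removeLinks(string $text): string
{
    $pattern = '/((https?:\/\/)?(\w+\.)+[a-zA-Z]{2,}(:\d+)?((\/\w+)+(\.\w+)?)?\/?)/';
    // preg_replace returns null on error; fall back to the input in that case.
    return preg_replace($pattern, '[link removed]', $text) ?? $text;
}

echo removeLinks('http://notmydomain.com'), PHP_EOL; // [link removed]
echo removeLinks('example.com'), PHP_EOL;            // [link removed]
```

Both inputs are replaced, including my own domain.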

But I need to filter it so that nothing is replaced when the domain is "example.com", like this:
"http://notmydomain.com" -> "[link removed]"
"example.com" -> "example.com"

Use a negative lookahead assertion:

/((https?:\/\/)?(?![^:\/\s]*\bexample\.com)(\b\w+\.)+[a-zA-Z]{2,}(:\d+)?((\/\w+)+(\.\w+)?)?\/?)/

Explanation:

(?!            # Assert that it's impossible to match this from the current location:
 [^:\/\s]*     # Any number of characters except colon, slash or whitespace
 \b            # followed by a start-of-word anchor
 example\.com  # followed by example.com.
)              # End of lookahead.

In addition, I added another word boundary anchor before the \w+ part to make sure that we don't match xample.com when example.com is given as input.
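
As a rough sketch of how this could be wired into preg_replace, assuming the allowed domain is hard-coded as example.com: the function name removeForeignLinks and the sample inputs are made up for illustration.

```php
<?php
// Hypothetical helper name. The pattern is the lookahead version from this
// answer, with the TLD class written as [a-zA-Z] (a "|" inside a character
// class would match a literal pipe).
function removeForeignLinks(string $text): string
{
    $pattern = '/((https?:\/\/)?(?![^:\/\s]*\bexample\.com)(\b\w+\.)+[a-zA-Z]{2,}(:\d+)?((\/\w+)+(\.\w+)?)?\/?)/';
    // preg_replace returns null on error; fall back to the input in that case.
    return preg_replace($pattern, '[link removed]', $text) ?? $text;
}

echo removeForeignLinks('http://notmydomain.com'), PHP_EOL; // [link removed]
echo removeForeignLinks('example.com'), PHP_EOL;            // example.com
echo removeForeignLinks('http://example.com'), PHP_EOL;     // http://example.com
echo removeForeignLinks('notexample.com'), PHP_EOL;         // [link removed]
```

The last case shows the effect of the \b inside the lookahead: notexample.com is not treated as example.com. Note that the lookahead only guards the position where the host starts, so a dotted file name in a path after example.com (for instance /path/page.html) can still be matched on its own.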

Test it live on regex101.com.