MySQL 正斜杠后带单词边界的 REGEXP

Question

我在数据库 table:

中有一些没有 http:// 的 URL

        url
row #1: 10.1.127.4/
row #2: 10.1.127.4/something

现在，以下过滤器为我提供了第 2 行 - 很好：

SELECT * FROM mytable WHERE url REGEXP '[[:<:]]10.1.127.4/something[[:>:]]'

但是下面的过滤器没有给我第 1 行，但不应该吗？

SELECT * FROM mytable WHERE url REGEXP '[[:<:]]10.1.127.4/[[:>:]]'

我应该注意，通过反斜杠转义正斜杠也不会 return 想要的行 #1:

SELECT * FROM mytable WHERE url REGEXP '[[:<:]]10.1.127.4\/[[:>:]]'

Answer 1

根据文档：http://dev.mysql.com/doc/refman/5.7/en/regexp.html

[[:<:]], [[:>:]]

These markers stand for word boundaries. They match the beginning and end of words, respectively. A word is a sequence of word characters that is not preceded by or followed by word characters. A word character is an alphanumeric character in the alnum class or an underscore (_).

/ 不是 alnum 成员，因此它不是单词边界。

Answer 2

SELECT * FROM mytable WHERE mycolumn REGEXP "[[:<:]][0-9]{1,3}\.([0-9]{1,3}.?){3}((\/)?[^ ]*)?[[:>:]]";

[[:<:]][0-9]{1,3}\.([0-9]{1,3}.?){3}((\/)?[^ ]*)?[[:>:]]

Assert position at the beginning of a word (position followed by but not preceded by an ASCII letter, digit, or underscore) «[[:<:]]»
Match a single character in the range between “0” and “9” «[0-9]{1,3}»
   Between one and 3 times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «{1,3}»
Match the character “.” literally «\.»
Match the regex below and capture its match into backreference number 1 «([0-9]{1,3}.?){3}»
   Exactly 3 times «{3}»
      You repeated the capturing group itself.  The group will capture only the last iteration.  Put a capturing group around the repeated group to capture all iterations. «{3}»
   Match a single character in the range between “0” and “9” «[0-9]{1,3}»
      Between one and 3 times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «{1,3}»
   Match any single character that is NOT a line break character (line feed) «.?»
      Between zero and one times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «?»
Match the regex below and capture its match into backreference number 2 «((\/)?[^ ]*)?»
   Between zero and one times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «?»
   Match the regex below and capture its match into backreference number 3 «(\/)?»
      Between zero and one times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «?»
      Match the character “/” literally «\/»
   Match any single character that is NOT present in the list below and that is NOT a line break character (line feed) «[^ ]*»
      Between zero and unlimited times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «*»
      The literal character “ ” « »
Assert position at the end of a word (position preceded by but not followed by an ASCII letter, digit, or underscore) «[[:>:]]»

Answer 3

发现 [[:>:]] 需要 左边的单词字符 ，反之亦然 [[:<:]]

简单的测试证实：

SELECT 'bla,,123' REGEXP '[[:<:]]bla,[[:>:]]' -- no match
SELECT 'bla,,123' REGEXP '[[:<:]]bla[[:>:]]' -- match
SELECT 'bla,,123' REGEXP '[[:<:]]bla,,123[[:>:]]' -- match

我认为这样的文档是有道理的，我误解了很多年：

[...] word boundaries. They match the beginning and end of words, [...]

因此，单词边界需要

一侧为非单词字符
另一边一个字

MySQL 正斜杠后带单词边界的 REGEXP

MySQL REGEXP with word boundary after forward slash

regex

mysql

word-boundary

trailing-slash