How to capture the rest of a sentence after one or two matching groups with regex?
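I have two sentences that I am working with, and I want to build specific capture groups based on the characters in their words. These are the two Spanish sentences:
- Yo quiero irme de viaje.
- Yo puedo caminar en la nieve.
The first capture group must be one of the verbs, i.e. "quiero" or "puedo", so I used this regex: ([PpDdQq].*o)
The second capture group must be the word immediately after the verb, ending in "me"; I used (\w*me) for that.
The last capture group depends on whether a word ending in "-me" directly follows the verb. If there is none, it must be all the words and spaces that follow the first capture group. If there is one, it must be all the words and spaces that follow the second capture group. I used (\w.+) for this, but it did not work.
Can anyone help me figure out why? Thanks. Below is the full regex, along with a link to the regex site that contains the expression and the examples to match: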
([PpDdQq].*o) |(\w*me)|(\w.+)
Use
\b([PpDdQq]\w*o)(?:\s+(\w*me))?\b(.*)
See the regex proof.
Explanation
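To try the suggested pattern outside a regex tester, here is a minimal Python sketch. It assumes Python's `re` module behaves like the tester used in the question for this pattern; the helper name `split_sentence` is hypothetical, and stripping the leading space from the third group is an added convenience, not part of the pattern itself.

```python
import re

# Suggested pattern: verb, optional next word ending in "me", then the rest.
PATTERN = re.compile(r"\b([PpDdQq]\w*o)(?:\s+(\w*me))?\b(.*)")


def split_sentence(sentence):
    """Return (verb, me_word, rest), or None if no verb is found.

    me_word is None when the word after the verb does not end in "me".
    """
    match = PATTERN.search(sentence)
    if match is None:
        return None
    verb, me_word, rest = match.groups()
    # Group 3 starts with the space that follows the last matched word.
    return verb, me_word, rest.strip()


for sentence in ("Yo quiero irme de viaje.", "Yo puedo caminar en la nieve."):
    print(split_sentence(sentence))
# ('quiero', 'irme', 'de viaje.')
# ('puedo', None, 'caminar en la nieve.')
```

Because the second group sits inside an optional non-capturing group, it comes back as `None` when the word after the verb does not end in "me", and the third group then starts right after the verb.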
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [PpDdQq]                 any character of: 'P', 'p', 'D', 'd',
                             'Q', 'q'
--------------------------------------------------------------------------------
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    o                        'o'
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    (                        group and capture to \2:
--------------------------------------------------------------------------------
      \w*                      word characters (a-z, A-Z, 0-9, _) (0
                               or more times (matching the most
                               amount possible))
--------------------------------------------------------------------------------
      me                       'me'
--------------------------------------------------------------------------------
    )                        end of \2
--------------------------------------------------------------------------------
  )?                       end of grouping
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (                        group and capture to \3:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \3
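As for why the original attempt did not work, one likely reason is that its three groups are joined with `|`, so any single match can fill only one of them. The sketch below checks this with Python's `re` module (an assumption, since the question used an online tester); the variable names are illustrative only.

```python
import re

# Original attempt from the question: three alternatives joined with "|".
ORIGINAL = re.compile(r"([PpDdQq].*o) |(\w*me)|(\w.+)")

first_match = ORIGINAL.search("Yo quiero irme de viaje.")
original_groups = first_match.groups()

# At position 0 the first two alternatives fail ("Y" is not in the
# character class, and "Yo" does not end in "me"), so the third
# alternative matches and consumes the whole sentence.
print(original_groups)
# (None, None, 'Yo quiero irme de viaje.')
```

Writing the groups one after another, with the second wrapped in an optional non-capturing group, lets a single match fill all of them.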