How to capture the rest of a sentence after one or two matching groups with regex?

Question

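I have two sentences I'm working with, and I'm interested in making specific capture groups based on the characters in the words. These are the two Spanish sentences: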

Yo quiero irme de viaje.
Yo puedo caminar en la nieve.

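The first capture group must be one of the verbs, i.e. "quiero" or "puedo", so I use this regex: ([PpDdQq].*o)
The second capture group must be the word immediately after the verb, if it ends in "me"; I do that with (\w*me)
The last capture group must be all the words and spaces that come right after the first capture group when there is no immediately following word ending in "-me", or all the words and spaces that come right after the second capture group when there is one. I used (\w.+) for this, but it didn't work.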

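Can anyone help me figure out why? Thanks. Below is the full regex, along with a link to the regex site containing the expression and the examples to match: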

([PpDdQq].*o) |(\w*me)|(\w.+)

Answer 1

Use

\b([PpDdQq]\w*o)(?:\s+(\w*me))?\b(.*)

See the regex proof.
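Since the question is tagged python, here is a minimal sketch of how the pattern could be tried with the standard `re` module. The names `PATTERN` and `split_sentence` are only illustrative and are not part of the original answer.

```python
import re

# Pattern from the answer; PATTERN and split_sentence are illustrative names.
PATTERN = re.compile(r"\b([PpDdQq]\w*o)(?:\s+(\w*me))?\b(.*)")

def split_sentence(sentence):
    """Return (verb, me_word_or_None, rest), or None if no verb is found."""
    match = PATTERN.search(sentence)
    return match.groups() if match else None

for sentence in ("Yo quiero irme de viaje.", "Yo puedo caminar en la nieve."):
    print(split_sentence(sentence))
# ('quiero', 'irme', ' de viaje.')
# ('puedo', None, ' caminar en la nieve.')
```

Note that group 3 keeps the leading space; call `.strip()` on it if that space is not wanted. Group 2 is `None` when no "-me" word directly follows the verb.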

Explanation

--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [PpDdQq]                 any character of: 'P', 'p', 'D', 'd',
                             'Q', 'q'
--------------------------------------------------------------------------------
    \w*                      word characters (a-z, A-Z, 0-9, _) (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    o                        'o'
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    (                        group and capture to \2:
--------------------------------------------------------------------------------
      \w*                      word characters (a-z, A-Z, 0-9, _) (0
                               or more times (matching the most
                               amount possible))
--------------------------------------------------------------------------------
      me                       'me'
--------------------------------------------------------------------------------
    )                        end of \2
--------------------------------------------------------------------------------
  )?                       end of grouping
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  (                        group and capture to \3:
--------------------------------------------------------------------------------
    .*                       any character except \n (0 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \3
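
As for why the attempt in the question did not work, one likely reason is the `|` between the groups: it makes the three groups alternatives rather than a sequence, so each match fills only one of them. At the start of the sentence the first two alternatives fail on "Yo", and the third, (\w.+), then consumes the whole line. A small sketch to illustrate this, assuming Python's `re` module (`ORIGINAL` and `first_match_groups` are illustrative names):

```python
import re

# The pattern from the question: three alternatives separated by "|".
ORIGINAL = re.compile(r"([PpDdQq].*o) |(\w*me)|(\w.+)")

def first_match_groups(sentence):
    """Return the groups of the first match, or None if nothing matches."""
    match = ORIGINAL.search(sentence)
    return match.groups() if match else None

print(first_match_groups("Yo quiero irme de viaje."))
# (None, None, 'Yo quiero irme de viaje.')
```

Writing the groups one after another, with the "-me" word wrapped in an optional non-capturing group, is what lets all three be filled in a single match.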


python

regex

python-re