用于检测模式并从 Python 中的该模式中删除空格的正则表达式

Question

我有一个文件，其中包含按以下格式 <+segment1 segment2 segment3 segment4+> 组成单词的片段，我想要的是一个输出，所有片段彼此并排形成一个单词（所以基本上我想要删除段之间的 space 和段周围的 <+ +> 符号）。例如：

输入：

<+play ing+> <+game s .+>

输出：

playing games.

我首先尝试使用 \<\+(.*?)\+\> 检测模式，但我似乎不知道如何删除 spaces

Answer 1

使用这个Python code:

import re
line = '<+play ing+> <+game s .+>'
line = re.sub(r'<\+\s*(.*?)\s*\+>', lambda z: z.group(1).replace(" ", ""), line)
print(line)

结果：playing games.

lambda 额外删除空格。

正则表达式解释

--------------------------------------------------------------------------------
  <                        '<'
--------------------------------------------------------------------------------
  \+                       '+'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to :
--------------------------------------------------------------------------------
    .*?                      any character except \n (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
  )                        end of 
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \+                       '+'
--------------------------------------------------------------------------------
  >                        '>'

Answer 2

我假设 spaces 可以转换为空字符串，除非它们前面有 '>' 和后面有 '<' .即字符串'> <'中的space不要替换为空字符串

您可以用空字符串替换以下正则表达式的每个匹配项：

<\+|\+>|(?<!>) | (?!<)

Regex demo_{^<¯\(ツ)/¯^>}Python code

这个表达式可以分解如下。

<\+     # Match '<+'
|       # or
\+>     # Match '<+'
|       # or
(?<!>)  # Negative lookbehind asserts current location is not preceded by '>'
[ ]     # Match a space
|       # or
[ ]     # Match a space
(?!<)   # Negative lookahead asserts current location is not followed by '<'

我已将每个 space 放在上面的一个字符 class 中，因此它是可见的。

用于检测模式并从 Python 中的该模式中删除空格的正则表达式

Regex to detect pattern and remove spaces from that pattern in Python

regex

text

python-3.x