Add a character at start of a regex match in Pandas

Question

I have a DataFrame with two columns, id and text:

df = pd.DataFrame([[1, 'Hello world 28'], [2, 'Hi how are you 9'], [3, '19 Hello']], columns=['id','text'])

   id   text
    1   Hello world 28
    2   Hi how are you 9
    3   19 Hello

In the text column, whenever a number is preceded by a space, I want to add a # before that number. The resulting DataFrame I am looking for is:

   id   text
    1   Hello world #28
    2   Hi how are you #9
    3   19 Hello

I tried the following to capture the regex pattern and add a # character before the number, following the example in this link:

df['text'] = df['text'].replace(r'(\s\d{1,2})', "#", regex=True)

However, this gives me the result below: it replaces the whole number with # instead of adding # at the start of the regex match:

   id   text
    1   Hello world #
    2   Hi how are you #
    3   19 Hello

Any pointers on how to add the # character before the regex match? Thanks!

Answer 1

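Try

df['text'].replace(r"\s(\d{1,2})", r" #\1", regex=True)

That is, move the parentheses so they surround only the digits, which captures them for reuse as \1 in the replacement. The r prefix makes the replacement string raw, so the backslash in \1 is not treated as an escape character. Also put a space before the #, because the matched whitespace is consumed by the pattern and has to be written back.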
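
As a minimal end-to-end sketch of the approach above, the snippet below rebuilds the DataFrame from the question and assigns the result back to the text column (the answer itself only shows the expression, so the assignment is an addition here):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    [[1, 'Hello world 28'], [2, 'Hi how are you 9'], [3, '19 Hello']],
    columns=['id', 'text'],
)

# \s matches the whitespace before the number; (\d{1,2}) captures the digits.
# In the replacement, \1 re-inserts the captured digits after " #".
df['text'] = df['text'].replace(r"\s(\d{1,2})", r" #\1", regex=True)

print(df['text'].tolist())
# ['Hello world #28', 'Hi how are you #9', '19 Hello']
```

The third row is unchanged because its number sits at the start of the string, with no whitespace in front of it.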

Answer 2

In case you want to keep the original whitespace, use two groups:

df['text'].replace(r"(\s)(\d{1,2})", r"\1#\2", regex=True)

See proof.

Explanation

--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    \d{1,2}                  digits (0-9) (between 1 and 2 times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \2
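
To illustrate why the second group matters, here is a small sketch using Python's re module directly, which uses the same pattern and backreference syntax as the pandas calls above. The sample string with a tab is made up for illustration and is not from the question:

```python
import re

# Hypothetical input where the whitespace before the number is a tab.
sample = "total:\t42"

# One group: the matched whitespace is discarded and a literal space is written.
collapsed = re.sub(r"\s(\d{1,2})", r" #\1", sample)

# Two groups: \1 writes back whatever whitespace was matched, \2 the digits.
preserved = re.sub(r"(\s)(\d{1,2})", r"\1#\2", sample)

print(repr(collapsed))  # 'total: #42'
print(repr(preserved))  # 'total:\t#42'
```

With plain spaces both variants give the same result; they differ only when the whitespace is a tab, newline, or similar.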
