Use regex to replace words before any digit with nothing

Question

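I want to replace the words and spaces that appear before a digit in a string with nothing. For example, for string = 'Juice of 1/2', I want to return '1/2'. I tried the following, but it did not work.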

string = "Juice of 1/2"
new = string.replace(r"^.+?(?=\d)", "")

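I am also trying to do this for every cell in a list of columns, using the code below. How can I merge the new regex pattern into the existing pattern r"\(|\)|,"?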

df[pd.Index(cols2) + "_clean"] = (
    df[cols2]
    .apply(lambda col: col.str.replace(r"\(|\)|,", "", regex=True))
)

Answer 1

You can use str.extract for this:

df["col2"] = df["col2"].str.extract(r'([0-9/-]+)')
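To check what this pattern captures without a DataFrame, here is a minimal sketch using re.search, which, like str.extract, only returns the first match. The helper first_number, the stricter STRICT pattern, and the sample strings are assumptions made up for illustration and are not part of the answer.

```python
import re

# Pattern from the answer above: a run of digits, slashes, and hyphens.
LOOSE = re.compile(r"([0-9/-]+)")
# Hypothetical stricter variant: the match must start with a digit.
STRICT = re.compile(r"(\d[\d/-]*)")


def first_number(pattern, text):
    """Return the first captured group, or None when nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


print(first_number(LOOSE, "Juice of 1/2"))      # 1/2
print(first_number(LOOSE, "half-cup of 1/2"))   # -  (the hyphen matches first)
print(first_number(STRICT, "half-cup of 1/2"))  # 1/2
```

If the cells can contain hyphens or slashes before the number, the stricter variant may be the safer choice.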

Answer 2

.+? matches anything, including other digits. It will also match the / in 1/2. Since you only want to remove letters and whitespace, use [a-z\s]+ instead.

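You also need to use re.sub() rather than string.replace(), because str.replace() treats its first argument as literal text, not as a pattern. (In pandas, .str.replace() interprets the pattern as a regex only when regex=True is passed; that was the default before pandas 2.0.)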

import re
new = re.sub(r'[a-z\s]+(?=\d)', '', string, flags=re.I)
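The difference between the two calls, and one possible way to merge the lookahead pattern into the existing r"\(|\)|," pattern, can be sketched as follows. The COMBINED name and the second sample string are assumptions for illustration; alternation is just one way to combine the patterns.

```python
import re

string = "Juice of 1/2"

# str.replace searches for the literal text of the pattern, which does not
# occur in the string, so the string comes back unchanged.
unchanged = string.replace(r"^.+?(?=\d)", "")

# re.sub interprets the pattern: letters and whitespace directly before a digit.
cleaned = re.sub(r"[a-z\s]+(?=\d)", "", string, flags=re.I)

# Hypothetical merged pattern: the original alternatives plus the lookahead.
COMBINED = r"\(|\)|,|[A-Za-z\s]+(?=\d)"
merged = re.sub(COMBINED, "", "Juice of 1/2, (chilled)")

print(unchanged)  # Juice of 1/2
print(cleaned)    # 1/2
print(merged)     # 1/2 chilled
```

In pandas, the same combined pattern could presumably be passed to col.str.replace(COMBINED, "", regex=True). Because the substitution runs in a single pass, words followed by "(" rather than a digit are left in place.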

Answer 3

Something like this might work.

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"[A-Za-z\s]+"

test_str = "Juice of 1/2 hede"

subst = ""

# Limit the number of replacements with the count argument (0 means replace all)
result = re.sub(regex, subst, test_str, count=0)

if result:
    print(result)

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
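For comparison, here is a minimal sketch of how this pattern behaves next to the lookahead version from the other answer, using the same test string. The variable names are made up for illustration.

```python
import re

test_str = "Juice of 1/2 hede"

# Pattern from the answer above: removes every run of letters and whitespace.
strip_all = re.sub(r"[A-Za-z\s]+", "", test_str)

# With a lookahead, only runs directly followed by a digit are removed.
strip_before_digit = re.sub(r"[A-Za-z\s]+(?=\d)", "", test_str)

print(strip_all)           # 1/2
print(strip_before_digit)  # 1/2 hede
```

Which one fits depends on whether text after the number should be kept.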

python

regex