Regex currency Python 3.5

Question

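I am trying to reformat Euro currency amounts in text data. The original format looks like this: EUR 3.000.00 or EUR 33.540.000.-
I would like to standardize the format to €3000.00 or €33540000.00.
I have successfully reformatted EUR 2.500.- using this code: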

import re
format1 = "a piece of text with currency EUR 2.500.- and some other information"
regexObj = re.compile(r'EUR\s*\d{1,3}[.](\d{3}[.]-)')
text1 = regexObj.sub(lambda m:"\u20ac"+"{:0.2f}".format(float(re.compile('\d+(.\d+)?(\.\d+)?').search(m.group().replace('.','')).group())),format1)
Out: "a piece of text with currency €2500.00 and some other information"

This gives me €2500.00, which is correct. I have tried applying the same logic to the other formats, but to no avail.

format2 = "another piece of text EUR 3.000.00 and EUR 5.000.00. New sentence"
regexObj = re.compile('\d{1,3}[.](\d{3}[.])(\d{2})?')
text2 = regexObj.sub(lambda m:"\u20ac"+"{:0.2f}".format(float(re.compile('\d+(.\d+)?(\.\d+)?').search(m.group().replace('.','')).group())),format2)
Out: "another piece of text EUR €300000.00 and EUR €500000.00. New sentence"

and

format3 = "another piece of text EUR 33.540.000.- and more text"
regexObj = regexObj = re.compile(r'EUR\s*\d{1,3}[.](\d{3}[.])(\d{3}[.])(\d{3}[.]-)')
text3 = regexObj.sub(lambda m:"\u20ac"+"{:0.2f}".format(float(re.compile('\d+(.\d+)?(.\d+)?').search(m.group().replace('.','')).group())),format3)
Out: "another piece of text EUR 33.540.000.- and more text"

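I think the problem may lie in regexObj.sub(), since its .format() part confuses me. I tried changing re.compile('\d+(.\d+)?(.\d+)?') inside it, but I can't seem to produce the result I want. Any ideas are much appreciated. Thanks!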

Answer 1

Let's start with the regex. My suggestion is:

EUR\s*(?:(\d{1,3}(?:\.\d{3})*)\.-|(\d{1,3}(?:\.\d{3})*)(\.\d{2}))

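Details:

EUR\s* - the starting part: the literal "EUR" followed by optional whitespace.
(?: - start of a non-capturing group, a container for the alternatives.
( - start of capturing group #1 (the integer part when ".-" stands in for the decimal part).
\d{1,3} - one to three digits.
(?:\.\d{3})* - a ".ddd" part, zero or more times.
) - end of group 1.
\.- - the ".-" ending.
| - the alternative separator.
( - start of capturing group #2 (the integer part).
\d{1,3}(?:\.\d{3})* - same as in alternative 1.
) - end of group 2.
(\.\d{2}) - capturing group #3 (the dot and the two-digit decimal part).
) - end of the non-capturing group.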
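To make the breakdown above concrete, here is a small sketch that writes the same pattern with re.VERBOSE and prints which groups are filled for each alternative. The names PAT, samples and groups are only illustrative; they are not part of the original script.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE
# so that each part can carry a comment.
PAT = re.compile(r"""
    EUR\s*                          # literal prefix and optional whitespace
    (?:
        (\d{1,3}(?:\.\d{3})*)\.-    # group 1: integer part followed by ".-"
      |
        (\d{1,3}(?:\.\d{3})*)       # group 2: integer part
        (\.\d{2})                   # group 3: dot plus two decimal digits
    )
""", re.VERBOSE)

samples = ["EUR 33.540.000.-", "EUR 3.000.00", "EUR 5.210.15"]

# Map each sample to the tuple of captured groups.
groups = {s: PAT.search(s).groups() for s in samples}

for s, g in groups.items():
    print(s, "->", g)
# EUR 33.540.000.- -> ('33.540.000', None, None)
# EUR 3.000.00 -> (None, '3.000', '.00')
# EUR 5.210.15 -> (None, '5.210', '.15')
```

Only one alternative matches at a time, so either group 1 is set, or groups 2 and 3 are set together.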

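Instead of a lambda, I used an "ordinary" replacement function, which I called repl. It has two branches: one for group 1, and one for groups 2 + 3.

In both variants the dots are removed from the integer part, but the "final" dot (the one after the integer part) belongs to group 3, so it is not removed.

So the whole script looks like this: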

import re

def repl(m):
    g1 = m.group(1)
    if g1:   # Alternative 1: ".-" instead of decimal part
        res = g1.replace('.','') + '.00'
    else:    # Alternative 2: integer part (group 2) + decimal part (group 3)
        res = m.group(2).replace('.','') + m.group(3)
    return "\u20ac" + res

# Source string
src = 'xxx EUR 33.540.000.- yyyy EUR 3.000.00 zzzz EUR 5.210.15 vvvv'
# Regex
pat = re.compile(r'EUR\s*(?:(\d{1,3}(?:\.\d{3})*)\.-|(\d{1,3}(?:\.\d{3})*)(\.\d{2}))')
# Replace
result = pat.sub(repl, src)

The result is:

xxx €33540000.00 yyyy €3000.00 zzzz €5210.15 vvvv

As you can see, there is no need to use float or format.
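As a quick check, the same pattern and replacement logic can be run against the three strings from the question. This is a self-contained sketch; the samples list and the converted variable are names I added for illustration.

```python
import re

PAT = re.compile(r'EUR\s*(?:(\d{1,3}(?:\.\d{3})*)\.-|(\d{1,3}(?:\.\d{3})*)(\.\d{2}))')

def repl(m):
    # Alternative 1: ".-" stands in for the decimal part, so append ".00".
    if m.group(1):
        return "\u20ac" + m.group(1).replace('.', '') + '.00'
    # Alternative 2: strip the thousands dots, keep group 3 (dot + decimals).
    return "\u20ac" + m.group(2).replace('.', '') + m.group(3)

# The three input strings from the question.
samples = [
    "a piece of text with currency EUR 2.500.- and some other information",
    "another piece of text EUR 3.000.00 and EUR 5.000.00. New sentence",
    "another piece of text EUR 33.540.000.- and more text",
]

converted = [PAT.sub(repl, s) for s in samples]
for line in converted:
    print(line)
# a piece of text with currency €2500.00 and some other information
# another piece of text €3000.00 and €5000.00. New sentence
# another piece of text €33540000.00 and more text
```

Note that the sentence-ending dot after "EUR 5.000.00" is left in place, because group 3 only takes the dot that is followed by two digits.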
