Convert any integer in a textual sentence to string in Python

Question

I have a series that looks a bit like this:

'0589 BTC: 581 OUTFLOW BANK REF: CUST REF: 0004'
'CUR FR 44F8 Availability: 12,267.24 Debited'
...

I want to replace all integer/alphanumeric values with their corresponding word format.

For eg.
 0589 --> ZERO FIVE EIGHT NINE
 44F8 --> FOUR FOUR F EIGHT
 12,267.24 --> ONE TWO , TWO SIX SEVEN . TWO FOUR

So the first item would be converted to

'ZERO FIVE EIGHT NINE BTC: FIVE EIGHT ONE OUTFLOW BANK REF: CUST REF: ZERO ZERO ZERO FOUR'

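and so on.

What is the way to approach this problem?

I was looking into some Python packages such as num2words and inflect, but they all return a human-readable format, i.e. 22 --> twenty-two, which does not meet my requirement.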

conversion_dict = {1:'One' , 2 : 'Two' , 3 : 'Three' , 4 : 'Four' , 5: 'Five' , 6:'Six' , 7 : 'Seven' , 8:'Eight' , 9:'Nine' , 0: 'Zero'}

Answer 1

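You can try this approach:

Use the regular expression [\d]+ to get all the numbers in the string
Convert each match to an integer
Use the modulo (%) operator to get the digits from the number
Finally, use your dictionary to look up the word for each digit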
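
The steps above can be sketched roughly as follows. One caveat: converting a match such as 0589 to an integer drops the leading zero, so this sketch swaps the int conversion and the % operator for a walk over the matched text, one character at a time. The names DIGIT_WORDS and spell_digits are hypothetical, and the dictionary is assumed to use string keys.

```python
import re

# Hypothetical lookup table with string keys, so matched text can index it directly.
DIGIT_WORDS = {
    '0': 'ZERO', '1': 'ONE', '2': 'TWO', '3': 'THREE', '4': 'FOUR',
    '5': 'FIVE', '6': 'SIX', '7': 'SEVEN', '8': 'EIGHT', '9': 'NINE',
}


def spell_digits(text):
    """Replace every run of ASCII digits with its digits spelled out."""
    def spell(match):
        # Walk the matched text itself so leading zeros are preserved.
        return ' '.join(DIGIT_WORDS[ch] for ch in match.group())

    # [0-9]+ rather than \d+ so non-ASCII digits never reach the lookup.
    return re.sub(r'[0-9]+', spell, text)


print(spell_digits('0589 BTC: 581 OUTFLOW BANK REF: CUST REF: 0004'))
```

This only separates digits within a run. A mixed token such as 44F8 comes out as FOUR FOURFEIGHT, so tokens that mix letters and digits need the padding approach used in the other answers.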

Alternative solution

Use re.findall(r'[\d]', my_string)
This will give you all the digits in the string
Then use my_string.replace(digit, f' {conversion_dict[digit]} ')
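
A minimal sketch of this alternative, assuming conversion_dict is keyed by strings (re.findall returns strings, so integer keys would raise a KeyError). The function name replace_digits and the final whitespace cleanup are additions for illustration.

```python
import re

# Assumed: string keys, unlike the int-keyed dictionary in the question.
conversion_dict = {
    '0': 'Zero', '1': 'One', '2': 'Two', '3': 'Three', '4': 'Four',
    '5': 'Five', '6': 'Six', '7': 'Seven', '8': 'Eight', '9': 'Nine',
}


def replace_digits(my_string):
    # set() avoids repeating the same replacement for duplicate digits.
    for digit in set(re.findall(r'[0-9]', my_string)):
        my_string = my_string.replace(digit, f' {conversion_dict[digit]} ')
    # The padding leaves runs of spaces behind; collapse them and trim the ends.
    return re.sub(r' +', ' ', my_string).strip()


print(replace_digits('CUR FR 44F8 Availability: 12,267.24 Debited'))
```

Because each word is padded on both sides, letters and punctuation next to a digit end up separated from it, so 44F8 becomes Four Four F Eight.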

Answer 2

You can iterate over the string and replace each digit with its corresponding word, like I did here:

conversion_dict = {'1':'One' , '2' : 'Two' , '3' : 'Three' , '4' : 'Four' , '5': 'Five' , '6':'Six' , '7' : 'Seven' , '8':'Eight' , '9':'Nine' , '0': 'Zero'}


def parser(string: str):
    def inner():
        # Wrap each digit's word in spaces; pass every other character through unchanged.
        for char in string:
            yield " " + conversion_dict[char] + " " if char in conversion_dict else char

    result = "".join(inner()).lstrip()
    # Drop any space that directly follows another space.
    return "".join(s if not (result[i] == result[i - 1] == " ") else "" for i, s in enumerate(result))


string_for_example = """
0589 BTC: 581 OUTFLOW BANK REF: CUST REF: 0004
CUR FR 44F8 Availability: 12,267.24 Debited
"""

print(parser(string_for_example))

The result is:

Zero Five Eight Nine BTC: Five Eight One OUTFLOW BANK REF: CUST REF: Zero Zero Zero Four 
CUR FR Four Four F Eight Availability: One Two , Two Six Seven . Two Four Debited

Answer 3

import pandas as pd

# your pandas series
s = pd.Series(['0589 BTC: 581 OUTFLOW BANK REF: CUST REF: 0004', 
               'CUR FR 44F8 Availability: 12,267.24 Debited'], name='Text')
# your conversion dict with strings not ints
conversion_dict = {'1':'One ' , '2' : 'Two ' , '3' : 'Three ' , '4' : 'Four ' ,
                   '5': 'Five ' , '6':'Six ' , '7' : 'Seven ' , '8':'Eight ' ,
                   '9':'Nine ' , '0': 'Zero '}   
# use replace with regex set to true and then replace duplicate spaces between words
s.replace(conversion_dict, regex=True).replace(' +', ' ', regex=True).str.rstrip()

['Zero Five Eight Nine BTC: Five Eight One OUTFLOW BANK REF: CUST REF: Zero Zero Zero Four'
 'CUR FR Four Four FEight Availability: One Two ,Two Six Seven .Two Four Debited']

Answer 4

To add my 2 cents, here is a possible solution:

import re
def num2digit(text):
    mapper = {
        '0': 'ZERO ',
        '1': 'ONE ',
        '2': 'TWO ',
        '3': 'THREE ',
        '4': 'FOUR ',
        '5': 'FIVE ',
        '6': 'SIX ',
        '7': 'SEVEN ',
        '8': 'EIGHT ',
        '9': 'NINE ',
    }
    for k, v in mapper.items():
        text = text.replace(k, v)
    return re.sub(' +', ' ', text).strip()

Then you can call it like this:

>>> num2digit('0589 BTC: 581 OUTFLOW BANK REF: CUST REF: 0004')
'ZERO FIVE EIGHT NINE BTC: FIVE EIGHT ONE OUTFLOW BANK REF: CUST REF: ZERO ZERO ZERO FOUR'

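To explain: it replaces each digit with its mapped name followed by a space, so that the words are separated as required, then collapses any runs of multiple spaces, and finally strips any leading or trailing whitespace.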
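
Since the names are padded only on the right, a mixed token such as 44F8 would come out of num2digit as FOUR FOUR FEIGHT. A possible variant, sketched here with the hypothetical names WORDS, TABLE and num2digit_padded, pads both sides and uses str.translate to do all ten replacements in one pass.

```python
import re

WORDS = ['ZERO', 'ONE', 'TWO', 'THREE', 'FOUR',
         'FIVE', 'SIX', 'SEVEN', 'EIGHT', 'NINE']

# Map each digit character to its word with a space on both sides.
TABLE = str.maketrans({str(i): f' {word} ' for i, word in enumerate(WORDS)})


def num2digit_padded(text):
    # Translate every digit, then collapse repeated spaces and trim the ends.
    return re.sub(' +', ' ', text.translate(TABLE)).strip()


print(num2digit_padded('CUR FR 44F8 Availability: 12,267.24 Debited'))
```

With this padding the second sample line matches the format asked for in the question, for example 12,267.24 becomes ONE TWO , TWO SIX SEVEN . TWO FOUR.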

python

text

nlp