Python: Get a number from a DataFrame string column that sits between two specific words

Question

I want to extract, as an integer, the total-cost number that appears between the words total and USD in a string column.

Example DataFrame:

    id   name    lastname   message 
0   1   John    Doe        John have 100 USD, so he buy 5 eggs which total cost 10 USD
1   2   Mar     Aye        Mar have 10 USD, he just buy a banana from another shop for 16 USD

So the final result should be:

    id   name    lastname   message                                                             total
0   1   John    Doe        John have 100 USD, so he buy 5 eggs which total cost 10 USD         10
1   2   Mar     Aye        Mar have 10 USD, he just buy a banana from another shop for 16 USD  0

Answer 1

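You can use a regular expression to capture any number that appears between "total" and "USD".

The code below captures that number (the first one if there are several) and converts it to int; rows with no match are filled with 0. Accepting floats would need a small adjustment to the pattern, but since the requested type is int that should not be necessary.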

df['total'] = df['message'].str.extract(r'total.*?(\d+).*?USD', expand=False).fillna(0).astype(int)

Result:

    id   name    lastname   message                                                             total
0   1   John    Doe        John have 100 USD, so he buy 5 eggs which total cost 10 USD         10
1   2   Mar     Aye        Mar have 10 USD, he just buy a banana from another shop for 16 USD  0
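
For anyone who wants to reproduce the result, here is a minimal end-to-end sketch. It rebuilds the sample frame from the question and applies the same pattern. Two things in it are assumptions rather than part of the answer above: missing matches are filled with the string "0" instead of the number 0 so the column holds only strings before the cast, and the `total_float` column with its decimal-aware pattern is a hypothetical variant showing the kind of adjustment floats would need.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2],
    "name": ["John", "Mar"],
    "lastname": ["Doe", "Aye"],
    "message": [
        "John have 100 USD, so he buy 5 eggs which total cost 10 USD",
        "Mar have 10 USD, he just buy a banana from another shop for 16 USD",
    ],
})

# expand=False returns a Series because the pattern has one capture group.
# Rows without "total ... <digits> ... USD" come back as missing values.
matched = df["message"].str.extract(r"total.*?(\d+).*?USD", expand=False)

# Assumption: fill with the string "0" so every value is a string before the cast.
df["total"] = matched.fillna("0").astype(int)

# Hypothetical variant (not in the original answer): also accept decimals
# such as "10.50" by allowing an optional fractional part.
df["total_float"] = (
    df["message"]
    .str.extract(r"total.*?(\d+(?:\.\d+)?).*?USD", expand=False)
    .astype(float)
    .fillna(0.0)
)

print(df[["id", "total", "total_float"]])
```

The second row gets 0 because its message never contains the word "total", so the pattern has nothing to anchor on even though the message does contain numbers followed by USD.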

python

substring

tokenize

nltk

pandas