Python: Extract the number between two specific words from a DataFrame string column
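I want the total cost, i.e. the number in the string that appears between "total" and "USD", as an integer (0 when there is no such number).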
Example DataFrame:
   id  name  lastname  message
0   1  John  Doe       John have 100 USD, so he buy 5 eggs which total cost 10 USD
1   2  Mar   Aye       Mar have 10 USD, he just buy a banana from another shop for 16 USD
So the final result should be:
   id  name  lastname  message                                                              total
0   1  John  Doe       John have 100 USD, so he buy 5 eggs which total cost 10 USD            10
1   2  Mar   Aye       Mar have 10 USD, he just buy a banana from another shop for 16 USD      0
You can use a regular expression to capture the number that appears between "total" and "USD".
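The code below captures that number (the first one if there are several; accepting floats would need some adjustment, but since the column should be int that shouldn't be necessary) and converts it to int. Rows with no match get NaN, which fillna(0) replaces with 0.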
df['total'] = df['message'].str.extract(r'total.*?(\d+).*?USD', expand=False).fillna(0).astype(int)
Result:
   id  name  lastname  message                                                              total
0   1  John  Doe       John have 100 USD, so he buy 5 eggs which total cost 10 USD            10
1   2  Mar   Aye       Mar have 10 USD, he just buy a banana from another shop for 16 USD      0
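For anyone who wants to run the answer end to end, here is a minimal, self-contained sketch, assuming pandas is installed. It rebuilds the sample frame from the question and applies the same pattern. It uses a raw string so \d is not treated as a string escape, and expand=False so str.extract returns a Series. The second part shows one possible adjustment for decimal amounts. The pattern (\d+(?:\.\d+)?) and the strings in the hypothetical "prices" Series are illustrative assumptions and do not come from the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [1, 2],
    "name": ["John", "Mar"],
    "lastname": ["Doe", "Aye"],
    "message": [
        "John have 100 USD, so he buy 5 eggs which total cost 10 USD",
        "Mar have 10 USD, he just buy a banana from another shop for 16 USD",
    ],
})

# The lazy ".*?" stops at the first run of digits after "total", so the
# "100 USD" earlier in row 0 is ignored because it comes before "total".
# expand=False returns a Series; rows without a match become NaN.
df["total"] = (
    df["message"]
    .str.extract(r"total.*?(\d+).*?USD", expand=False)
    .fillna(0)      # no match -> 0
    .astype(int)    # "10" -> 10
)
print(df[["id", "total"]])

# Float-tolerant variant (assumption: "." is the decimal separator).
# "prices" is a hypothetical example Series, not data from the question.
# (?:...) is a non-capturing group, so there is still only one capture group.
prices = pd.Series(["total cost 10.50 USD", "no total here"])
float_totals = (
    prices.str.extract(r"total.*?(\d+(?:\.\d+)?).*?USD", expand=False)
    .astype(float)  # "10.50" -> 10.5, NaN stays NaN
    .fillna(0.0)
    .tolist()
)
print(float_totals)
```

The lazy quantifier before the group matters. With a greedy "total.*(\d+).*?USD", the first .* backtracks only as far as it must, so the group would capture just the last digit before " USD" ("0" instead of "10").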