Capturing only specific sections/patterns of a string with Regex

Question

I have the following strings, which always follow a standard format:

'On 10/31/2018, Sally Brown picked 25 apples at the orchard.'
'On 11/01/2018, John Smith picked 12 peaches at the orchard.'
'On 09/15/2018, Jim Roe picked 10 pears at the orchard.'

I want to extract certain data fields into a series of lists:

['10/31/2018','Sally Brown','25','apples']
['11/01/2018','John Smith','12','peaches']
['09/15/2018','Jim Roe','10','pears']

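As you can see, I need to recognize parts of the sentence structure without capturing them, so the program has context for where the data sits. The regex I thought would work is: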

(?<=On\s)\d{2}\/\d{2}\/\d{4},\s(?=[A-Z][a-z]+\s[A-Z][a-z]+)\s.+?(?=\d+)\s(?=[a-z]+)\sat\sthe\sorchard\.

Of course, it doesn't work.
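One likely reason the pattern above fails: lookaheads such as (?=[A-Z][a-z]+\s[A-Z][a-z]+) are zero-width. They check the text ahead but consume nothing, so the \s right after that lookahead has to match the first letter of the name, which it cannot. The pattern also has no capturing groups, so even a successful match would return one undivided string. A minimal sketch of the intended idea, assuming every sentence follows exactly the format shown (two-word capitalized names, lowercase fruit): keep the fixed wording ("On", "picked", "at the orchard.") as plain literals and put only the fields inside parentheses.

```python
import re

# Fixed wording is matched literally but left outside the groups,
# so only the four fields are captured. Assumes the exact format above.
PATTERN = re.compile(
    r"On (\d{2}/\d{2}/\d{4}), "      # date
    r"([A-Z][a-z]+ [A-Z][a-z]+) "   # first and last name
    r"picked (\d+) "                 # count
    r"([a-z]+) at the orchard\."     # fruit
)

sentences = [
    "On 10/31/2018, Sally Brown picked 25 apples at the orchard.",
    "On 11/01/2018, John Smith picked 12 peaches at the orchard.",
    "On 09/15/2018, Jim Roe picked 10 pears at the orchard.",
]

rows = []
for s in sentences:
    m = PATTERN.fullmatch(s)
    if m:  # skip any line that does not follow the format
        rows.append(list(m.groups()))

for row in rows:
    print(row)  # first line: ['10/31/2018', 'Sally Brown', '25', 'apples']
```

Using fullmatch rather than search means a sentence that deviates from the format anywhere is skipped instead of being partially matched.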

This may be an easy question for some, but I couldn't find an answer. Thanks in advance; once I'm more technically skilled, I'll pay it forward here.

Answer 1

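Use \w+ to match a word, i.e. one or more characters from [a-zA-Z0-9_]. The fixed words stay outside the parentheses, so they are matched but not captured: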

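import re

# Sample input in the fixed format from the question
text = '''On 10/31/2018, Sally Brown picked 25 apples at the orchard.
On 11/01/2018, John Smith picked 12 peaches at the orchard.
On 09/15/2018, Jim Roe picked 10 pears at the orchard.'''

# Groups: date, two-word name, count, fruit. "On", the comma and the verb
# are matched but not captured. The raw string keeps \s, \w, \d intact.
arr = re.findall(r'On\s(.*?),\s(\w+\s\w+)\s\w+\s(\d+)\s(\w+)', text)
print(arr)  # Python 3: print is a function

# [('10/31/2018', 'Sally Brown', '25', 'apples'),
#  ('11/01/2018', 'John Smith', '12', 'peaches'),
#  ('09/15/2018', 'Jim Roe', '10', 'pears')]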
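With more than one group, re.findall returns a list of tuples, while the question asks for lists. A minimal sketch of two follow-ups, reusing the same pattern: convert each tuple with list(), or add named groups and read fields by name via re.finditer and groupdict(). The group names date, name, count and fruit are illustrative choices, not part of the original answer.

```python
import re

text = """On 10/31/2018, Sally Brown picked 25 apples at the orchard.
On 11/01/2018, John Smith picked 12 peaches at the orchard.
On 09/15/2018, Jim Roe picked 10 pears at the orchard."""

# Same structure as the answer's pattern, with hypothetical group names
pattern = re.compile(
    r"On\s(?P<date>.*?),\s(?P<name>\w+\s\w+)\s\w+\s(?P<count>\d+)\s(?P<fruit>\w+)"
)

# findall still returns tuples; convert each one to match the requested output
as_lists = [list(t) for t in pattern.findall(text)]

# finditer yields match objects; groupdict() maps each group name to its text
as_dicts = [m.groupdict() for m in pattern.finditer(text)]

print(as_lists)
print(as_dicts[0])
```

The dict form is handy if the fields are later used by name (for example, converting count with int()), since it does not depend on group order.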


python

regex

python-3.7