Capturing only specific sections/patterns of a string with Regex
I have the following strings, which always follow a standard format:
'On 10/31/2018, Sally Brown picked 25 apples at the orchard.'
'On 11/01/2018, John Smith picked 12 peaches at the orchard.'
'On 09/15/2018, Jim Roe picked 10 pears at the orchard.'
I want to extract certain data fields into a series of lists:
['10/31/2018','Sally Brown','25','apples']
['11/01/2018','John Smith','12','peaches']
['09/15/2018','Jim Roe','10','pears']
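As you can see, I need some of the sentence structure to be recognized but not captured, so that the program has context for where the data is located. The regex I thought would work is: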
(?<=On\s)\d{2}\/\d{2}\/\d{4},\s(?=[A-Z][a-z]+\s[A-Z][a-z]+)\s.+?(?=\d+)\s(?=[a-z]+)\sat\sthe\sorchard\.
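But of course, it is incorrect.
This is probably a simple question for some, but I'm having trouble finding the answer. Thanks in advance, and I'll pay it forward here once I'm more skilled.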
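Use \w+ to match any word. \w matches a single word character, which for ASCII text is the same as [a-zA-Z0-9_]. Put parentheses only around the parts you want returned; the text outside them is still matched but not captured.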
import re

# Sample input, one sentence per line ("text" avoids shadowing the built-in str).
text = '''On 10/31/2018, Sally Brown picked 25 apples at the orchard.
On 11/01/2018, John Smith picked 12 peaches at the orchard.
On 09/15/2018, Jim Roe picked 10 pears at the orchard.'''

# Raw string so the backslashes reach the regex engine unchanged.
# Groups: date, first and last name, quantity, fruit.
arr = re.findall(r'On\s(.*?),\s(\w+\s\w+)\s\w+\s(\d+)\s(\w+)', text)
print(arr)
# [('10/31/2018', 'Sally Brown', '25', 'apples'),
# ('11/01/2018', 'John Smith', '12', 'peaches'),
# ('09/15/2018', 'Jim Roe', '10', 'pears')]
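If you want lists rather than tuples, and a pattern closer to the strict one in the question, something like the following might work. It is only a sketch: the names PATTERN and extract_fields are made up for illustration, and it assumes each sentence arrives as its own string, names are exactly two capitalized words, and the fruit is lowercase. The last few lines show why the original attempt could not match: a lookahead is zero-width, so (?=[A-Z]...) followed by \s asks the same position to be both a capital letter and whitespace.

```python
import re

lines = [
    "On 10/31/2018, Sally Brown picked 25 apples at the orchard.",
    "On 11/01/2018, John Smith picked 12 peaches at the orchard.",
    "On 09/15/2018, Jim Roe picked 10 pears at the orchard.",
]

# Literal text outside the parentheses is matched but not captured;
# only the four parenthesized groups are returned.
PATTERN = re.compile(
    r"On\s(\d{2}/\d{2}/\d{4}),\s"      # date
    r"([A-Z][a-z]+\s[A-Z][a-z]+)\s"    # first and last name
    r"picked\s(\d+)\s"                 # quantity
    r"([a-z]+)\sat\sthe\sorchard\."    # fruit
)


def extract_fields(line):
    """Return [date, name, count, fruit], or None if the line does not fit."""
    match = PATTERN.fullmatch(line)
    return list(match.groups()) if match else None


rows = [extract_fields(line) for line in lines]
for row in rows:
    print(row)

# A lookahead consumes nothing, so the next token is tested at the same
# position: no character is both an uppercase letter and whitespace.
failed = re.search(r"(?=[A-Z][a-z]+)\s", "On 10/31/2018, Sally Brown")
print(failed)
```

Using fullmatch on each sentence means a malformed line yields None instead of a partial result, which may or may not be what you want if the format ever varies.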