How do I dynamically capture two dates from one line of text with a regex?
I have a text that changes every week:
text = "Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015"
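I am looking for regex patterns for Year 1 and Year 2.
(Both change weekly, so I need the formula to capture any month, day, and year.)
My output should look like this: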
2015 = November 5, 2015
2016 = November 3, 2016
The framework I am using does not allow regex capture groups or splits, so I need a formula specific to this kind of string.
Thanks!
You can try this:
text = "Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015"
import re
final_data = sorted(["{} = {}".format(re.findall(r"\d+$", i)[0], i) for i in re.findall(r"[a-zA-Z]+\s\d+,\s\d+", text)], key=lambda x: int(re.findall(r"^\d+", x)[0]))
Output:
['2015 = November 5, 2015', '2016 = November 3, 2016']
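For readability, the one-liner above can be unpacked into a few steps. This is only a sketch: the names `DATE_PATTERN`, `YEAR_PATTERN`, and `dates_by_year` are made up for illustration, and it assumes every year has four digits and that each year appears only once in the line. Neither pattern uses a capture group.

```python
import re

text = "Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015"

# A full date looks like "<Month> <day>, <year>"; no capture groups are used.
DATE_PATTERN = re.compile(r"[A-Za-z]+\s\d+,\s\d{4}")
# The year is the run of four digits at the end of a matched date.
YEAR_PATTERN = re.compile(r"\d{4}$")

def dates_by_year(line):
    """Map each four-digit year to the full date string that ends with it."""
    found = {}
    for date in DATE_PATTERN.findall(line):
        year = YEAR_PATTERN.search(date).group(0)
        found[year] = date
    return found

# Sort by year so the earlier year is printed first.
for year, date in sorted(dates_by_year(text).items()):
    print("{} = {}".format(year, date))
```

Only "November 3, 2016" and "November 5, 2015" match, because the start of each range ("October 28", "October 30") is not followed by a comma.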
Using @ctwheels' regex:
text = "Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015"
import re
result = [(date.split(",")[1].strip(), date) for date in re.findall(r'\w+\s+\d+,\s*\d+', text)]
print(result)
# [('2016', 'November 3, 2016'), ('2015', 'November 5, 2015')]
Code
As per my original comment:
(\w+\s+\d+,\s*(\d+))
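Note: the regex above and the one on regex101 do not match. This is intentional. Regex101 can only demonstrate output for substitutions, so I prepended .*? to the regex there to display the expected output correctly.
Results
Input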
Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015
Output
2016 = November 3, 2016
2015 = November 5, 2015
Usage
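import re

regex = r"(\w+\s+\d+,\s*(\d+))"
text = "Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015"
# With two groups, findall returns (date, year) tuples: the outer group is the full date, the inner group the year.
for date, year in re.findall(regex, text):
    print(year + ' = ' + date)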
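Since the question says capture groups and splits are not available, here is a sketch that relies only on the whole match, with one pattern per date. It assumes the framework supports lookahead assertions (not confirmed in the question), that years have four digits, and that the line always ends with the second date. The names `DATE`, `FIRST_DATE`, `LAST_DATE`, and `whole_match` are made up for illustration; Python is used here only to check the patterns.

```python
import re

text = "Weekly Comparison, Week 50 October 28 - November 3, 2016 October 30 - November 5, 2015"

# "<Month> <day>, <year>" with a four-digit year; no capture groups.
DATE = r"[A-Za-z]+\s+\d+,\s*\d{4}"
# First date: followed by whitespace and another word (the next range).
FIRST_DATE = DATE + r"(?=\s+[A-Za-z])"
# Last date: followed only by optional whitespace up to the end of the line.
LAST_DATE = DATE + r"(?=\s*$)"

def whole_match(pattern, line):
    """Return the entire matched text (group 0), or None if nothing matches."""
    match = re.search(pattern, line)
    return match.group(0) if match else None

first_date = whole_match(FIRST_DATE, text)
last_date = whole_match(LAST_DATE, text)
print(first_date)  # November 3, 2016
print(last_date)   # November 5, 2015
```

The lookaheads are zero-width, so the text they check is not part of the match. If the year alone is needed as a label, the same idea works with `\d{4}(?=\s+[A-Za-z])` and `\d{4}(?=\s*$)`.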