How to extract several timestamp pairs from a list in Python

Question

I have extracted all timestamps from a transcript file. The output looks like this:

('[, 00:00:03,950, 00:00:06,840, 00:00:06,840, 00:00:09,180, 00:00:09,180, '
 '00:00:10,830, 00:00:10,830, 00:00:14,070, 00:00:14,070, 00:00:16,890, '
 '00:00:16,890, 00:00:19,080, 00:00:19,080, 00:00:21,590, 00:00:21,590, '
 '00:00:24,030, 00:00:24,030, 00:00:26,910, 00:00:26,910, 00:00:29,640, '
 '00:00:29,640, 00:00:31,920, 00:00:31,920, 00:00:35,850, 00:00:35,850, '
 '00:00:38,629, 00:00:38,629, 00:00:40,859, 00:00:40,859, 00:00:43,170, '
 '00:00:43,170, 00:00:45,570, 00:00:45,570, 00:00:48,859, 00:00:48,859, '
 '00:00:52,019, 00:00:52,019, 00:00:54,449, 00:00:54,449, 00:00:57,210, '
 '00:00:57,210, 00:00:59,519, 00:00:59,519, 00:01:02,690, 00:01:02,690, '
 '00:01:05,820, 00:01:05,820, 00:01:08,549, 00:01:08,549, 00:01:10,490, '
 '00:01:10,490, 00:01:13,409, 00:01:13,409, 00:01:16,409, 00:01:16,409, '
 '00:01:18,149, 00:01:18,149, 00:01:20,340, 00:01:20,340, 00:01:22,649, '
 '00:01:22,649, 00:01:26,159, 00:01:26,159, 00:01:28,740, 00:01:28,740, '
 '00:01:30,810, 00:01:30,810, 00:01:33,719, 00:01:33,719, 00:01:36,990, '
 '00:01:36,990, 00:01:39,119, 00:01:39,119, 00:01:41,759, 00:01:41,759, '
 '00:01:43,799, 00:01:43,799, 00:01:46,619, 00:01:46,619, 00:01:49,140, '
 '00:01:49,140, 00:01:51,240, 00:01:51,240, 00:01:53,759, 00:01:53,759, '
 '00:01:56,460, 00:01:56,460, 00:01:58,740, 00:01:58,740, 00:02:01,640, '
 '00:02:01,640, 00:02:04,409, 00:02:04,409, 00:02:07,229, 00:02:07,229, '
 '00:02:09,380, 00:02:09,380, 00:02:12,060, 00:02:12,060, 00:02:14,840, ]')

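In this output the timestamps always come in pairs, i.e. two consecutive timestamps always belong together, for example: 00:00:03,950 and 00:00:06,840, 00:00:06,840 and 00:00:09,180, and so on.

Now I want to extract all of these timestamp pairs separately, so that the output looks like this: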

00:00:03,950 - 00:00:06,840

00:00:06,840 - 00:00:09,180

00:00:09,180 - 00:00:10,830

and so on.

Currently, I have the following (very inconvenient) solution to my problem:

from textwrap import dedent

# get the first timestamp of the first pair (the slice includes a leading space)
a = res_timestamps[2:15]
print(dedent(a))

# get the second timestamp of the first pair
b = res_timestamps[17:29]
print(b)

# combine the two timestamps
c = a + ' - ' + b
print(dedent(c))

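Of course, this is very bad, because I cannot work out the indices by hand for every transcript. Trying to use a loop has not worked so far, because each item is a single character rather than a timestamp.

Is there an elegant solution to my problem?

I appreciate any help or tips.

Thank you very much in advance!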

Answer 1

Regex to the rescue!

A solution that works perfectly for your example data:

import re
from pprint import pprint

# your_data is the timestamp string from the question
timestamps = re.findall(r"(\d{2}:\d{2}:\d{2},\d{3}), (\d{2}:\d{2}:\d{2},\d{3})", your_data)
pprint(timestamps)

This prints:

[('00:00:03,950', '00:00:06,840'),
 ('00:00:06,840', '00:00:09,180'),
 ('00:00:09,180', '00:00:10,830'),
 ('00:00:10,830', '00:00:14,070'),
 ('00:00:14,070', '00:00:16,890'),
 ('00:00:16,890', '00:00:19,080'),
 ('00:00:19,080', '00:00:21,590'),
 ('00:00:21,590', '00:00:24,030'),
 ('00:00:24,030', '00:00:26,910'),
 ('00:00:26,910', '00:00:29,640'),
 ('00:00:29,640', '00:00:31,920'),
 ('00:00:31,920', '00:00:35,850'),
 ('00:00:35,850', '00:00:38,629'),
 ('00:00:38,629', '00:00:40,859'),
 ('00:00:40,859', '00:00:43,170'),
 ('00:00:43,170', '00:00:45,570'),
 ('00:00:45,570', '00:00:48,859'),
 ('00:00:48,859', '00:00:52,019'),
 ('00:00:52,019', '00:00:54,449'),
 ('00:00:54,449', '00:00:57,210'),
 ('00:00:57,210', '00:00:59,519'),
 ('00:00:59,519', '00:01:02,690'),
 ('00:01:02,690', '00:01:05,820'),
 ('00:01:05,820', '00:01:08,549'),
 ('00:01:08,549', '00:01:10,490'),
 ('00:01:10,490', '00:01:13,409'),
 ('00:01:13,409', '00:01:16,409'),
 ('00:01:16,409', '00:01:18,149'),
 ('00:01:18,149', '00:01:20,340'),
 ('00:01:20,340', '00:01:22,649'),
 ('00:01:22,649', '00:01:26,159'),
 ('00:01:26,159', '00:01:28,740'),
 ('00:01:28,740', '00:01:30,810'),
 ('00:01:30,810', '00:01:33,719'),
 ('00:01:33,719', '00:01:36,990'),
 ('00:01:36,990', '00:01:39,119'),
 ('00:01:39,119', '00:01:41,759'),
 ('00:01:41,759', '00:01:43,799'),
 ('00:01:43,799', '00:01:46,619'),
 ('00:01:46,619', '00:01:49,140'),
 ('00:01:49,140', '00:01:51,240'),
 ('00:01:51,240', '00:01:53,759'),
 ('00:01:53,759', '00:01:56,460'),
 ('00:01:56,460', '00:01:58,740'),
 ('00:01:58,740', '00:02:01,640'),
 ('00:02:01,640', '00:02:04,409'),
 ('00:02:04,409', '00:02:07,229'),
 ('00:02:07,229', '00:02:09,380'),
 ('00:02:09,380', '00:02:12,060'),
 ('00:02:12,060', '00:02:14,840')]

You can then print the pairs in the format you want like this (timestamps is the list returned by re.findall):

for start, end in timestamps:
    print(f"{start} - {end}")
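
A minimal self-contained sketch of the regex approach, assuming the input is a single string laid out like the one in the question. The `sample` string is a shortened stand-in, and the names `TIMESTAMP`, `pair_pattern` and `format_pairs` are made up for illustration. Note that `re.findall` returns non-overlapping matches, so this only pairs the timestamps correctly because every end time is repeated as the next start time in the data.

```python
import re

# Shortened stand-in for the question's string (assumption: same layout).
sample = (
    "[, 00:00:03,950, 00:00:06,840, 00:00:06,840, 00:00:09,180, "
    "00:00:09,180, 00:00:10,830, ]"
)

# One timestamp: HH:MM:SS,mmm
TIMESTAMP = r"\d{2}:\d{2}:\d{2},\d{3}"

# Two timestamps separated by ", ", each captured in its own group.
pair_pattern = re.compile(rf"({TIMESTAMP}), ({TIMESTAMP})")


def format_pairs(text):
    """Return a 'start - end' string for each timestamp pair found in text."""
    return [f"{start} - {end}" for start, end in pair_pattern.findall(text)]


lines = format_pairs(sample)
for line in lines:
    print(line)
```

Compiling the pattern once and building it from a single `TIMESTAMP` fragment keeps the two halves of the pair in sync if the timestamp format ever changes.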

Answer 2

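Here is a solution without regular expressions:
Clean the string and split it on ', ' to create a list.
Use list slicing to select the even- and odd-indexed items and zip them together.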

# data is your timestamp string

# convert data into a list by removing end brackets and spaces, and splitting
data = data.replace('[, ', '').replace(', ]', '').split(', ')

# use list slicing and zip the two components
combinations = list(zip(data[::2], data[1::2]))

# print the first 5
print(combinations[:5])
[out]:

[('00:00:03,950', '00:00:06,840'),
 ('00:00:06,840', '00:00:09,180'),
 ('00:00:09,180', '00:00:10,830'),
 ('00:00:10,830', '00:00:14,070'),
 ('00:00:14,070', '00:00:16,890')]
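
The split-and-zip idea can also be sketched end to end, including the "start - end" output the question asks for. This is only an illustration under the assumption that the string has the same layout as in the question; `data` here is a shortened stand-in, and `stamps` and `pairs` are names chosen for this example. It also shows why looping over the raw string yields single characters: the string has to be split into a list of timestamps first.

```python
# Shortened stand-in for the question's string (assumption: same layout).
data = "[, 00:00:03,950, 00:00:06,840, 00:00:06,840, 00:00:09,180, ]"

# Iterating over the raw string gives characters, not timestamps.
first_items = list(data)[:3]  # ['[', ',', ' ']

# Strip the surrounding brackets, split on ", " (the comma inside a
# timestamp is not followed by a space), and drop the empty leftovers.
stamps = [s for s in data.strip("[]").split(", ") if s]

# Even positions are start times, odd positions are end times.
# zip stops at the shorter input, so an unpaired trailing item is dropped.
pairs = list(zip(stamps[::2], stamps[1::2]))

for start, end in pairs:
    print(f"{start} - {end}")
```

Filtering out empty strings instead of replacing the exact '[, ' and ', ]' markers makes the cleanup a little more tolerant of small differences at the ends of the string.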


indexing

extract

python-3.x