Grouping Data by Search Algorithms

I have a sample dataset in Python in which each entry has 3 values:

[ date as a string, 24-hour time as an integer (first two digits = hour, last two digits = minutes), duration as an integer (always 15 minutes) ]

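My goal is to group entries that share the same date and have adjacent 24-hour times. Adjacent times are exactly 15 minutes apart. Merging a run of adjacent entries should produce a single entry whose duration is the sum of the merged 15-minute intervals, however many there are. The list final_dataset below shows what the result should look like.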
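One detail worth spelling out: because the times are stored as HHMM integers, subtracting them directly only gives 15 when both values fall within the same hour. A minimal sketch of a safer adjacency check is shown here; the helper names to_minutes and is_adjacent are made up for illustration and are not part of the original code.

```python
def to_minutes(hhmm: int) -> int:
    """Convert an HHMM integer such as 1045 to minutes since midnight."""
    hours, minutes = divmod(hhmm, 100)
    return hours * 60 + minutes


def is_adjacent(prev_hhmm: int, cur_hhmm: int, step: int = 15) -> bool:
    """Return True when cur starts exactly `step` minutes after prev."""
    return to_minutes(cur_hhmm) - to_minutes(prev_hhmm) == step


print(1100 - 1045)              # 55: raw subtraction misses this adjacency
print(is_adjacent(1045, 1100))  # True
print(is_adjacent(1045, 1500))  # False
```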

I tested some code that scans initial_dataset linearly. Here is the rough pseudocode:

# -- Start at first data piece (call this previous)
    # -- Check next data piece (call this current)
    # -- Subtract 24hr time values for current and previous
    # -- If difference is 15, append to a separate list the combined data piece
         # Check next data piece (call this next)
         # Subtract 24hr time values for next and current
         # Repeat
                  # Check next data piece (call this next next)
                  # Repeat this linear iteration until the difference > 15
                  # Store last position of no adjacency
# -- Continue at the last position of no adjacency and repeat this entire process until end of initial_dataset is reached
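For reference, the scan described in the pseudocode could be written roughly as follows. This is only a sketch, not the code that was originally tested: the names to_minutes and merge_adjacent are hypothetical, and it assumes the rows are already sorted by date and then by time. It visits each row once, so it runs in linear time.

```python
def to_minutes(hhmm):
    """Convert an HHMM integer (e.g. 1045) to minutes since midnight."""
    hours, minutes = divmod(hhmm, 100)
    return hours * 60 + minutes


def merge_adjacent(rows):
    """Merge consecutive rows that share a date and touch in time.

    Assumption: rows are sorted by date, then by start time.
    """
    merged = []
    for date, start, duration in rows:
        if merged:
            last_date, last_start, last_duration = merged[-1]
            last_end = to_minutes(last_start) + last_duration
            # Extend the current run when this row starts where it ends.
            if date == last_date and to_minutes(start) == last_end:
                merged[-1][2] += duration
                continue
        # Otherwise start a new run (copy, so the input is not mutated).
        merged.append([date, start, duration])
    return merged


initial_dataset = [['July 26, 2021', 1000, 15],
                   ['July 26, 2021', 1015, 15],
                   ['July 26, 2021', 1030, 15],
                   ['July 26, 2021', 1045, 15],
                   ['July 26, 2021', 1500, 15],
                   ['July 27, 2021', 1400, 15]]

print(merge_adjacent(initial_dataset))
# [['July 26, 2021', 1000, 60], ['July 26, 2021', 1500, 15], ['July 27, 2021', 1400, 15]]
```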

Is there a more efficient way to achieve this, using a data structure or a search algorithm?

# -- Example Dataset
initial_dataset = [ ['July 26, 2021',  1000,  15],
                    ['July 26, 2021',  1015,  15],
                    ['July 26, 2021',  1030,  15],
                    ['July 26, 2021',  1045,  15],
                    ['July 26, 2021',  1500,  15],
                    ['July 27, 2021',  1400,  15], ]

final_dataset = [ ['July 26, 2021', 1000, 60],
                  ['July 26, 2021', 1500, 15],
                  ['July 27, 2021', 1400, 15] ]

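With collections.defaultdict, the grouping needs only a single pass over your data. Note that this version buckets entries by date and hour, so it reproduces the expected output for this sample but does not merge runs that cross an hour boundary (for example 1045 and 1100), and it would merge non-adjacent entries that fall within the same hour: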

import collections
data = [['July 26, 2021', 1000, 15], ['July 26, 2021', 1015, 15], ['July 26, 2021', 1030, 15], ['July 26, 2021', 1045, 15], ['July 26, 2021', 1500, 15], ['July 27, 2021', 1400, 15]]
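# Map each date to {hour: total duration} in one pass over the data.
# Entries are bucketed by hour (b // 100), not by true 15-minute adjacency.
d = collections.defaultdict(dict)
for a, b, c in data:
    if (v := b // 100) in d[a]:  # the walrus operator needs Python 3.8+
        d[a][v] += c
    else:
        d[a][v] = c

# Flatten back into [date, hour as HHMM, total duration] rows.
result = [[a, j * 100, k] for a, b in d.items() for j, k in b.items()]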

Output:

[['July 26, 2021', 1000, 60], ['July 26, 2021', 1500, 15], ['July 27, 2021', 1400, 15]]
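Since the snippet above keys on the hour, a run such as 1045 followed by 1100 ends up in two separate groups. If true 15-minute adjacency is needed, one possible variation is to keep the defaultdict but collect start times per date, sort them, and split wherever the gap is not 15 minutes. The sketch below is an illustration and not part of the original answer: group_runs and to_hhmm are hypothetical names, and it assumes every entry lasts exactly `step` minutes with no duplicate start times on a date. The sort makes it O(n log n) instead of a single pass, but the input no longer has to be ordered.

```python
import collections


def to_hhmm(total_minutes):
    """Convert minutes since midnight back to an HHMM integer."""
    return (total_minutes // 60) * 100 + total_minutes % 60


def group_runs(rows, step=15):
    """Group rows into runs of start times spaced exactly `step` minutes apart."""
    by_date = collections.defaultdict(list)
    for date, hhmm, _duration in rows:
        hours, minutes = divmod(hhmm, 100)
        by_date[date].append(hours * 60 + minutes)

    result = []
    for date, starts in by_date.items():  # dicts preserve insertion order
        starts.sort()
        run_start = prev = starts[0]
        for cur in starts[1:]:
            if cur - prev != step:
                # Gap found: close the current run and begin a new one.
                result.append([date, to_hhmm(run_start), prev - run_start + step])
                run_start = cur
            prev = cur
        result.append([date, to_hhmm(run_start), prev - run_start + step])
    return result


sample = [['July 26, 2021', 1100, 15],
          ['July 26, 2021', 1045, 15],
          ['July 26, 2021', 1030, 15],
          ['July 27, 2021', 900, 15]]

print(group_runs(sample))
# [['July 26, 2021', 1030, 45], ['July 27, 2021', 900, 15]]
```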