Grouping Data by Search Algorithms

Question

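I have a sample dataset in Python in which each entry has 3 values:

[ date as a string, 24-hour time as an integer (first two digits = hour, last two digits = minutes), duration as an integer (always 15 minutes) ]

My goal is to group entries that share the same date and have adjacent 24-hour times. Adjacent times are 15 minutes apart. Grouping entries with adjacent times increases the duration accordingly, however many 15-minute intervals are merged. I have included the list final_dataset below to show what the final dataset should look like.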
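Because the times are stored as HHMM integers, subtracting them directly only gives the number of minutes when both values fall in the same hour. The sketch below illustrates one way to check adjacency by converting to minutes first; the helper names hhmm_to_minutes and is_adjacent are hypothetical and not part of the original code.

```python
def hhmm_to_minutes(hhmm):
    """Convert an HHMM integer such as 1045 to minutes since midnight."""
    hours, minutes = divmod(hhmm, 100)
    return hours * 60 + minutes


def is_adjacent(prev_hhmm, curr_hhmm, slot=15):
    """Return True when curr starts exactly one slot (assumed 15 min) after prev."""
    return hhmm_to_minutes(curr_hhmm) - hhmm_to_minutes(prev_hhmm) == slot


# Raw subtraction across an hour boundary gives 55, not 15
print(1100 - 1045)
# After conversion the two slots are recognised as adjacent
print(is_adjacent(1045, 1100))
```

Within a single hour, as in the example data, raw subtraction and the converted difference agree.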

I tested some code that searches initial_dataset linearly. Here is rough pseudocode:

# -- Start at first data piece (call this previous)
    # -- Check next data piece (call this current)
    # -- Subtract 24hr time values for current and previous
    # -- If difference is 15, append to a separate list the combined data piece
         # Check next data piece (call this next)
         # Subtract 24hr time values for next and current
         # Repeat
                  # Check next data piece (call this next next)
                  # Repeat this linear iteration until the difference > 15
                  # Store last position of no adjacency
# -- Continue at the last position of no adjacency and repeat this entire process until end of initial_dataset is reached

Is there a more efficient way to achieve this, through a data structure or a search algorithm?

# -- Example Dataset
initial_dataset = [ ['July 26, 2021',  1000,  15],
                    ['July 26, 2021',  1015,  15],
                    ['July 26, 2021',  1030,  15],
                    ['July 26, 2021',  1045,  15],
                    ['July 26, 2021',  1500,  15],
                    ['July 27, 2021',  1400,  15], ]

final_dataset = [ ['July 26, 2021', 1000, 60],
                  ['July 26, 2021', 1500, 15],
                  ['July 27, 2021', 1400, 15] ]

Answer 1

With collections.defaultdict you can group the data in a single pass, keyed by date and then by hour:

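import collections

data = [['July 26, 2021', 1000, 15], ['July 26, 2021', 1015, 15], ['July 26, 2021', 1030, 15], ['July 26, 2021', 1045, 15], ['July 26, 2021', 1500, 15], ['July 27, 2021', 1400, 15]]

# Map each date to {hour: total duration}
d = collections.defaultdict(dict)
for a, b, c in data:
    # b // 100 drops the minutes and keeps the hour (the := operator needs Python 3.8+)
    if (v := b // 100) in d[a]:
        d[a][v] += c
    else:
        d[a][v] = c

# Flatten back into [date, hour * 100, total duration] rows
result = [[a, j * 100, k] for a, b in d.items() for j, k in b.items()]
print(result)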

Output:

[['July 26, 2021', 1000, 60], ['July 26, 2021', 1500, 15], ['July 27, 2021', 1400, 15]]
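Note that keying on the hour gives the expected result for this sample, but it groups by hour rather than by adjacency: 1045 and 1100 would land in different buckets, and two non-adjacent slots within the same hour would be summed together. As a minimal sketch of an adjacency-based alternative, the function below makes a single pass and extends the previous group whenever the next slot starts exactly where it ends. The name merge_adjacent is hypothetical, and the sketch assumes the rows are already ordered by date and time, as they are in initial_dataset.

```python
def merge_adjacent(rows):
    """Merge consecutive rows with the same date whose times touch end-to-start.

    Assumes rows are [date, HHMM start, duration in minutes] and already sorted.
    """
    merged = []
    for date, start, duration in rows:
        # Convert the HHMM start time to minutes since midnight
        start_min = (start // 100) * 60 + start % 100
        if merged:
            prev_date, prev_start, prev_duration = merged[-1]
            prev_end = (prev_start // 100) * 60 + prev_start % 100 + prev_duration
            if prev_date == date and prev_end == start_min:
                # Current slot begins where the previous group ends: extend it
                merged[-1][2] += duration
                continue
        # Otherwise start a new group (copy so the input is not mutated)
        merged.append([date, start, duration])
    return merged


initial_dataset = [
    ['July 26, 2021', 1000, 15],
    ['July 26, 2021', 1015, 15],
    ['July 26, 2021', 1030, 15],
    ['July 26, 2021', 1045, 15],
    ['July 26, 2021', 1500, 15],
    ['July 27, 2021', 1400, 15],
]
print(merge_adjacent(initial_dataset))
```

For sorted input this stays a single linear pass, the same order of work as the scan described in the question, so the gain is mainly in simplicity rather than asymptotic speed.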

python

algorithm

search

python-3.x