为什么堆非常适合合并排序流?
Why is heap well suited for merging sorted streams?
我在读一些算法书时遇到了这一行,
Heaps are well suited for algorithms that merge sorted data streams.
对于为什么会这样,没有任何解释。有人可以帮我解释这是为什么吗?
如果您只有两个数据流,那么您真的不需要堆,因为以下算法可以做到:
Let s1 ans s2 be the two streams
while s1.hasData() and s2.hasData()
if s1.peek() < s2.peek(): datum = s1.pop()
else: datum = s2.pop()
s.push(datum)
if either is non-empty (only one is), add the rest of its content to s
正如 Henk Holterman 所指出的,如果您有 k>2
个流,那么您可以通过堆实现合并(本质上是堆 now-complicated 决定在当前步骤中使用哪个流):
let H be a (min or max, depending on your needs) heap
let s1, s2, ..., sk be sorted streams
// fill the heap with the first elements from the streams (e.g. min/max elements from each stream, depending on how they're sorted)
for i=1 to k:
H.add((i, si.pop())) // we need to know which stream the element came from
let s be the initially-empty data stream which will contain the merged content in sorted order
// H.empty() will indicate that all streams are empty
while not H.empty():
// take the min/max element of the min/max elements of each stream (*the* min/max element)
(i, datum) = H.extract()
// add it to s
s.push(datum)
// we know the datum came from s[i]; thus we need to push the next element from the i-th stream into heap as it may contain the next min/max element (that is, if s[i] isn't empty)
if not s[i].empty():
// we'll assume the heap sorts based on the second component of the pair
H.add((i, s[i].pop())
// s is the sorted stream containing elements from s1,s2,...,sk
这个运行次是O((|s1|+...+|sk|) * log k)
,其中|si|
表示流中的元素个数si
。
关键思想是在 while
循环的每次迭代中添加 s[1].peek()
、s[2].peek()
、...、[=19 的 smallest/largest =] 到 s
。您可以使用堆来实现此目的,堆会告诉您哪个流当前包含 smallest/largest 元素。并注意如何优雅地处理流为空的边缘情况。
我在读一些算法书时遇到了这一行,
Heaps are well suited for algorithms that merge sorted data streams.
对于为什么会这样,没有任何解释。有人可以帮我解释这是为什么吗?
如果您只有两个数据流,那么您真的不需要堆,因为以下算法可以做到:
Let s1 ans s2 be the two streams
while s1.hasData() and s2.hasData()
if s1.peek() < s2.peek(): datum = s1.pop()
else: datum = s2.pop()
s.push(datum)
if either is non-empty (only one is), add the rest of its content to s
正如 Henk Holterman 所指出的,如果您有 k>2
个流,那么您可以通过堆实现合并(本质上是堆 now-complicated 决定在当前步骤中使用哪个流):
let H be a (min or max, depending on your needs) heap
let s1, s2, ..., sk be sorted streams
// fill the heap with the first elements from the streams (e.g. min/max elements from each stream, depending on how they're sorted)
for i=1 to k:
H.add((i, si.pop())) // we need to know which stream the element came from
let s be the initially-empty data stream which will contain the merged content in sorted order
// H.empty() will indicate that all streams are empty
while not H.empty():
// take the min/max element of the min/max elements of each stream (*the* min/max element)
(i, datum) = H.extract()
// add it to s
s.push(datum)
// we know the datum came from s[i]; thus we need to push the next element from the i-th stream into heap as it may contain the next min/max element (that is, if s[i] isn't empty)
if not s[i].empty():
// we'll assume the heap sorts based on the second component of the pair
H.add((i, s[i].pop())
// s is the sorted stream containing elements from s1,s2,...,sk
这个运行次是O((|s1|+...+|sk|) * log k)
,其中|si|
表示流中的元素个数si
。
关键思想是在 while
循环的每次迭代中添加 s[1].peek()
、s[2].peek()
、...、[=19 的 smallest/largest =] 到 s
。您可以使用堆来实现此目的,堆会告诉您哪个流当前包含 smallest/largest 元素。并注意如何优雅地处理流为空的边缘情况。