为什么这些算法的执行时间不同？

Question

关于leetcode的问题'Top K Frequent Elements' https://leetcode.com/problems/top-k-frequent-elements/submissions/

有一个解决方案可以在 88 毫秒内完成任务，我的解决方案在 124 毫秒内完成任务，我认为这是一个很大的差异。

我试图理解为什么购买文档不提供我使用的函数的实现方式，即 most_common()，如果我想深入挖掘这样的细节，这样我就可以写出运行未来如此之快的算法我应该阅读什么（特定书籍？或任何其他资源？）。

我的代码（124 毫秒）

def topKFrequent(self, nums, k):
        if k ==len(nums):
            return nums
        c=Counter(nums)
        return [ t[0] for t in c.most_common(k) ]

其他（88 毫秒）（及时）

def topKFrequent(self, nums, k):
        if k == len(nums):
            return nums
        count = Counter(nums)   
        return heapq.nlargest(k, count.keys(), key=count.get)

两者占用的内存量几乎相同，所以这里没有区别。

Answer 1

most_common 也使用 heapq.nlargest，但它使用 count.items() 而不是 count.keys() 来调用它。这会使它稍微慢一点，并且还需要创建一个新列表的开销，以便从 most_common().

返回的列表中的每个元素中提取 [0] 值

heapq.nlargest 版本只是避免了这种额外的开销，并将 count.keys() 作为第二个参数传递，因此不需要再次迭代该结果以将片段提取到新列表中。

Answer 2

@trincot 似乎已经回答了这个问题，但如果有人正在寻找一种更快的方法来做到这一点，那么请使用 Numpy，前提是 nums 可以存储为 np.array:

def topKFrequent_numpy(nums, k):
    unique, counts = np.unique(nums, return_counts=True)
    return unique[np.argsort(-counts)[:k]]

一次速度测试

nums_array = np.random.randint(1000, size=1000000)
nums_list = list(nums_array)

%timeit topKFrequent_Counter(nums_list, 500)
# 116 ms ± 4.18 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit topKFrequent_heapq(nums_list, 500)
# 117 ms ± 3.35 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit topKFrequent_numpy(nums_array, 500)
# 39.2 ms ± 185 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

（其他输入值的速度可能会有很大不同）

为什么这些算法的执行时间不同？

why these algorithms differ in their execution time?

python

algorithm

data-structures