Dictionary Iteration Speeds

Question

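I'm currently learning Python and am confused about iteration speeds when looping through dictionaries. In one of the tutorials we had to loop through a dictionary and extract the 'key' items for a hypothetical supermarket. I asked a question about best practices for iterating over dictionaries and was told that sorting a dictionary for iteration purposes does not really matter until you start working with 'big' data sets, so I shouldn't worry about it at all.

I'm not sure why the tutor said it doesn't matter, as I believe speed is key when working with large data sets. I did some reading and found a very helpful post (Python: List vs Dict for look up table) about this.

From this, can I assume that whether a dictionary should be sorted depends on the task? Or would you say that dictionaries should always be sorted for optimal processing speed?

To put this in more context - let's use the following example: Say we are searching for the price of a bunch of cashews in a dictionary which has 10,000 entries. In this case, if the entries were placed in a random manner in the dictionary - would the speed in searching for that entry be 'faster' if it were sorted, rather than randomly placed anywhere?

Thanks very much!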

Answer 1

To put this in more context - let's use the following example: Say we are searching for the price of a bunch of cashews in a dictionary which has 10,000 entries. In this case, if the entries were placed in a random manner in the dictionary - would the speed in searching for that entry be 'faster' if it were sorted, rather than randomly placed anywhere?

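Well... dictionaries already have an ordering of sorts, since they are hash tables. The difference is that they are organized by the hash of the key rather than by the key itself. This means that once the hash has been calculated, there is basically nothing more that can be done to speed up access any further. Any gains are to be found in the hashing algorithm, not in the items or the structure itself.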
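To make the point about hashing concrete, here is a minimal sketch. The `CountingKey` class is a hypothetical helper written only for this illustration (it is not part of Python or of the original answer); it counts how often a key is hashed and compared during a single lookup in a 10,000-entry dictionary.

```python
class CountingKey:
    """Hypothetical key wrapper that counts hash and equality calls."""
    hash_calls = 0
    eq_calls = 0

    def __init__(self, name):
        self.name = name

    def __hash__(self):
        CountingKey.hash_calls += 1
        return hash(self.name)

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return isinstance(other, CountingKey) and self.name == other.name


# Build a dictionary with 10,000 made-up entries plus the one we care about.
prices = {CountingKey(f"item{i}"): i * 0.01 for i in range(10_000)}
prices[CountingKey("cashews")] = 4.99

# Reset the counters so only the lookup below is measured.
CountingKey.hash_calls = 0
CountingKey.eq_calls = 0

price = prices[CountingKey("cashews")]
print("price:", price)
print("hash calls during lookup:", CountingKey.hash_calls)
print("equality checks during lookup:", CountingKey.eq_calls)
```

The lookup hashes the key once and then needs only a handful of equality checks (typically one), no matter how many other entries the dictionary holds or where the entry was placed.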

Answer 2

To put this in more context - let's use the following example: Say we are searching for the price of a bunch of cashews in a dictionary which has 10,000 entries. In this case, if the entries were placed in a random manner in the dictionary - would the speed in searching for that entry be 'faster' if it were sorted, rather than randomly placed anywhere?

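It doesn't matter how the items are placed; what matters is how the items are retrieved - as that is essentially how you measure performance.

Dictionaries use a hash table to retrieve items by key. This means that the order in which the items are stored does not matter, because the retrieval speed/method/function does not depend on insertion order.

In other words, when you have a dictionary d and you do something like: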

print(d[some_key])

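retrieving the value for some_key does not depend on the order in which it was inserted into the dictionary. Whether it was the first, second, or last item inserted, it is retrieved with the same amount of work.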
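A rough way to check this yourself is sketched below. The item names, prices, and the fixed shuffle seed are all made up for the example; the timings it prints will vary by machine, so treat them as an indication rather than a benchmark.

```python
import random
import timeit

# 10,000 made-up (name, price) entries, including the cashews.
items = [(f"item{i:05d}", round(i * 0.01, 2)) for i in range(9_999)]
items.append(("cashews", 4.99))

# Same entries, inserted in sorted order and in a shuffled order.
sorted_items = sorted(items)
shuffled_items = items[:]
random.Random(0).shuffle(shuffled_items)  # fixed seed, for repeatability only

sorted_prices = dict(sorted_items)
shuffled_prices = dict(shuffled_items)

# Both dictionaries hold the same data and return the same value.
print(sorted_prices["cashews"] == shuffled_prices["cashews"])

# Time the same lookup against each dictionary.
t_sorted = timeit.timeit(lambda: sorted_prices["cashews"], number=100_000)
t_shuffled = timeit.timeit(lambda: shuffled_prices["cashews"], number=100_000)
print(f"sorted insertion:   {t_sorted:.4f} s")
print(f"shuffled insertion: {t_shuffled:.4f} s")
```

On a typical run the two timings should be close to each other, which is consistent with the lookup depending on the key's hash and not on where the entry sits in insertion order.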


python

iteration