关于搜索和排序算法的问题

questions about the searching and sorting algorithms

我正在研究标准库中的搜索和排序算法。我找不到关于这些问题的信息。我希望有人能帮助我。知道的也可以发链接给我。

如果数据未排序与排序后的数据相比，搜索行为是否会发生变化？
我怎么知道在向量上使用 std::sort() 比将向量复制到已经排序的集合是否更好？那只是一个例子。我希望在网上找到一些解释，说明哪些方法最适合搜索或排序，但我没有。
如何调整搜索和排序算法的行为以提高效率？

Does the searching behavior change if the data is not sorted compared to one which is sorted?

没有。这取决于您选择的算法。一般搜索 std::find 是 O(n)，二分搜索 std::lower_bound 是 O(log n)，但它仅适用于排序范围。

How can I know if it is better to use std::sort() on a vector instead of maybe to copy the vector to an already sorted set? That is just an example. I hoped to find some explanations on the web which ways are the best for searching or sorting, but I didn't.

你可以写一个基准和度量。您可以通过将 std::vector（没有重复的元素）复制到 std::set 来对它进行排序，这会在内部维护排序顺序。 std::set 通常实现为红黑树，与连续的 std::vector 相比，通常具有较高的内存碎片。所以很容易预测结果。 Alexander Stepanov 在 YouTube 上的讲座中讨论了（如果我没记错的话）这个特定的例子。

Does the searching behavior change if the data is not sorted compared to one which is sorted?

视情况而定。如果您按位置访问 vector/array 中的数据，则不会提高性能，也不需要进行排序。

搜索可以线性、二进制、键和散列函数.

对于小的（我猜是低于几十个项目的东西）和连续的容器（例如向量）线性搜索可能是最快的，只是因为缓存友好的内存布局。

二分搜索的复杂度为 O(log N)，这可能是您能得到的最好的……我在考虑 Information theory。它要求您预先对容器进行排序。这对于在同一个容器中频繁搜索很有用。

A std::set（及其堂兄弟 std::map）在内部使用树，这也使得搜索复杂度为 O(log N)。如果您通过键而不是项目的某些标准进行搜索，则很有用。缺点是构建（始终保持排序）比填充向量然后再排序要慢一些。

哈希图或哈希表使用一个函数来获取项目所在的桶。复杂度接近于 O(1)，具体取决于项目的数量和使用的功能（冲突问题）。

如您所见，选择容器类型取决于您将如何处理数据。选择符合您要求的那个。

How can I know if it is better to use std::sort() on a vector instead of maybe to copy the vector to an already sorted set?

std::sort 更改容器，因此结果显然已排序。如果您需要原始的、未排序的容器，请复制一份并对副本进行排序。对整个容器进行排序优于 "insert-item-so-container-is-always-sorted" 所有项目，特别是使用向量（许多内存重新分配）； set/map 填充过程可能不会那么慢。

How can I adapt the behavior of the searching and sorting algorithms to make it more efficient?

我不清楚你的意思。但是，"The end justifies the means"。同样，选择最适合您的数据处理的容器。

关于搜索和排序算法的问题

questions about the searching and sorting algorithms

c++

sorting

search

std

c++-standard-library