Python 中嵌套在内置方法中的模块的大 O 表示法

Question

我想知道确定 Python 中内置方法背后的大 O 值的基本原理。给定以下操作：

i_arr = ['1','2','5','3','4','5','5']

s_arr = sorted(set(i_arr))

我的理由是，这是 sorted(set) = n^2 意思是两个嵌套循环。

根据建议，我在 Python 源代码中挖掘了更高级别的模块作为我原始功能的一部分：

 def sorted(lst):
        l = list(lst)
        l.sort()
        return l

def sort(self):
        sortable_files = sorted(map(os.path.split, self.files))

def set(self):
        if not self._value:
            self._value = True

            for fut in self._waiters:
                if not fut.done():
                    fut.set_result(True)

现在我真的很困惑:)

Answer 1

排序算法的复杂度上限一般为O(n*log(n))，在wikipedia上可以看出。然后取决于将数组转换为集合需要多长时间。从您发布的示例中，我们可以看到列表被迭代，并且在每个步骤中，检查值是否已经在集合中。根据 this 检查集合中元素是否存在的复杂度为常数 O(1)，因此由于遍历所有元素，整个集合的构造复杂度为 O(n)。但是假设它是通过一个一个地添加每个元素来实现的，我们必须遍历list将其转化为集合，这是O(n)，然后我们对它进行排序，这是O(n*log(n)) ，这导致总复杂度 O(n*log(n) + n) = O(n*log(n)).

更新：

映射也有线性复杂度，因为它遍历整个集合并映射每个元素，但是由于线性函数增长比n*log(n)慢，所以这里的任何线性操作都不会影响O(n*log( n)), 所以渐近复杂度即使有映射也保持不变。

Answer 2

(N logN) 是一些排序算法的下界，例如快速排序、堆排序、归并排序。它们是基于通用比较的算法。在最坏的情况下它可以是 O(N*N).

如果元素将是数字，则有 O(N) 个上限算法可用，例如基数排序、计数排序和桶排序。

如果我们正在考虑 python 库 'sorted' 实现，即 TimSort 实现。 TimSort 是一种优化的归并排序算法。它比常规的合并排序算法稳定且更快。它在最坏情况下以 O(N log N) 运行，在最好情况下以 O(N) 运行（当列表已经排序时）。

Answer 3

超越“通用排序”性能答案；正如你提到的 sorted() 我假设你是在询问 python.

中内置方法背后的大 O 值

嗯，python 正在使用 Timsort；并引用维基百科：

Timsort is a hybrid stable sorting algorithm, derived from merge sort and insertion sort, designed to perform well on many kinds of real-world data.

In the worst case, Timsort takes O(n log n) comparisons to sort an array of n elements. In the best case, which occurs when the input is already sorted, it runs in linear time, meaning that it is an adaptive sorting algorithm.

Python 中嵌套在内置方法中的模块的大 O 表示法

Big O Notation of a module nested in a built-in method in Python

python

methods

performance

big-o

coding-style