Why does collections.Counter run faster than directly running its source code?
I used collections.Counter to count the words in a string:
from collections import Counter

s = """Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."""
lorem = s.lower().split()
Note that this string is smaller than the real ones I tried, but the conclusions generalize.
%%timeit
dcomp = Counter(lorem)
# 8 µs ± 329 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
If I use this instead (identical to part of the source in cpython/Lib/collections/__init__.py):
%%timeit
d = dict()
get = d.get
for w in lorem:
    d[w] = get(w, 0) + 1
# 15.4 µs ± 1.61 µs per loop (mean ± std. dev. of 7 runs, 100000 loops each)
Edit: using a function:
def count():
    d = dict()
    get = d.get
    for w in lorem:
        d[w] = get(w, 0) + 1
    return d
%%timeit
count()
# Still significantly slower. The function definition is not inside the timeit loop.
# 14 µs ± 763 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
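Results are similar for larger strings: the latter approach takes roughly 1.8 to 2 times as long as the first.
The relevant part of the source is here: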
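To reproduce the comparison outside IPython (where %%timeit is not available), a minimal sketch with the standard timeit module might look like this. The input (the first sentence of the string above, repeated 50 times), the name count_loop and the repetition count are illustrative assumptions; count_loop takes the word list as a parameter instead of reading a global. Absolute timings will vary with machine and Python version.

```python
import timeit
from collections import Counter

# Hypothetical input: first sentence of the question's text, repeated
# to simulate a larger string.
s = "Lorem Ipsum is simply dummy text of the printing and typesetting industry."
lorem = s.lower().split() * 50


def count_loop(words):
    """Pure-Python counting loop, equivalent to the question's count()."""
    d = {}
    get = d.get  # bind the method once, as the stdlib source does
    for w in words:
        d[w] = get(w, 0) + 1
    return d


def bench(number=1000):
    """Return total seconds for `number` runs of each approach."""
    t_counter = timeit.timeit(lambda: Counter(lorem), number=number)
    t_loop = timeit.timeit(lambda: count_loop(lorem), number=number)
    return t_counter, t_loop


# Sanity check: both approaches must produce identical counts.
assert Counter(lorem) == Counter(count_loop(lorem))

t_counter, t_loop = bench()
print(f"Counter(lorem):    {t_counter:.4f} s")
print(f"count_loop(lorem): {t_loop:.4f} s")
print(f"ratio:             {t_loop / t_counter:.2f}x")
```

Both timed calls go through a lambda, so the wrapper overhead is the same on each side and does not skew the ratio.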
def _count_elements(mapping, iterable):
    'Tally elements from the iterable.'
    mapping_get = mapping.get
    for elem in iterable:
        mapping[elem] = mapping_get(elem, 0) + 1
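where mapping is the Counter instance itself, initialized through super(Counter, self).__init__() -> dict(). The same speed difference persists after I put the latter attempt inside a function and call that function. I don't understand where the gap comes from. Does the Python standard library get special treatment, or is there some caveat I'm overlooking?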
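One way to rule out a difference in logic is to run the quoted helper directly on a plain dict and compare it with Counter. The sketch below copies the body of _count_elements shown above under the hypothetical name py_count_elements, so it cannot be confused with whatever object the collections module actually uses.

```python
from collections import Counter


def py_count_elements(mapping, iterable):
    'Tally elements from the iterable (pure-Python copy of the quoted helper).'
    mapping_get = mapping.get
    for elem in iterable:
        mapping[elem] = mapping_get(elem, 0) + 1


words = "the quick brown fox jumps over the lazy dog the end".split()
d = {}
py_count_elements(d, words)

# Same counts as Counter, so the algorithm itself is not the difference.
assert d == dict(Counter(words))
print(d["the"])  # 3
```

If the results match, the speed gap has to come from how the code is executed rather than from what it computes.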
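Look more closely at the code in collections/__init__.py. It does define _count_elements as you show, but right after that it tries from _collections import _count_elements inside a try/except ImportError. That means the function is imported from a C library, which is more optimized and therefore faster. The Python implementation is used only when the C version is not found.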