std::unordered_map 中的 buckets 接口有什么用?
What is the use for buckets interface in std::unordered_map?
我一直在看这个 video from CppCon 2014 and discovered,在 std::unordered_map
下面有一个访问存储桶的接口。现在我有几个问题:
- 这个接口的用法有什么合理的例子吗?
- 为什么委员会决定定义这个接口,为什么典型的 STL 容器接口不够用?
有许多算法要求将对象散列到一定数量的桶中,然后处理每个桶。
比如说,您想在集合中查找重复项。您对集合中的所有项目进行哈希处理,然后在每个存储桶中成对比较项目。
一个不那么简单的例子是 Apriori algorithm 用于查找频繁项集。
我想如果您处于高性能状态并且碰撞最终会杀死您,您可以从中受益匪浅。迭代存储桶并定期查看存储桶大小可以告诉您您的哈希策略是否足够好。
无序映射在性能方面很大程度上取决于它们的哈希策略。
我一直需要该接口的唯一原因是遍历地图中的所有对象,而无需锁定地图或复制地图。这可用于地图中对象的不精确过期或其他类型的定期检查。
遍历工作如下:
锁定地图。
开始按桶顺序遍历地图,对遇到的每个对象进行操作。
当您确定持有锁的时间过长时,将您上次操作的对象的密钥藏起来。
等到您希望恢复操作。
锁定地图,然后转到第 2 步,从您停止的关键处或附近(按桶顺序)开始。如果到达终点,请从头开始。
搜索介绍某项的提案通常很有启发性,因为通常会附有基本原理。在这种情况下 N1443 表示:
G. Bucket Interface
Like all standard containers, each of the hashed containers has member
function begin() and end(). The range [c.begin(), c.end()) contains
all of the elements in the container, presented as a flat range.
Elements within a bucket are adjacent, but the iterator interface
presents no information about where one bucket ends and the next
begins.
It's also useful to expose the bucket structure, for two reasons.
First, it lets users investigate how well their hash function
performs: it lets them test how evenly elements are distributed within
buckets, and to look at the elements within a bucket to see if they
have any common properties. Second, if the iterators have an
underlying segmented structure (as they do in existing singly linked
list implementations), algorithms that exploit that structure, with an
explicit nested loop, can be more efficient than algorithms that view
the elements as a flat range.
The most important part of the bucket interface is an overloading of
begin() and end(). If n is an integer, [begin(n), end(n)) is a range
of iterators pointing to the elements in the nth bucket. These member
functions return iterators, of course, but not of type X::iterator or
X::const_iterator. Instead they return iterators of type
X::local_iterator or X::const_local_iterator. A local iterator is able
to iterate within a bucket, but not necessarily between buckets; in
some implementations it's possible for X::local_iterator to be a
simpler data structure than X::iterator. X::iterator and
X::local_iterator are permitted to be the same type; implementations
that use doubly linked lists will probably take advantage of that
freedom.
This bucket interface is not provided by the SGI, Dinkumware, or
Metrowerks implementations. It is inspired partly by the Metrowerks
collision-detection interface, and partly by earlier work (see
[Austern 1998]) on algorithms for segmented containers.
我一直在看这个 video from CppCon 2014 and discovered,在 std::unordered_map
下面有一个访问存储桶的接口。现在我有几个问题:
- 这个接口的用法有什么合理的例子吗?
- 为什么委员会决定定义这个接口,为什么典型的 STL 容器接口不够用?
有许多算法要求将对象散列到一定数量的桶中,然后处理每个桶。
比如说,您想在集合中查找重复项。您对集合中的所有项目进行哈希处理,然后在每个存储桶中成对比较项目。
一个不那么简单的例子是 Apriori algorithm 用于查找频繁项集。
我想如果您处于高性能状态并且碰撞最终会杀死您,您可以从中受益匪浅。迭代存储桶并定期查看存储桶大小可以告诉您您的哈希策略是否足够好。
无序映射在性能方面很大程度上取决于它们的哈希策略。
我一直需要该接口的唯一原因是遍历地图中的所有对象,而无需锁定地图或复制地图。这可用于地图中对象的不精确过期或其他类型的定期检查。
遍历工作如下:
锁定地图。
开始按桶顺序遍历地图,对遇到的每个对象进行操作。
当您确定持有锁的时间过长时,将您上次操作的对象的密钥藏起来。
等到您希望恢复操作。
锁定地图,然后转到第 2 步,从您停止的关键处或附近(按桶顺序)开始。如果到达终点,请从头开始。
搜索介绍某项的提案通常很有启发性,因为通常会附有基本原理。在这种情况下 N1443 表示:
G. Bucket Interface
Like all standard containers, each of the hashed containers has member function begin() and end(). The range [c.begin(), c.end()) contains all of the elements in the container, presented as a flat range. Elements within a bucket are adjacent, but the iterator interface presents no information about where one bucket ends and the next begins.
It's also useful to expose the bucket structure, for two reasons. First, it lets users investigate how well their hash function performs: it lets them test how evenly elements are distributed within buckets, and to look at the elements within a bucket to see if they have any common properties. Second, if the iterators have an underlying segmented structure (as they do in existing singly linked list implementations), algorithms that exploit that structure, with an explicit nested loop, can be more efficient than algorithms that view the elements as a flat range.
The most important part of the bucket interface is an overloading of begin() and end(). If n is an integer, [begin(n), end(n)) is a range of iterators pointing to the elements in the nth bucket. These member functions return iterators, of course, but not of type X::iterator or X::const_iterator. Instead they return iterators of type X::local_iterator or X::const_local_iterator. A local iterator is able to iterate within a bucket, but not necessarily between buckets; in some implementations it's possible for X::local_iterator to be a simpler data structure than X::iterator. X::iterator and X::local_iterator are permitted to be the same type; implementations that use doubly linked lists will probably take advantage of that freedom.
This bucket interface is not provided by the SGI, Dinkumware, or Metrowerks implementations. It is inspired partly by the Metrowerks collision-detection interface, and partly by earlier work (see [Austern 1998]) on algorithms for segmented containers.