Sane hash for SIMD values?

Question

I want to use a simple hash map keyed on __m128i as a test, but C++ complains that the hash function is incompatible:

/Applications/Xcode.app/[...]/c++/v1/__hash_table:880:5: error: static_assert failed due to requirement [...] "the specified hash does not meet the Hash requirements"

    static_assert(__check_hash_requirements<_Key, _Hash>::value,
    ^             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In file included from [...] note: in instantiation of template class [...] requested here
    std::unordered_map<__m128i, std::size_t> hmap;

Now, I can supply a hash function with code along these lines:

    class hash128i
    {
    public:
        std::size_t operator()(const __m128i &r) const
        {
            return something;
        }
    };

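Here `something` would be a value of my own invention, such as OR-ing the high and low 64 bits of the __m128i and then passing the result to std::hash.

However, given how sensitive hash functions are, I don't know whether this approach is sound.

What is a good C++ hash function for __m128i (or other SIMD variables)?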

Answer 1

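How good a hash function is in practice depends to some extent on the properties you need and on how your data is distributed.

If you don't have to defend against malicious input that tries to clog your table with a large number of colliding values, a fairly simple function should suffice.

For short integers, Chris Wellons has done quite a bit of analysis using his hash-prospector program.

One good 64-bit function he cites is the following: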

uint64_t splittable64(uint64_t x)
{
    x ^= x >> 30;
    x *= UINT64_C(0xbf58476d1ce4e5b9);
    x ^= x >> 27;
    x *= UINT64_C(0x94d049bb133111eb);
    x ^= x >> 31;
    return x;
}
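One rough way to judge a mixer like this is an avalanche check: flip a single input bit and count how many output bits change. An ideal 64-bit mixer would change about 32 on average. The sketch below is only an illustration and is not taken from the hash-prospector tool; the helper name `average_flipped_bits` and the choice of small consecutive inputs are assumptions made for this example.

```cpp
#include <bitset>
#include <cstdint>

// Same mixer as above, repeated so this sketch compiles on its own.
uint64_t splittable64(uint64_t x)
{
    x ^= x >> 30;
    x *= UINT64_C(0xbf58476d1ce4e5b9);
    x ^= x >> 27;
    x *= UINT64_C(0x94d049bb133111eb);
    x ^= x >> 31;
    return x;
}

// Hypothetical helper: for inputs first, first+1, ..., first+count-1,
// flip each of the 64 input bits in turn and return the average number
// of output bits that differ from the unflipped hash.
double average_flipped_bits(uint64_t first, uint64_t count)
{
    uint64_t total = 0;
    for (uint64_t x = first; x < first + count; ++x) {
        const uint64_t h = splittable64(x);
        for (int bit = 0; bit < 64; ++bit) {
            const uint64_t flipped = splittable64(x ^ (UINT64_C(1) << bit));
            total += std::bitset<64>(h ^ flipped).count();
        }
    }
    return static_cast<double>(total) / (static_cast<double>(count) * 64.0);
}
```

Calling `average_flipped_bits(1, 1000)` gives a quick sanity number for small integer keys. It is a coarse measure and does not replace a proper bias analysis.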

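You can hash the two halves of the 128-bit value separately and combine them with XOR, rotating one of them if you expect the two halves to be equal often (otherwise identical halves would cancel to zero). So your solution might look something like this: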

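#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

class hash128i
{
public:
    std::size_t operator()(const __m128i &r) const
    {
        // __m128i cannot be cast or shifted like a 128-bit integer, so read
        // each 64-bit lane with intrinsics (_mm_cvtsi128_si64 needs x86-64).
        uint64_t lower = static_cast<uint64_t>(_mm_cvtsi128_si64(r));
        uint64_t upper = static_cast<uint64_t>(_mm_cvtsi128_si64(_mm_unpackhi_epi64(r, r)));
        uint64_t lower_hash = splittable64(lower);
        uint64_t upper_hash = splittable64(upper);
        // Rotate left by 31 so that identical halves do not cancel under XOR.
        uint64_t rotated_upper = upper_hash << 31 | upper_hash >> 33;
        return lower_hash ^ rotated_upper;
    }
};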
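To show how such a functor might be plugged into std::unordered_map, here is a minimal self-contained sketch. It reads the two 64-bit lanes with SSE2 intrinsics and assumes an x86-64 target. Note one thing the answer does not cover: the map also needs an equality predicate, because comparing two __m128i values with `==` does not produce a `bool`. The `eq128i` functor and the `demo_lookup` function are hypothetical names introduced for this illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>   // SSE2 intrinsics
#include <unordered_map>

uint64_t splittable64(uint64_t x)
{
    x ^= x >> 30;
    x *= UINT64_C(0xbf58476d1ce4e5b9);
    x ^= x >> 27;
    x *= UINT64_C(0x94d049bb133111eb);
    x ^= x >> 31;
    return x;
}

struct hash128i
{
    std::size_t operator()(const __m128i &r) const
    {
        // Low lane directly; high lane after moving it into the low position.
        const uint64_t lower = static_cast<uint64_t>(_mm_cvtsi128_si64(r));
        const uint64_t upper =
            static_cast<uint64_t>(_mm_cvtsi128_si64(_mm_unpackhi_epi64(r, r)));
        const uint64_t lower_hash = splittable64(lower);
        const uint64_t upper_hash = splittable64(upper);
        // Rotate left by 31 so identical halves do not cancel under XOR.
        const uint64_t rotated_upper = upper_hash << 31 | upper_hash >> 33;
        return static_cast<std::size_t>(lower_hash ^ rotated_upper);
    }
};

// Hypothetical equality predicate: true when all 16 bytes match.
struct eq128i
{
    bool operator()(const __m128i &a, const __m128i &b) const
    {
        return _mm_movemask_epi8(_mm_cmpeq_epi8(a, b)) == 0xFFFF;
    }
};

// Hypothetical demo: insert two keys that differ only in lane order,
// then look the first one up again.
std::size_t demo_lookup()
{
    std::unordered_map<__m128i, std::size_t, hash128i, eq128i> hmap;
    hmap[_mm_set_epi64x(1, 2)] = 42;
    hmap[_mm_set_epi64x(2, 1)] = 7;
    return hmap.at(_mm_set_epi64x(1, 2));
}
```

Some compilers emit a warning about ignored attributes when __m128i is used as a template argument; that warning is separate from the hash requirement error in the question.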

If your hash table needs to resist malicious input, you may want to use a keyed hash function with a random key. Take a look at SipHash.
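As a rough idea of what a keyed variant could look like, the sketch below implements the SipHash-2-4 round structure over a byte buffer and wraps it in a functor for __m128i. Treat it as an illustration rather than a vetted implementation: the names `siphash24` and `keyed_hash128i` are made up for this example, and for real use a reviewed library implementation would be the safer choice. The two key words are assumed to come from a random source at program start.

```cpp
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

static inline uint64_t rotl64(uint64_t x, int b)
{
    return (x << b) | (x >> (64 - b));
}

// Read 8 bytes as a little-endian 64-bit word.
static inline uint64_t load_le64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; ++i)
        v |= static_cast<uint64_t>(p[i]) << (8 * i);
    return v;
}

// SipHash-2-4 sketch: 2 compression rounds per word, 4 finalization rounds.
uint64_t siphash24(const unsigned char *data, std::size_t len,
                   uint64_t k0, uint64_t k1)
{
    uint64_t v0 = k0 ^ UINT64_C(0x736f6d6570736575);
    uint64_t v1 = k1 ^ UINT64_C(0x646f72616e646f6d);
    uint64_t v2 = k0 ^ UINT64_C(0x6c7967656e657261);
    uint64_t v3 = k1 ^ UINT64_C(0x7465646279746573);

    auto sip_round = [&]() {
        v0 += v1; v1 = rotl64(v1, 13); v1 ^= v0; v0 = rotl64(v0, 32);
        v2 += v3; v3 = rotl64(v3, 16); v3 ^= v2;
        v0 += v3; v3 = rotl64(v3, 21); v3 ^= v0;
        v2 += v1; v1 = rotl64(v1, 17); v1 ^= v2; v2 = rotl64(v2, 32);
    };

    const std::size_t full_words = len / 8;
    for (std::size_t i = 0; i < full_words; ++i) {
        const uint64_t m = load_le64(data + 8 * i);
        v3 ^= m; sip_round(); sip_round(); v0 ^= m;
    }

    // Final word: leftover bytes in the low positions, length in the top byte.
    uint64_t b = static_cast<uint64_t>(len) << 56;
    for (std::size_t i = 0; i < len % 8; ++i)
        b |= static_cast<uint64_t>(data[8 * full_words + i]) << (8 * i);
    v3 ^= b; sip_round(); sip_round(); v0 ^= b;

    v2 ^= 0xff;
    sip_round(); sip_round(); sip_round(); sip_round();
    return v0 ^ v1 ^ v2 ^ v3;
}

// Hypothetical keyed functor: hashes the 16 raw bytes of the vector.
class keyed_hash128i
{
    uint64_t k0_, k1_;
public:
    keyed_hash128i(uint64_t k0, uint64_t k1) : k0_(k0), k1_(k1) {}
    std::size_t operator()(const __m128i &r) const
    {
        unsigned char bytes[16];
        _mm_storeu_si128(reinterpret_cast<__m128i *>(bytes), r);
        return static_cast<std::size_t>(siphash24(bytes, sizeof bytes, k0_, k1_));
    }
};
```

Because this functor carries state, an instance has to be passed to the std::unordered_map constructor rather than relying on default construction. On platforms where std::size_t is 32 bits, the cast truncates the 64-bit result.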
