Unexpected collision with std::hash

I know that hashing an infinite number of strings into a 32-bit int must produce collisions, but I expect a hash function to give a reasonably even distribution.

Isn't it weird that these 2 strings have the same hash?

size_t hash0 = std::hash<std::string>()("generated_id_0");
size_t hash1 = std::hash<std::string>()("generated_id_1");
//hash0 == hash1

I know I can use boost::hash<std::string> or others, but I want to know what is wrong with std::hash. Am I using it wrong? Should I somehow "seed" it?


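There is nothing wrong with your usage of std::hash. The problem is the specialization std::hash<std::string>: the one provided by the standard library implementation bundled with Visual Studio 2010 only takes a subset of the string's characters to determine the hash value (presumably for performance reasons). Coincidentally, the last character of a string with 14 characters is not part of this subset, so both strings yield the same hash value.

As far as I know, this behaviour conforms to the standard, which demands only that multiple calls to the hash function with the same argument must always return the same value. However, the probability of a hash collision should be minimal. The VS2010 implementation fulfils the mandatory part but fails to account for the optional one.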
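
To make the effect concrete, here is a minimal sketch of a hash that samples only some characters. It assumes a stride of 1 + size / 10 (the increment reported elsewhere in this thread for VC10); the name strided_hash and the mixing step are hypothetical and this is not the actual xfunctional code.

```cpp
#include <cstddef>
#include <string>

// Hypothetical illustration only: NOT the actual Visual Studio 2010 code.
// Characters are sampled with a stride of 1 + size / 10, so a 14-character
// string is read at indices 0, 2, 4, ..., 12 and index 13 is never touched.
std::size_t strided_hash(const std::string& s)
{
    std::size_t result = 2166136261U;              // arbitrary starting value
    const std::size_t stride = 1 + s.size() / 10;  // 2 for 14 characters
    for (std::size_t i = 0; i < s.size(); i += stride) {
        // Mix in the sampled character only (FNV-style step, an assumption).
        result = (16777619U * result) ^ static_cast<unsigned char>(s[i]);
    }
    return result;
}
```

With this sketch, strided_hash("generated_id_0") and strided_hash("generated_id_1") are equal, because the two strings differ only at index 13, which the loop skips.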

For details, see the implementation in the header file xfunctional (starting at line 869 in my copy) and § of the C++ standard (latest public draft).

If you absolutely need a better hash function for strings, you should implement it yourself. It's actually not that hard.


The exact hash algorithm isn't specified by the standard, so the results will vary. The algorithm used by VC10 doesn't seem to take all of the characters into account if the string is longer than 10 characters; it advances with an increment of 1 + s.size() / 10. This is legal, albeit rather disappointing from a quality-of-implementation (QoI) point of view; such hash codes are known to perform very poorly for some typical sets of data (like URLs). I'd strongly suggest you replace it with either a FNV hash or one based on a Mersenne prime:

FNV hash:

struct hash
{
    size_t operator()( std::string const& s ) const
    {
        size_t result = 2166136261U ;
        std::string::const_iterator end = s.end() ;
        for ( std::string::const_iterator iter = s.begin() ;
              iter != end ;
              ++ iter ) {
            result = (16777619 * result)
                    ^ static_cast< unsigned char >( *iter ) ;
        }
        return result ;
    }
} ;

Mersenne prime hash:

struct hash
{
    size_t operator()( std::string const& s ) const
    {
        size_t result = 2166136261U ;
        std::string::const_iterator end = s.end() ;
        for ( std::string::const_iterator iter = s.begin() ;
              iter != end ;
              ++ iter ) {
            result = 127 * result
                   + static_cast< unsigned char >( *iter ) ;
        }
        return result ;
    }
} ;

(The FNV hash is supposedly better, but the Mersenne prime hash will be faster on a lot of machines, because multiplying by 127 is often significantly faster than multiplying by 16777619.)
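
A minimal sketch of how such a functor could be plugged into a standard container, assuming a C++0x/C++11 library that provides <unordered_set>. The names FnvStringHash and IdSet are made up for this illustration; the loop is the same FNV-style step shown above.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical name; same FNV-style loop as the functor shown above.
struct FnvStringHash
{
    std::size_t operator()(const std::string& s) const
    {
        std::size_t result = 2166136261U;
        for (std::string::const_iterator it = s.begin(); it != s.end(); ++it) {
            // Every character contributes, so no position is ignored.
            result = (16777619 * result) ^ static_cast<unsigned char>(*it);
        }
        return result;
    }
};

// Passing the functor as the second template argument makes the container
// use it instead of std::hash<std::string>.
typedef std::unordered_set<std::string, FnvStringHash> IdSet;
```

Because every character is mixed in, two strings that differ only in their last character always get different values from this functor.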

You should probably be getting different hash values. I get different hash values (GCC 4.5):


#include <string>
#include <iostream>
#include <functional>
int main(int argc, char** argv)
{
    size_t hash0 = std::hash<std::string>()("generated_id_0");
    size_t hash1 = std::hash<std::string>()("generated_id_1");
    std::cout << hash0 << (hash0 == hash1 ? " == " : " != ") << hash1 << "\n";
    return 0;
}


# g++ hashtest.cpp -o hashtest -std=gnu++0x
# ./hashtest
16797002355621538189 != 16797001256109909978


You are using the function correctly, and this collision could simply be a coincidence.

You cannot tell whether the hash function is poorly distributed unless you run a large-scale test with random keys.
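
One rough way to run such a test is sketched below, assuming a C++11 compiler with <random>. The helper names make_random_keys and count_collisions are hypothetical, and counting repeated hash values is only one simple measure of distribution quality.

```cpp
#include <cstddef>
#include <random>
#include <set>
#include <string>
#include <vector>

// Build `count` distinct random lowercase keys of the given length.
// Note: this loops forever if count exceeds the number of possible keys.
std::vector<std::string> make_random_keys(std::size_t count,
                                          std::size_t length,
                                          unsigned seed)
{
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> letter('a', 'z');
    std::set<std::string> unique_keys;
    while (unique_keys.size() < count) {
        std::string key(length, ' ');
        for (std::size_t i = 0; i < length; ++i) {
            key[i] = static_cast<char>(letter(rng));
        }
        unique_keys.insert(key);  // duplicates are silently dropped
    }
    return std::vector<std::string>(unique_keys.begin(), unique_keys.end());
}

// Count how many keys produce a hash value that an earlier key already
// produced. Assumes the keys themselves are all distinct.
template <typename Hasher>
std::size_t count_collisions(const std::vector<std::string>& keys,
                             Hasher hasher)
{
    std::set<std::size_t> seen;
    std::size_t collisions = 0;
    for (std::size_t i = 0; i < keys.size(); ++i) {
        if (!seen.insert(hasher(keys[i])).second) {
            ++collisions;
        }
    }
    return collisions;
}
```

For example, count_collisions(make_random_keys(100000, 16, 42), std::hash<std::string>()) (with <functional> included) reports how many of 100000 random keys share a hash value with an earlier key. Running it with keys shaped like your real data (for instance "generated_id_N") is likely more telling than purely random ones.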

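The TR1 hash function and the newest standard define proper overloads for things like strings. When I run this code using std::tr1::hash (g++ 4.1.2), I get different hash values for these two strings.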

