<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/56606>56606</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Bug in __hash_len_0_to_16, introduced when migrating from City64 to std::hash
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          renzibei
      </td>
    </tr>
</table>

<pre>
    On 64bit platform, libc++ use City64 hash as the hash function for bytes or string. There is a bug introduced  in ` __hash_len_0_to_16` when migrating from City64.

The original code in City64 is [here](https://github.com/google/cityhash/blob/00b9287e8c1255b5922ef90e304d5287361b2c2a/src/city.cc#L128). And llvm libc++ code is [here](https://github.com/llvm/llvm-project/blob/c444f037878c61bcc750855c6edbcdac8f0871a5/libcxx/include/__functional/hash.h#L135).  By comparison, it can be seen that the original code in City64 `uint64 a = Fetch32(s);` was changed to `const uint32_t __a = __loadword<uint32_t>(__s);` by mistaken. This change drops 3 bits of information in the following use of `a` or `__a` 
```
return __hash_len_16(__len + (__a << 3), __b);
```

This mistake caused a problem: The 3 bits from positions 29 to 31 of any string of length 4 to 8 bytes are ignored. The following minimal code shows this problem. In this code, the 8 strings of length 8 differ by only 3 bits, but they all have the same hash value.

```
#include <iostream>
#include <unordered_map>
#include <functional>
#include <string>
#include <sstream>
#include <iomanip>
#include <vector>

std::string ToHex(const std::string& s)
{
    std::stringstream ret;

    for (char ch : s) {
        ret << std::hex << std::setfill('0') << std::setw(2) << uint32_t(uint8_t(ch)) << " ";
    }
    return ret.str();
}

int main() {
    constexpr uint32_t offset = 29;
    uint64_t mask = ~(7ULL << offset);
    uint64_t underlying_bits = UINT64_MAX & mask;
    auto hasher = std::hash<std::string>{};
    std::unordered_map<size_t, std::vector<std::string>> hash_value_count_map;
    for (uint64_t change_part = 0; change_part < 8; ++change_part) {
        uint64_t new_key = underlying_bits | (change_part << offset);
        std::string str_key = std::string(8, 0);
        memcpy(str_key.data(), &new_key, 8);
        size_t hash_value = hasher(str_key);
        hash_value_count_map[hash_value].push_back(str_key);
    }
    for (const auto& [hash_value, str_vec]: hash_value_count_map) {
        std::cout << str_vec.size() << " keys have the same hash value " <<
            hash_value << ":\n";
        for (const std::string& str_key: str_vec) {
            std::cout << ToHex(str_key) << "\n";
        }
    }
    return 0;
}

```

Compiled with `-std=c++17` flag, and run on little endian cpu, the output would be
```
8 keys have the same hash value 13475952360111484157:
ff ff ff 1f ff ff ff ff
ff ff ff 3f ff ff ff ff
ff ff ff 5f ff ff ff ff
ff ff ff 7f ff ff ff ff
ff ff ff 9f ff ff ff ff
ff ff ff bf ff ff ff ff
ff ff ff df ff ff ff ff
ff ff ff ff ff ff ff ff
```
I test this code with compiler `Homebrew clang version 14.0.6` on macOS 12.4 arm64.

The fix can be assigned to me if needed. 

In addition, the City64 part was introduced before Oct. 2012. There are some new commits between 2012 and 2013 in City64 which change a lot of the codes and enhace the hash quality. e.g. [0469694b4609253742bcd3008c0b3f3ba9fcd9ba](https://github.com/google/cityhash/commit/0469694b4609253742bcd3008c0b3f3ba9fcd9ba). I think we can migrate these new improvements in. Of course, CityHash64 has stopped updating since 2013 though.


</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJydV0tzozgQ_jXkohqKh3n44EPizNSkanZz2JmqvVFCEkYbkLxIxPH8-u0WYGMHZ2fXhW2BpNbX3V833aXmx82zIumqlJbsG2or3bVetCWNLJkXPcBFeiPIVtpjuiI1NTWhhthaDOOqV8xKrQjsI-XRCkNgYGwn1c4n32vRCSINoaTsd0Qq22neM8EJjImXBqQoUEzRCFUEhdVFmOLTQy0UaeWuoxbkkKrT7YjA94JHL7gffkE8nCZ3UtGGMM0FSh2RwqFe8oDne8mjF-W1tXvjxfde9AWunbR1X_pMt3ij9a4RMGCwFeHAsGx0CX9BUK6jPBM5C6MkKZN1FIlqHYg4WPEEJuI0LCMWUVhqOjaK8BmM4m9hlHvR2if3ipOmeW3nJh3A_jpE3D_-fdp3-i_B7BkkW61WVRBneZYzwMNYlgR5krBU8JJxyvIqyLOQJigAILy9wUAq1vQclS6KyYe0gVvU368d_jhx-MnDEfC2e9pJoxVyA6jCqCKlIEaAo2xNrWPELV-AS3vwPYwo8eJH8kVYVscRqGzgBC9-cD4HWrGaqh2ww2rcw7QyluDOOCosUGXYXRSNpvygO-7F22nWiz-DuKKYCSyPQCFj6YtQyEQ5iSe803tDYgKUB7ZWABRJTx2NpXKaVLpp9AG5h-SHNSCQolAgN_wBFLwZuZgG4-VuO2H7Ts2JDZxGaDAk6Ht3g6ps4SIxAgabFkU5Ql8SOvEdlBh1Ag8ANA4WBT6UjWiBOBhvk14uZvbaSFTLkGiNRo1D1IWq4xiheAewdrYmK5zPxxCmGLU7pTvBXRDP7NFKJdvJxabWB0wGAGtE4ZMnNTzABagYmjMfzzOzA3PCZVUJzBpEq-Y4AsctZe_4dCS0aSDNvAonxNB2TDqvtOnFRSa4NlcUjwRHM0sNpwvaIkcWpnvQk0MQ8qKl-xtrZiGyvGDQ79bkh-dL3VIlb538CqGuu_Ok-zWWY56I70c3ftdfBUR1PkTM1awXpcSFxSAgGxlG4HO1coBJgMFnHp7WYoLHI2raQSARpBtKJRcC8QPbJ3Kf5Nfi7d0zI2wlG8g5kCazAL5O2vtFB1gRzeZOIR_lOMzdiNUujk6LvCjC70kPBOZlj-ebMUzhzwe1HYhZ9E0rh184hbRUqmHVpcbO5OJt350Tla4q42zwSGYycfGQBQuUZl7cAi_DvJX9-PZtQj7s9m7t7BVwtTmCtwoX5yjkx9Pv32Hyt_s_CTobhV9spj3ENkYORBuuP7sFX3ZI3kvCANlARbBBvMSVq3jZGvlToA9mXptIuyQ5_uygFC6IC6Z7ZQdBD--4dlJ6SNwFvIIGswaw-urhluT4cHi_zqYWKXoSrMSheIFEg0LfWTbbjoSfH3PbRQsBhWnvJP9dWOY5Gi1YktOKlu2P-IIcBPicWjqSNEJY6Ygc7_JFJM4rM1M7DAMJznKXdi56B-qU02OoVvx9D3clZS-3ZV2E25Q9XIJCPiJRL6U6AnUFcAfLIUgvi0CW3HmyLCyb5R4ny0dDTIF7zg2A19x8tQzZwy2-POjSPDN5eHyyVdcZ553mS6l5tB7m01H9JR1v6jnl_rMX5rhugLrwzVJeDG7lwsXCZAvloWygGjlA0YoF0icH9XEsd8MMq6WqoTt0MoV6uOsVvPShIra2EUQoLqGeZPt-KhhAvz2oeNB9w6HOXDw6_xcnhvEqS9ZJFKdBGIarfBUmGZrPba4qMlzhNHDX1WT80WTy0WT20eT6o8nyo0n-0WS1MHllsycC5Z09F2iDw9jgPlfaftWtKDtxIKyBvEdeRWewLg5XfuC73gxuWsqe_yBh5ENB37ULTVkl36YOgRoDleRQ04N_ZAU5V3CsK-eboGiknLtadaLA2Du4vIu9wax5LAUElSDPzPokCgDH2Ghi2WoAP6Z1VKrFNF5CBYFdCi503INBPGtODrWEcmZsDShpNL7AHQI0kHFbhKopE-eu9--eNtjoEeFDlwtpLFil63S9KldpsI6SOFtF0HnFQZCzoIyruKTrivF1Sf9XLzpogt3or56CbdsTelm9kINwrhi6aaeDGQwkW6jZX0UrlEXr-uS5ApX7zrhUjNb5CucPTT-kHr3fg-n7PR96cgM1qhhsaWvd7-oLEtzxTczX8ZreWQkRvnlw3f9St4_t5Nmztxt_5M9F3XLXd83mP7fM0pheQN36JUnTIL2rN4yFAS0hNWCXzFJaVlnFwyRPuIC-niZ3DS1FYzbgZHDendxEQRQFWbgOsigK134S0DzlvIyTioYJpNpVIKBWbHw82Nfd7q7bOAxlvzMw2UD7Zs6TY3iIST68GmvdbTqhfspSyDuHd-PA_gPOxk7d">