<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/62792>62792</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            `hash_value(TypeID id)` produces collisions 
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            mlir:core
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
            makslevental
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          makslevental
      </td>
    </tr>
</table>

<pre>
    Drop this somewhere (like [here](https://github.com/llvm/llvm-project/blob/9602c7a0817f34fb4d01194119fe4b75c00ea5ed/mlir/unittests/IR/TypeTest.cpp#L66))

```cpp
TEST(Type, TypeIDComparison) {
  MLIRContext ctx;
 ctx.loadDialect<FakeDialect>();

#define FORALL_CONCRETETYPES(_) \
  _(BFloat16Type) \
  _(ComplexType) \
 _(Float8E4M3B11FNUZType) \
  _(Float8E4M3FNType) \
  _(Float8E4M3FNUZType) \
  _(Float8E5M2Type) \
  _(Float8E5M2FNUZType) \
  _(Float16Type) \
  _(Float32Type) \
  _(Float64Type) \
  _(FunctionType) \
  _(IndexType) \
 _(IntegerType) \
  _(MemRefType) \
  _(NoneType) \
  _(OpaqueType) \
  _(RankedTensorType) \
  _(TupleType) \
  _(UnrankedMemRefType) \
  _(UnrankedTensorType) \
 _(VectorType)

#define DECLARETYPEID(TYPE) TypeID TYPE##ID = TypeID::get<TYPE>();
  FORALL_CONCRETETYPES(DECLARETYPEID)
#undef DECLARETYPEID

  typedef struct {
    TypeID typeID;
    char typeName[100];
  } typeIDTuple;

  typeIDTuple typeIDTuples[] = {
#define DECLARETYPEID(TYPE) {TYPE##ID, #TYPE},
 FORALL_CONCRETETYPES(DECLARETYPEID)
#undef DECLARETYPEID
  };

 bool equalError = false;
  bool hashError = false;
  for (int i = 0; i < sizeof(typeIDTuples) / sizeof(typeIDTuples[0]); ++i) {
    for (int j = i + 1; j < sizeof(typeIDTuples) / sizeof(typeIDTuples[0]);
         ++j) {
      typeIDTuple type1 = typeIDTuples[i];
      typeIDTuple type2 = typeIDTuples[j];
      if (type1.typeID == type2.typeID) {
 fprintf(stderr, "ERROR: Expected type id %s to be unequal type id %s\n",
                type1.typeName, type2.typeName);
 equalError = true;
      }
      if (hash_value(type1.typeID) == hash_value(type2.typeID)) {
        fprintf(stderr,
 "ERROR: Expected hash of type id %s to be unequal hash type id "
 "%s\n",
                type1.typeName, type2.typeName);
 hashError = true;
      }
    }
  }
 EXPECT_FALSE(equalError);
 EXPECT_FALSE(hashError);
}

```

and you will see something like 

```
ERROR: Expected hash of type id BFloat16Type to be unequal hash type id ComplexType 
ERROR: Expected hash of type id Float8E4M3B11FNUZType to be unequal hash type id Float8E4M3FNType 
ERROR: Expected hash of type id Float8E4M3FNUZType to be unequal hash type id Float8E5M2Type 
ERROR: Expected hash of type id Float8E5M2FNUZType to be unequal hash type id Float16Type 
ERROR: Expected hash of type id Float32Type to equal hash be unequal id Float64Type 
```

The reason is [`hash_value(TypeID id)`](https://github.com/llvm/llvm-project/blob/b7f93c28096fc8503e4d2d80c43ee2c0ccce480f/mlir/include/mlir/Support/TypeID.h#L150) calls [`DenseMapInfo<const TypeID::Storage *>::getHashValue(id.storage)`](https://github.com/llvm/llvm-project/blob/77f38e1325a251623ec39a0d9ffa718e80436f94/llvm/include/llvm/ADT/DenseMapInfo.h#L83) which is the original implementation from ~[2007](https://discord.com/channels/@me/1101296426488700948/1108607430717345812). Well the problem isn't that it's the original implementation but maybe that it makes certain assumptions ([possibly something about malloc returning 16 byte aligned address groups](https://stackoverflow.com/questions/70692795/why-is-malloc-16-byte-aligned) - shoutout @rkayaith for connecting this to the `>> 4`).

Anyway besides the problem of collisions being an issue for people who want to feed some other map with these keys (e.g., Python maps), it's possible that such collisions lead to poor performance (since poorly distributed keys means `DenseMap` is spending a lot of time looking for empty slots). I played around with some alternatives (e.g., murmur hash from [here](http://zimbry.blogspot.com/2011/09/better-bit-mixing-improving-on.html)) as well as doing what [`DenseMapInfo<...>`](https://github.com/llvm/llvm-project/blob/2bb43d72d91cdc269fd756725a5f187dcb48b038/llvm/include/llvm/ADT/DenseMapInfo.h#L155) for other integer types i.e., 

swapping

```cpp
DenseMapInfo<const TypeID::Storage *>::getHashValue(id.storage);
```

for

```cpp
DenseMapInfo<uintptr_t>::getHashValue((uintptr_t)id.storage);
```

They all resolve the collision issue but I'm not sure what to propose since 

1. hashing isn't anywhere near my forte 
2. this potentially affects lots of downstream users.

@rkayaith proposed benchmarking a couple of choices against IREE tests and I'll come back tomorrow with those numbers.


</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJysWElv47jy_zTMhYghUfshh3jDP0Avg3Rm_u-9S4OSShY7FKkmqbg9h_fZH4qyY9mJk-mZDoJEYhVr-dWiIrm1YqMAbkgyJ4x1_NFKeALluCSMkWR5xQfXanMzpVyVut7dLI3uqWuFpVZ3sG3BACUsl-IRKEnm-E6SJWF561xvSXRL2Jqw9Ua4dihnle4IW0v5dPh33Rv9DSpH2LqUuiRsXaQBqzIe5GHWRHFTxnUQhkUchkUDcZklVRAAT6AmbN1JYQhbD0o4B9ZZwtZ394StH3Y9PIB1s6rvCYs-pClhBf4GSxLc7v-mwfiLPH7lYfXlgbAcNxO2oPj_brnQXc-NsFoRVlCSzUdeSj9-uLtfaOXgh6OV-0GiA6VyP2ZS83opuETHosWaP8Lz24qwHE058O__sqiGRiig68_3tx8-fF18_rS4Xz2sHv792-oLYflXrz5ZHNR_JSyfr6XmLkxHi1-Q0XQJP15Skei35qv4YzQPw_Wn3_9zQciRb_3pL7C8Jyf5yN5leE_KRYc9NXpTQRpfog6qckKrC-Q7VV9C8k452IC5sPEjdPfQXCB-0goukD73_PtwiXjP1SPUD6CsvqT3Yejlpe2_K-MFvGnbgemSFuT5Ayr3THs9nZerxYfbe5_Gd9gV8AEFjdVFx9eIsOhuSUm03K9j24huN4DV41lelA29VChnCotngwZVQ3Nmz8RkSt2uB2SxzgyVm9Y6PZjr9tZNKFXLjV__xDsgyTwMAux_RxaSHfb5mJwVPp3Sps8WW3MygvJsyvu4kmw-xRQbGWGRX8rwba_1l4Hn_Tv3qdRaUvg-cLkyRhvvQ8OlhQksnqfltr3M0uA6y4VyVHiGgERz_7igVvwJuiEsP4EMAWDrC8Rk7iPjk4gSNidsLs6a-onOb16nQFYa4p5vv0LzUZX_Ge349sKOl3kRenPO5IrTXHttH3tt37eX-0RD93aHs5EbNx72sv3aqaFNb4Ry6K91NRgzphtb3d9_vifRLV396KFyUHsRVNSUsMRSp2kJdFA-Q05IJFkoHECe8_Ts52idrza2mNg2rkwgPktAZwY4cxlT9wUCmJNfn7gc4AwO7_qIyAueCTyvhJK-BtSe4VW4UD7VzVuweZYjnR3F_XIgT6v0PRwnL8fH1b9-Wy0evq5vP3zBTn4MzYmiM65nvSfzUrZ8dYybLnJV050e6FZISS2AH1VdK9SGjnPqZQHvx2I6db0Vk8n4Rf-q7FensreUnI9nf0PTT6jZT28_q2Qy072rZI_rT2kYRz4UPZE60XLgG4c_-kbiPLRADXCrFRUWjzMkDU5qfT8HiBpTMg3-2UmnzJoiqlgeFGlT5UkQQVyzOg-qOAJgVVBVFcR50BxPOkJVcqjhuPBl6Htt3P7Mc7ectXjeCZMAu1DFpTx4sQRl4SPv71SjSbSotLLuZNj64rThGzzL3eK0dRjA_o_b9o-986Ke2ZHrV3ifZU2UQxixhLMkTFkEVVTwoC6ahmdhDnkQR2lTxEc5R-_3C7fLB8LWU9dG__MI3d-2omoxjq4Fqo3YCIXJgFXZ4YEWx33aGN3R_5JkzoIge82hWthKm3rvUdVypUDiSZPEAfbJdRgGISvSmKVxnmdBUMT5uJqnQRZHQRZmUZzkISOsmNH_Bym9Qb3RpYSOCqsIyxx1LXdUOMKytw0uB0c7vivhsIN2_BEsrcA4LhTl1g5dj6wWv2YkmffaWlHK3aQL8lJ7MVLqihpwg1G4HKa03DmgXIqNgpryujZgLd0YPfT2NXSs49WjfgLTSL3dY_R9AOv1Y5CDtGBZkRC23ra7a2GvR6XXYXqNqq73qjBe19S2enBoGYkD88h3XLjWT2SVVgoqhzb6qwenPUaYg5irKxrjIytm02K-Vbst39ESrKjBnoCuG1ppKYX1MJXgMcGatwN4hT1onJ-2raZbrhwqbABqDyHVrgVDO97TLRroWrBAH2HnAYfZZobf0992rtUKuawfChaH4O7DsY-fHap2aosEXqO2XnsrTKNNx1Xl71iswAekyB2thXVGlAN2Ra-7A44hP5Y6SQPMftuDqr2DVGrnm6fogEqtH3EVvYWudztqpXbWJ-kd7SXfYQIYPah6dNO7zqUDo7gTT3DibTeYbjBj5_Ul9eIq6Dll_hRdaXazUuqN7bXbJw0LwpCwdVBggwDnwFyXwl134odQm2vR9UY_4ZNWs9Z1cj9ncUu3WFDc0lqjN1vE9NWON5vNMFf-addiZRlHdcbqIqzqiqVFU2dJmrGEJ02YZ3VVxnkZRPnf6FphkqBPGJAxw8R4s-A_dpaKGXiopylut7zvhdq8fbH163v_8zD22ke00ean7BmEcr0zX90l3YTlRx5W_IQpDy3sKJeSGrBaPoFvAs_Vtq937Kh3hGUdVRoL0sCYRliFRvfaAh0rbyo5nPlkx5w7dHCuduN1qAJuaLfDSLrDLjYbO1evHSgnuJQ7ypsGKmexLC3WZa23yjoDvKODBWNPutm0Je7NqmkJqmo7bh7H-q60P_Vhd2u1qMBSvuECo313v1pRf0dKcThGd6WkFZZ0yatH6nSnjdHbQ0dDp9XQledWXNU3UV1EBb-CmzDN4yhPC5ZetTdxWUPJApZWddyUQRIXQZwkRVM1vORh1lyJGxawKEjCPEjChKUzVnDeBFWWFHWVQhiQOICOCznDMplps7ny4blJWVawK8lLkPZwX43DT3RbaQP762pz48u1HDaWxIEU1tmjHCechJt3RjkEtR4Qs0k3vhqMvPnpVuHtxq-fN_1_AQAA__-HEQ8I">