<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/108418">108418</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Comparing `__uint128_t`s generates significantly better code than comparing pairs of `uint64_t`s
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          pkasting
      </td>
    </tr>
</table>

<pre>
See https://godbolt.org/z/1ajsxs8n5 for examples.

On x86-64 at -O2, any implementation that compares pairs of `uint64_t`s seems to generate noticeably worse code (along all axes) than one that compares single `__uint128_t`s. It seems the former could be optimized into a form close to the latter, at least in many cases.
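
For illustration, the two styles being compared might look like the minimal sketch below. This is not the code from the Godbolt link: `U64Pair`, `less_pair`, `widen`, and `less_u128` are hypothetical names chosen for this example, and `__uint128_t` is a GCC/Clang extension rather than standard C++.

```cpp
#include <cstdint>

// Hypothetical 128-bit value stored as two 64-bit halves.
struct U64Pair {
    std::uint64_t hi;  // most significant 64 bits
    std::uint64_t lo;  // least significant 64 bits
};

// Style 1: lexicographic comparison of the two halves.
constexpr bool less_pair(U64Pair a, U64Pair b) {
    return a.hi < b.hi || (a.hi == b.hi && a.lo < b.lo);
}

// Style 2: widen to __uint128_t (GCC/Clang extension), then compare once.
constexpr __uint128_t widen(U64Pair v) {
    return (static_cast<__uint128_t>(v.hi) << 64) | v.lo;
}

constexpr bool less_u128(U64Pair a, U64Pair b) {
    return widen(a) < widen(b);
}

// Both styles must agree on the ordering; checked at compile time.
static_assert(less_pair({0, 1}, {1, 0}) && less_u128({0, 1}, {1, 0}),
              "smaller high half orders first");
static_assert(!less_pair({1, 0}, {0, 1}) && !less_u128({1, 0}, {0, 1}),
              "larger high half orders last");
```

Both functions compute the same ordering, so ideally they would compile to equivalent code.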
</pre>