<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/60200">60200</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [AArch64] multiply-by-parts idiom could be recognized
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          rose00
      </td>
    </tr>
</table>

<pre>
    reproducer: https://gcc.godbolt.org/z/xsTP44xvn

The two functions compute the same result, a 64x64-to-128-bit unsigned multiply.  The portable version (which avoids `__int128`) does not fold to good code like the non-portable one on either x86 or aarch64.

The portable version is modeled on:

https://github.com/Cyan4973/xxHash/blob/8e5fdcbe70687573265b7154515567ee7ca0645c/xxh3.h#L338

Since mul and umulh are cheap, there is a special advantage to
recognizing and simplifying portable code that performs these
operations by parts using 32-to-64-bit multiplies.
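
For readers without the godbolt link open, here is a minimal sketch of the two forms. The type and function names are illustrative and not copied from the reproducer; the by-parts arithmetic follows the xxHash routine linked above. It assumes `unsigned long long` is 64 bits and that the compiler supports `unsigned __int128` (GCC and Clang on 64-bit targets).

```c
/* Sketch only: u64 and u128_parts are hypothetical names. */
typedef unsigned long long u64;            /* assumed to be 64 bits */
typedef struct { u64 lo, hi; } u128_parts; /* low and high product halves */

/* Non-portable form: a single widening multiply. */
static u128_parts mul128_native(u64 a, u64 b) {
    unsigned __int128 p = (unsigned __int128)a * b;
    u128_parts r;
    r.lo = (u64)p;
    r.hi = (u64)(p >> 64);
    return r;
}

/* Portable form. With a = a1*2^32 + a0 and b = b1*2^32 + b0:
   a*b = a1*b1*2^64 + (a1*b0 + a0*b1)*2^32 + a0*b0 */
static u128_parts mul128_parts(u64 a, u64 b) {
    u64 lo_lo = (a & 0xFFFFFFFF) * (b & 0xFFFFFFFF);
    u64 hi_lo = (a >> 32)        * (b & 0xFFFFFFFF);
    u64 lo_hi = (a & 0xFFFFFFFF) * (b >> 32);
    u64 hi_hi = (a >> 32)        * (b >> 32);
    /* The sum of these three terms always fits in 64 bits. */
    u64 cross = (lo_lo >> 32) + (hi_lo & 0xFFFFFFFF) + lo_hi;
    u128_parts r;
    r.hi = (hi_lo >> 32) + (cross >> 32) + hi_hi;
    r.lo = (cross << 32) | (lo_lo & 0xFFFFFFFF);
    return r;
}
```

Both functions return the same halves for every input; the request is that the second be recognized and lowered like the first.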

For comparison, similar simplifications target rbit and ror.
https://gcc.godbolt.org/z/Ghb3rfP5W

Similarly, portable popcount code is already combined into the popcnt intrinsic.
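
As an illustration of the kind of portable idiom meant here, the sketch below shows a common rotate-right and a common bit-trick popcount. The function names are hypothetical, and whether a particular compiler version folds these exact forms into ror or a popcount intrinsic is an assumption; the godbolt link above is the authoritative example.

```c
/* Sketch only: names are hypothetical. */
typedef unsigned long long u64; /* assumed to be 64 bits */

/* Rotate right. Masking both shift counts keeps them in range,
   so n == 0 and n == 64 do not trigger undefined behavior. */
static u64 rotr64(u64 x, unsigned n) {
    return (x >> (n & 63)) | (x << ((64 - n) & 63));
}

/* Population count using the classic parallel bit-summing trick. */
static unsigned popcount64(u64 x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (unsigned)((x * 0x0101010101010101ULL) >> 56);
}
```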

The existing implementations are in places like
InstCombine/InstCombineAndOrXor.cpp
and
AggressiveInstCombine/AggressiveInstCombine.cpp.

Handling the present use case would require recognition that a 128-bit multiply is happening, but based only on observations of 64-bit values.  Both 64-bit values, built up from 32-bit temporary values, would get replaced by a full 128-bit combo value, which would then be split up again (or not).

It seems to me that the "bit provenance" algorithm which recognizes bit-reverses might possibly be adaptable to a "scaled term" provenance algorithm that would enable a pattern matcher to verify that a nest of shift/multiply/mask nodes actually does add up to 128 bits of product.

This is my first bug report.  While I do realize that "missed optimizations" are not really bugs, I did not find in the user guide a way to make a record of suggestions like this.  I hope this is actionable, perhaps after moving into whatever bucket such things should actually go into.
</pre>