<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href="https://github.com/llvm/llvm-project/issues/61461">61461</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            Clang does not consider commutativity when combining vector multiplications
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          aqjune
      </td>
    </tr>
</table>

<pre>
Given this code:
```
#include <arm_neon.h>

void f (uint64x2_t *__restrict__ y, uint32x2_t x[4]) {
  for (int i = 0; i < 4; ++i) {
    for (int j = 0; j < 4; ++j) {
      y[i * 4 + j] = vmull_u32(x[i], x[j]);
    }
  }
}
```
Clang trunk generates 16 umull instructions: https://godbolt.org/z/5qeE8MEG9
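This is not optimal: multiplication is commutative, so x[i] * x[j] equals x[j] * x[i], and only 10 of the 16 products are distinct (4 with i == j plus 6 with i < j).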
GCC trunk considers this and generates 10 umull instructions: https://godbolt.org/z/jaxan53Gn
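
For reference, the reduction from 16 to 10 multiplies can be sketched in portable C. This is only an illustration of the commutativity argument, not what GCC or Clang actually does internally: `widening_mul` is a hypothetical scalar stand-in for a single lane of vmull_u32, and `fill_products_symmetric` is a made-up helper name.
```c
#include <stdint.h>

/* Hypothetical scalar stand-in for one lane of vmull_u32:
   a widening 32x32 -> 64 bit unsigned multiply. */
static uint64_t widening_mul(uint32_t a, uint32_t b) {
    return (uint64_t)a * (uint64_t)b;
}

/* Fills y[i * 4 + j] = x[i] * x[j] for all i, j in [0, 4),
   multiplying only for j >= i and mirroring the result.
   Returns the number of multiplies performed. */
static int fill_products_symmetric(uint64_t y[16], const uint32_t x[4]) {
    int multiplies = 0;
    for (int i = 0; i < 4; ++i) {
        for (int j = i; j < 4; ++j) {
            uint64_t p = widening_mul(x[i], x[j]);
            ++multiplies;
            y[i * 4 + j] = p;
            y[j * 4 + i] = p; /* same slot as above when i == j */
        }
    }
    return multiplies;
}
```
All 16 slots of y are still written, but only the 10 pairs with j >= i are multiplied; the remaining 6 are copies.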
</pre>