<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/121344>121344</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            LLVM-MCA thinks that GCC's vectorization of loop reduction for Zen3 is ~50% faster
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          TiborGY
      </td>
    </tr>
</table>

<pre>
    I am trying to optimize a few hotspots in a larger body of code. One of them is an RMSD computation implemented as a loop reduction, where the loop bounds are completely known at compile time:
```
#include <cmath>    // std::pow, std::sqrt
#include <cstddef>  // size_t

double RMSD(const double* const dm1, const double* const dm2){
    const size_t distsPerGeom = (12*11)/2;
    double RMSD = 0.0;
    for (size_t i=0; i<distsPerGeom; i++){
        RMSD += std::pow(dm1[i] - dm2[i], 2);
    }
    return std::sqrt(RMSD / distsPerGeom);
}
```
What I have found is that, according to LLVM-MCA, GCC is far better at vectorizing this function: 2823 total cycles vs. 4455. See this link:
https://godbolt.org/z/ov45ebsoM

I have not tested this on real HW yet, but I cannot see how this is not indicative of some issue in an LLVM component. Either MCA's cycle counts are wrong for Zen3, or clang is not as good at vectorizing as GCC is.
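
Since the numbers above come only from LLVM-MCA, a rough way to check them on real hardware could look like the sketch below. It is only a minimal timing harness, not a rigorous benchmark: `rmsd` is a copy of the function above, and `avg_ns_per_call` is a hypothetical helper name introduced here for illustration. It would need to be built once with each compiler, using the same flags as in the Godbolt link.
```cpp
#include <chrono>
#include <cmath>
#include <cstddef>

// Same reduction as the function above, copied so the sketch is self-contained.
double rmsd(const double* dm1, const double* dm2) {
    constexpr std::size_t distsPerGeom = (12 * 11) / 2;
    double sum = 0.0;
    for (std::size_t i = 0; i < distsPerGeom; i++) {
        sum += std::pow(dm1[i] - dm2[i], 2);
    }
    return std::sqrt(sum / distsPerGeom);
}

// Hypothetical helper (not part of the original code): average wall-clock
// nanoseconds per call over `iters` calls.
// Caveat: the inputs do not change between iterations, so an optimizer may
// hoist the call out of the loop; a benchmarking library is more robust.
double avg_ns_per_call(const double* dm1, const double* dm2, std::size_t iters) {
    volatile double sink = 0.0;  // volatile store so the result is not discarded
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iters; i++) {
        sink = rmsd(dm1, dm2);
    }
    const auto stop = std::chrono::steady_clock::now();
    (void)sink;
    const std::chrono::duration<double, std::nano> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(iters);
}
```
Calling `avg_ns_per_call(a, b, 10000000)` on two 66-element arrays in a binary from each compiler would give a first indication of whether the difference predicted by MCA shows up on an actual Zen3 machine.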
</pre>