<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/62736>62736</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [reassoc] "Reassociate experssions" pass optimizations not always profitable
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          pawosm-arm
      </td>
    </tr>
</table>

<pre>
    Consider the following piece of code:

```
void innermost_loop(int i, double d1, double d2, double delta, int n, double cells[n])
{
  int j;
  const double d1d = d1 * delta;
  const double d2d = d2 * delta;

  for (j = 0; j <= i; j++)
    cells[j] = d1d * cells[j + 1] + d2d * cells[j];
}
```

When compiling at `-Ofast` level, after the "Reassociate expressions" pass, this code is transformed into an equivalent of:

```
  int j;

  for (j = 0; j <= i; j++)
    cells[j] = (d1 * cells[j + 1] + d2 * cells[j]) * delta;
```

Effectively, the computation of those loop invariants isn't done before the loop anymore, we have one extra multiplication on each loop iteration instead. The increased performance cost measured on AArch64 is significant (a test code built with the compiler having the "Reassociate expressions" pass manually disabled executes 20sec., with this pass enabled it executes 26sec.); the problem is also worsen by lost optimization opportunities at codegen when targeting SVE.

The transformation happens in the `ReassociatePass::OptimizeAdd()` function and the only logic that could prevent it, is this:

```
  // If any factor occurred more than one time, we can pull it out.
  if (MaxOcc > 1) {
    LLVM_DEBUG(dbgs() << "\nFACTORING [" << MaxOcc << "]: " << *MaxOccVal
                      << '\n');
```

In this case, `MaxOcc` is `2` and `MaxOccVal` is `delta`.

I suspect this logic is too simplistic. I guess there should be some cost model involved where expansions of loop invariant computation expressions could have the cost high enough to prevent them from being transformed by this optimization. Alternatively, the `LoopInfo` analysis could be used to check whether the transformed expression is inside a loop and is invariant in that loop. If yes, such expression would not be modified.

Note that GCC12 does not do such transformation at `-Ofast` hence not causing a performance drop. The performance delta was the primary reason for finding this problem. I've also found a 2013 paper [1] which states the following about this optimization profitability:

```
Figure 3 shows the performance distributions
(density) generated by activating and deactivating the ’licm’
and ’reassociate’ compiler transformations for a GSM codec
application. It can be observed that while the activation of
’licm’ has a clear positive effect on performance – the median
is shifted towards higher performance values. This is not the
case for the ’reassociate’ transformation, since the activation
and deactivation distributions have almost the same shape and
density, thus not permitting the designer to recognize a clear
trend.
```
[1] http://amirashouri.ca/resources/vlsi-soc_Ashouri2013.pdf

</pre>
<img width="1px" height="1px" alt="" src="http://email.email.llvm.org/o/eJysV01z2zgS_TXQpcsqCrQo6aCD_JVyVTLZmpnNHlMg0BSRBQEuAErR_PqtBihKcpzsHsblsgWi0eiP915TIgS9t4hbtnxgy6eZGGLr_LYXRxe6O-G7We3UafvobNAKPcQWoXHGuKO2e-g1SgTXgHQKWbljxRMrzn-rYvxNy4PTCrS16DsX4lfjXM_4WtsImvFHUG6oDYJaXC_49QJNFLSmI_ZqQ6IxgS0fLFs-Mb4ZL1895A-Q7L-xclpLZ0O83KeAlU-gFsD4brzkZ7Z8tOXv2J5PNM4D4-tvybJg5QPQx0da6bRi_CH9bs4nYMrgG1s-jdGodMW0AYw_wCJt84ccyPU2ZT4Fsnp6t_75779atCBd12tD_RMRWFXcfW5EiKwqwOABDdVWNHFsNuP8dxQhOKlFRMDvvccQtLOBcQ69CIHsY6tDAgHoANELGxrnO6SORwfCAv5n0Adh0EZwza-R8rZlf2tpGV-Pvf5pcX-sLd-81_H36vvcNCijPqA55bJgqvYQRdTOElNi6wICwR-0PQivhY0BdLCMrwhqFqHGxnlMh5OdsKfOeSSHR4RWHBDIDL9HL6AbTNS90XK8wQIK2Y4XRPT5sbYholBz-LNF0FZ6FAEV9OipT8JKijNE6FCEwaMiP7udl211Tx0ljdCNlsJGqqCAiCHmfteDNhGOOrZTttqgpzAJYf8fhKATdhDGnEDpIGqDCvA7yiFiAF4ElPOUfL5Eh3wGbbbU8cq4ysYbAgTd3XtXG-woCWGCg6PzAS3UJzCUr-uj7vRfY-363vk4WB01BuIGJbhHC0diTRR-j5Fy-uPL8_y66VTTCfPZVSv6Hm0AbXMFquKqAv8g0pQ7Vu4-5-txpxTjawq7KqAZrExOhFXptLOGwt1rCbFNYQ1GQe_xQHTSMaliSJX5X8xi_IXxF3htCFTQCBmdByfl4KnpXYadsAlfUXdnzElhoR-MoVq7Ic4nojYEh0_i-2cpgZXPsEhcuYgvwMePXz59fXp--OcHol69DznRzNxHwgZbPtqX3eOfn39__e0D0Bzi_Lw9ub4YP7FyB1cmjO-y1RdhLtf--DOZr-hC-pdQ8gsyv9pR10RIhWBVkW-iLulAa04fqU_THkUxbWfFqIobuLxCGEKPMmbvubHUP-cg6K43OkQt5_AK-wED9RU9QmhT12uE4LozW51CQzLizAEVwdQnegmb2EVycys0N2J0xcMRUklaMotDhFbvW0Drhn0L0U14iy120HjXQY2J4VdqX59yTte0msPORPRW3Moiq4qPzvWvtnG5hsKcgj5HUiMMJFDRgWxR_ptyozqko9c3XpKgEur0lgLirJsqPzxnn9goYtqdEwlOmKZXGGR77emYYrAuUhydU7rRqG56-JuLmH19eHxccFAOQzqgXPb2Rg_eDtoWSXLpgBRDSKP4RoyVpwhJWG6eEpzgKMIobboT_gSk5c6m8dhoq7Lqkkhm6ZvDK-OrA2b9a9xgFQjgxaKEXvToiXFp-h1bLVsIUZCQ3r7nidoN8cfW0hWNjqLWRsfTr7XnRe8Hj1ASkI9jAtep6RC9roeYxkJ2wNcKbSDPfAN7tDTMMsgEzViRxJiarPDqQQLXM2frgm02RstuWmS3ia3nR_6iytOzywy7bWJIJRbw4Y9PaTLI0V8_Td85vMaklTWCqwN6YmVCybHVJlPrHGl6GxgTfTdaaEUAAdKg8NC7oIk_gOkFg8bzdfXOh8p0RYdKC5t90_BudRMTl47Cq5B4jf7m_EGYAQMBjviSkRxbzC5I_lLqN5V9t3K3BUvU0uT_NvFLHy6Nc_YWAlmLhKEvC-l4EB2JoOiRWp5dTPAgSRly2D36TscJCQrT9xtPSuJRur3Vf-G5rNlL9Ggndr-ZAyM12hh7gneanqLTXpAcez2XgvEXj8ENXpKWvBxM0HfBya-7bEE8m_dq7PRMbUu1KTdihttFtb7ny8V9tZq126KUdY1rrNYclyXyoiqWWHJZNeV9jbKZ6S0veFksF9ViWa7uV_OyWoqmKFb3qqorWXB2X2AntJkbc-jmzu9nOoQBtxVfldXMiBpN2ObhavEIaTPP05nf0pm7etgHdl_Q-AkXL1FHk74djg3P78lvX-jQv3mhu5aJ3BhhjuIUJskwOBu82VJlw1TavY7tUM-l6xh_oRDGf3e9d99QRsZfUuBU6ZTYfwMAAP__LRG5Lg">