<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/62100>62100</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [MachineCombiner] Reassociation in loop end reductions causing significant slow down. Code stalls due to out of BB dependency.
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            backend:AArch64,
            llvm:codegen,
            performance,
            llvm:optimizations
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          UsmanNadeem
      </td>
    </tr>
</table>

<pre>
    https://reviews.llvm.org/D141302 lifts the same-BB restriction for reassociable ops. This makes sense and can improve dependency chains, but it is causing an 8% slowdown in spec2006/soplex on AArch64 for us (LTO + fast math).

Viewed in isolation, the reassociated reduction code is better because of its improved dependency chain. But when we model the code together with the dependency coming from outside the block (the last fmadd instruction of the loop), we can see that the reassociated code stalls while the fmadd's result is being produced. See: https://godbolt.org/z/vMnjKhaYP

I am attaching the assembly diff below:

```
soplexgood.lto.o:     file format elf64-littleaarch64 | soplexbad.lto.o:     file format elf64-littleaarch64
...
...
 670:       fmadd   d3, d4, d7, d3         670:       fmadd   d3, d4, d7, d3
 674:       b.ne    62c                    674:       b.ne    62c
 678:       fadd    d0, d1, d0             678:       fadd    d0, d1, d0
 67c:       cmp     x18, x1             |  67c:       fadd    d1, d3, d2
 680:       fadd    d0, d2, d0          |  680:       cmp     x18, x1
 684:       fadd    d0, d3, d0          |  684:       fadd    d0, d1, d0
 688:       b.eq    708                    688:       b.eq    708

...
...

 e08:       fmadd   d3, d4, d7, d3         e08:       fmadd   d3, d4, d7, d3
 e0c:       b.ne    dc4                    e0c:       b.ne    dc4
 e10:       fadd    d0, d1, d0             e10:       fadd    d0, d1, d0
 e14:       cmp     x18, x1             |  e14:       fadd    d1, d3, d2
 e18:       fadd    d0, d2, d0          |  e18:       cmp     x18, x1
 e1c:       fadd    d0, d3, d0          |  e1c:       fadd    d0, d1, d0
 e20:       b.eq    ea0                    e20:       b.eq    ea0
```
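
The stall can be illustrated with a rough dataflow model of the three fadds in each version. This is only a sketch under assumed numbers: the helper `finish_times`, the 2-cycle fadd latency, and the 4-cycle arrival time of `d3` are hypothetical and not taken from any particular core. It also ignores issue width and the `cmp`.

```python
def finish_times(instrs, ready, latency):
    """Return the cycle at which each register becomes available.

    instrs:  list of (dst, src_a, src_b) tuples in program order.
    ready:   dict mapping register name to the cycle its value is available.
    latency: assumed latency of every instruction in instrs.
    """
    ready = dict(ready)  # do not mutate the caller's dict
    for dst, a, b in instrs:
        # An instruction can start once both sources are available.
        ready[dst] = max(ready[a], ready[b]) + latency
    return ready


# Left-hand (good) column of the diff: serial chain through d0.
serial = [("d0", "d1", "d0"), ("d0", "d2", "d0"), ("d0", "d3", "d0")]
# Right-hand (bad) column: reassociated, d3 + d2 computed on the side.
reassoc = [("d0", "d1", "d0"), ("d1", "d3", "d2"), ("d0", "d1", "d0")]

FADD_LATENCY = 2  # assumed value, for illustration only


def compare(d3_ready):
    """Final availability of d0 for both sequences, given when d3 arrives."""
    start = {"d0": 0, "d1": 0, "d2": 0, "d3": d3_ready}
    return (
        finish_times(serial, start, FADD_LATENCY)["d0"],
        finish_times(reassoc, start, FADD_LATENCY)["d0"],
    )


# In a vacuum (d3 already available) the reassociated code wins.
in_vacuum = compare(d3_ready=0)
# With d3 still in flight from the loop's last fmadd, it loses.
with_late_d3 = compare(d3_ready=4)
print("serial, reassociated (d3 ready at 0):", in_vacuum)
print("serial, reassociated (d3 ready at 4):", with_late_d3)
```

In this model the serial chain uses `d3` last, so the wait for the fmadd overlaps with the first two fadds. The reassociated version needs `d3` for its second fadd and still has one more dependent fadd after that.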
</pre>