<table border="1" cellspacing="0" cellpadding="8">
    <tr>
        <th>Issue</th>
        <td>
            <a href=https://github.com/llvm/llvm-project/issues/65426>65426</a>
        </td>
    </tr>

    <tr>
        <th>Summary</th>
        <td>
            [flang][hlfir] Polyhedron/nf 23% performance regression
        </td>
    </tr>

    <tr>
      <th>Labels</th>
      <td>
            new issue,
            flang
      </td>
    </tr>

    <tr>
      <th>Assignees</th>
      <td>
      </td>
    </tr>

    <tr>
      <th>Reporter</th>
      <td>
          vzakhari
      </td>
    </tr>
</table>

<pre>
    Code with HLFIR lowering runs for 7.4 seconds vs 6 seconds with FIR lowering.

There is some overhead due to extra temporaries at line 261:
```
256     subroutine NF2DPrecon(x,i1,i2) ! 2D NF Preconditioning matrix
257     integer :: i1 , i2
258     real(dpkind),dimension(i2)::x,t
259     integer :: i
260     do i = i1 , i2 , nx
261        if ( i>i1 ) x(i:i+nx-1) = x(i:i+nx-1) - au2(i-nx:i-1)*x(i-nx:i-1)
262        call trisolve(x,i,i+nx-1)
263     enddo
264     do i = i2-2*nx+1 , i1 , -nx
265        t(i:i+nx-1) = au2(i:i+nx-1)*x(i+nx:i+2*nx-1)
266        call trisolve(t,i,i+nx-1)
267        x(i:i+nx-1) = x(i:i+nx-1) - t(i:i+nx-1)
268     enddo
269     end subroutine NF2DPrecon !=========================================
```

`ArrayValueCopy` has special handling for array slices of the form `(i:j)` and `(j+1:k)`, which lets it disambiguate `x(i:i+nx-1)` from `x(i-nx:i-1)`. We can probably do the same in the optimized bufferization pass, or implement something more generic. For example, we could use the affine dialect utilities to detect store-load conflicts based on the iteration space constraints derived from the slice configurations and on the mapping of iteration indices to memory locations (based on the designator indexing).
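
To illustrate the kind of reasoning involved, here is a minimal Python sketch (not flang code) of a conservative disjointness check for unit-stride slices. The `Affine` class and `provably_disjoint` function are hypothetical names made up for this example; they do not correspond to anything in `ArrayValueCopy`, the optimized bufferization pass, or the affine dialect. The idea is only that if the upper bound of one slice minus the lower bound of the other folds to a negative constant, the slices cannot overlap for any values of `i` and `nx`.

```python
class Affine:
    """Hypothetical affine expression: const + sum(coeff * symbol)."""

    def __init__(self, const=0, **coeffs):
        self.const = const
        # Drop zero coefficients so cancelled symbols disappear.
        self.coeffs = {s: c for s, c in coeffs.items() if c != 0}

    def __sub__(self, other):
        syms = set(self.coeffs) | set(other.coeffs)
        diff = {s: self.coeffs.get(s, 0) - other.coeffs.get(s, 0) for s in syms}
        return Affine(self.const - other.const, **diff)

    def constant_value(self):
        """Return the integer value if no symbols remain, else None."""
        return self.const if not self.coeffs else None


def provably_disjoint(lb1, ub1, lb2, ub2):
    """True if unit-stride slices (lb1:ub1) and (lb2:ub2) cannot overlap.

    Conservative: succeeds only when one slice's upper bound minus the
    other's lower bound folds to a negative constant.
    """
    for hi, lo in ((ub1, lb2), (ub2, lb1)):
        d = (hi - lo).constant_value()
        if d is not None and d < 0:
            return True
    return False


# Line 261: store to x(i:i+nx-1), load from x(i-nx:i-1).
store_lb, store_ub = Affine(0, i=1), Affine(-1, i=1, nx=1)
load_lb, load_ub = Affine(0, i=1, nx=-1), Affine(-1, i=1)

# load_ub - store_lb == (i - 1) - i == -1, so the slices are disjoint.
line_261_disjoint = provably_disjoint(store_lb, store_ub, load_lb, load_ub)

# Identical slices are, of course, not reported as disjoint.
same_slice_disjoint = provably_disjoint(store_lb, store_ub, store_lb, store_ub)
```

This only covers unit strides and bounds whose difference is a compile-time constant; a more generic solution would need real constraint solving over the iteration space.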
</pre>