[llvm] [LICM] Promote conditional, loop-invariant memory accesses to scalars with intrinsic (PR #93999)

Tue Jun 4 09:44:38 PDT 2024

ii-sc wrote:

> * Is this really a universally profitable transform? This adds extra phi nodes to track the condition, plus a conditional store after the loop. This seems like it would be non-profitable at least in the case where the store was originally inside a cold block in the loop.

The conditional store promotion option is now disabled by default; I had missed this aspect at first.
That said, there are many cases where this optimization applies.
[summary.txt](https://github.com/user-attachments/files/15567667/summary.txt) lists the number of promotions in the SPEC CPU 2017 benchmarks. Given how many loops are affected, I think it is worth having the option to optimize them.
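For readers following the profitability question, a hand-written C sketch of the transform at the source level may help. The global `g` and the functions `update_orig` and `update_promoted` are hypothetical. The code assumes `a` does not alias `g` (hence `restrict`), and it is an approximation, not LICM's actual output. The promoted loop keeps the value in a local plus a flag recording whether any store happened (these correspond to the extra phi nodes), and it emits a single guarded store after the loop.

```c
#include <stdbool.h>

int g; /* hypothetical loop-invariant global */

/* Original form: g is read and written only when a[i] > 0. */
void update_orig(const int *restrict a, int n) {
    for (int i = 0; i < n; ++i)
        if (a[i] > 0)
            g += a[i];
}

/* Promoted form (sketch): g lives in a local for the whole loop. */
void update_promoted(const int *restrict a, int n) {
    int val = g;          /* load hoisted to the preheader; g is a global, so it is always dereferenceable */
    bool stored = false;  /* extra state tracking whether the conditional store ever ran */
    for (int i = 0; i < n; ++i) {
        if (a[i] > 0) {
            val += a[i];
            stored = true;
        }
    }
    if (stored)           /* the conditional store after the loop */
        g = val;
}
```

The flag and the post-loop branch are the overhead raised in the review: if the store sits in a rarely taken block, the original loop may well be cheaper.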

> * It seems like you are lowering the new conditional.store intrinsic immediately after LICM by scheduling an additional pass (after every single LICM run!) That seems pretty pointless to me. If you are going to do that, you may as well directly update the CFG in LICM. If you're going to introduce that kind of intrinsic, I would expect it to be lowered in the early backend.

There are cases where LLVM can further optimize the lowered control flow. For example, this code can be optimized better with early lowering:

```c
int res;

void test(int *restrict a, int N) {
    /* res is loop-invariant; ++res executes only when a[i] != 0 */
    for (int i = 0; i < N; ++i)
        if (a[i])
            ++res;
}
```

[backend-lowering.txt](https://github.com/user-attachments/files/15567726/backend-lowering.txt) contains the assembly from the initial patch, where the masked store was lowered in the backend.
[middlened-lowering.txt](https://github.com/user-attachments/files/15567742/middlened-lowering.txt) contains the assembly from the current version of the promotion, where the conditional store is lowered in the middle end.
The second listing branches straight to the end when the number of iterations is 0; the first one does not.
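A rough source-level approximation of the shape described for the middle-end lowering may make the difference easier to see. It assumes `res` is kept in a local plus a flag that records whether any increment happened. The name `test_promoted` is hypothetical, and this is a sketch, not the compiler's output; the attached listings are the authoritative versions. The key point is that the zero-trip case leaves before touching `res` or the flag.

```c
#include <stdbool.h>

int res;

/* Hypothetical sketch of test() after promotion with early lowering. */
void test_promoted(int *restrict a, int N) {
    if (N <= 0)            /* zero-trip case: branch straight to the exit */
        return;
    int val = res;         /* res promoted to a local for the loop */
    bool stored = false;   /* set when any iteration executes ++res */
    for (int i = 0; i < N; ++i) {
        if (a[i]) {
            ++val;
            stored = true;
        }
    }
    if (stored)            /* single conditional store after the loop */
        res = val;
}
```

Because the guarded store is already explicit control flow in IR, later middle-end passes can fold the zero-trip path into the early exit. With backend lowering of a masked store, they cannot see that opportunity.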

https://github.com/llvm/llvm-project/pull/93999