[llvm] [AArch64] Optimization of repeated constant loads (#51483) (PR #86249)

Fri Mar 29 00:16:38 PDT 2024

davemgreen wrote:

> > Is it worth doing that first, then this could be simplified a little to just having to look at str(or(x, x, lsl 32)).
> 
> This handles some cases, but not all. Consider:
> 
> ```
> void f(long long *A) {
>    *A = 0xC2000000C2000000;
> }
> 
> long long f2() {
>     return 0xC2000000C2000000;
> }
> ```
> 
> You want mov+stp for the store... but you don't want to generate orr for the return. (Even for the cases where we generate three instructions, we're going to avoid "orr(x, x, lsl 32)" where we can because it's more expensive than movk.)
> 
> Maybe we should run this optimization before pseudo-expansion?

`orr(x, x, lsl n)` is usually as cheap as a `movk`. It's really just a `lsl #n` with an `or` on the end, and unlike shifted add/sub it can be done in a single cycle. Unfortunately that might not hold for every CPU, but it should for anything modern enough.
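
For reference, a rough C model of what the `orr` form computes for the constant in the example above. This is only a sketch: `has_repeated_halves` and `orr_lsl32` are hypothetical names made up for illustration, not anything in the patch or in LLVM.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: true when the low and high 32-bit halves of a
   64-bit immediate are identical, e.g. 0xC2000000C2000000. */
static bool has_repeated_halves(uint64_t imm) {
    return (uint32_t)imm == (uint32_t)(imm >> 32);
}

/* Hypothetical model of orr(x, x, lsl 32): shift the value left by 32
   and OR it with itself. If x only has its low 32 bits set, the result
   is x repeated in both halves. */
static uint64_t orr_lsl32(uint64_t x) {
    return x | (x << 32);
}
```

So `orr_lsl32(0xC2000000)` gives `0xC2000000C2000000`, which is the value being stored and returned in `f` and `f2`.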

You might be right that we need both methods in the end though. I was hoping we could add the `orr` combine because it is useful on its own.

https://github.com/llvm/llvm-project/pull/86249