[llvm] [AArch64] Fix throughput of 64-bit SVE gather loads (PR #168572)

Wed Nov 19 03:44:48 PST 2025

Asher8118 wrote:

> > > why isn't it possible to get the correct throughput with the existing resources?
> > 
> > 
> > Because the pipeline used by gather loads is unit L, which has 3 resources. This makes it so the throughput is a result of a division by 3.
> 
>  if it's not possible to get that with the resources as documented in the SWOG.

I reasoned this would be similar to the flag-setting instructions on the V cores, where we use [V#UnitFlg](https://github.com/llvm/llvm-project/blob/b42851b8dda8c85a277573610519e8c66e91322f/llvm/lib/Target/AArch64/AArch64SchedNeoverseV3.td#L58C1-L58C43), which is also a resource that does not appear in the SWOG.
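
To illustrate the division-by-3 point, here is a rough sketch of how I understand the reported reciprocal throughput to be derived: for each resource an instruction consumes, take the cycles it holds the resource divided by the number of units, then take the maximum. This is a simplified model, not the actual LLVM code. The function name and the numbers are hypothetical, and the 1-unit resource is only an example of an extra resource, not necessarily what this patch adds.

```python
from fractions import Fraction


def reciprocal_throughput(uses):
    """Simplified model: uses is a list of (num_units, release_cycles) pairs,
    one per resource consumed. The bottleneck resource decides the result."""
    return max(Fraction(cycles, units) for units, cycles in uses)


# Hypothetical gather load that holds the 3-unit L group for 1 cycle.
on_l_group = reciprocal_throughput([(3, 1)])

# Same load, additionally holding a hypothetical 1-unit resource for 1 cycle.
with_extra = reciprocal_throughput([(3, 1), (1, 1)])

# With only the 3-unit group and whole-cycle occupancy, every reachable
# value is a multiple of 1/3.
reachable = [Fraction(cycles, 3) for cycles in range(1, 7)]

print(on_l_group, with_extra, Fraction(1, 2) in reachable)
```

Under these assumptions, a value such as 1/2 cannot be expressed with the 3-unit group alone, which is why an extra resource that is not in the SWOG can be needed to pin the number.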

>Also "Non temporal gather load, vector + scalar 32-bit element size" is 4 micro-ops whereas 64-bit element size is 2 micro-ops, that doesnt make sense.

That is odd. I think the micro-op count should be the same for both the 32-bit and 64-bit element sizes. I can change that as part of this patch.

> Looking at the other neoverse cores they all use some of the vector pipes for these gathers, are we sure the SWOG is correct?

I think there are instances on the other Neoverse cores where 64-bit gather loads show incorrect throughput when compared to the SWOG, e.g. [this load](https://github.com/llvm/llvm-project/blob/b42851b8dda8c85a277573610519e8c66e91322f/llvm/test/tools/llvm-mca/AArch64/Neoverse/V3-sve-instructions.s#L4829C1-L4829C86) in V3.

https://github.com/llvm/llvm-project/pull/168572