[llvm] [LV][AArch64] Prefer Fixed over Scalable if cost-model is equal (Neoverse V2) (PR #95819)

Tue Jun 18 09:20:30 PDT 2024

sjoerdmeijer wrote:

> > @paulwalker-arm , @david-arm : I think a very fair question is why we should be generating SVE code for this type of kernels? Can you explain the benefit?
> 
> The general answer is the cost model says there's no reason not to use scalable vectors, and with the potential for them to be bigger than their fixed length counterparts there's scope for more performance beyond the assumed minimum.
> 
> What worries me is that you're not suggesting there's a subtle performance difference but in fact the difference is pretty large. I just think this should be captured as either a more accurate cost model or implementation specific code generation decisions (e.g. `UseScalarIncVL` and `UseSVEFPLD1R`).

Yes, this is the only reason: performance portability of the same binary when it runs on a potentially wider SVE implementation. This is not helping the Neoverse V2 though. We get very unpredictable and bad performance for small SVE kernels, and that's the problem we are dealing with here. The reasons are twofold: micro-architectural reasons, and less mature SVE codegen. The latter can be fixed over time, but then there is still the former. And I do want to stress again that predication for this class of loops is simply not necessary.

`UseScalarIncVL` is what we are investigating. We might indeed want to propose disabling it for Neoverse V2, but that is a separate issue.

So, all of these problems go away if we resort to NEON when the cost-model assigns equal cost. I could think about a narrower heuristic: if the kernel is small, there are no masked loads/stores, and the cost-model is a tie, favour fixed, or something along those lines. But I am afraid you're not going to like that either because you just like to see SVE code here. Cost-modeling is what we are doing here one way or another, but I think this is not just a matter of "let's increase the cost of instruction XYZ a little bit". Preferring fixed if the cost-model cannot decide looks very reasonable to me.
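
To make the tie-break concrete, here is a minimal standalone sketch of the idea, not the actual patch. `Candidate`, `CostPerLane`, `isMoreProfitable` and `PreferFixedOnTie` are hypothetical names for illustration and are not LLVM's real types or API. The sketch also assumes that, without the preference, a tie goes to the scalable candidate.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a vectorization candidate (not an LLVM type).
struct Candidate {
  bool Scalable;        // true: scalable (SVE-style) VF, false: fixed (NEON-style) VF
  uint64_t CostPerLane; // modelled loop cost, normalised per lane (assumed metric)
};

// Returns true if A should be chosen over B.
// The cost model decides whenever it can; the preference only breaks ties.
bool isMoreProfitable(const Candidate &A, const Candidate &B,
                      bool PreferFixedOnTie) {
  if (A.CostPerLane != B.CostPerLane)
    return A.CostPerLane < B.CostPerLane;

  // Equal cost: if one candidate is fixed and the other scalable,
  // pick according to the (hypothetical) target preference.
  if (A.Scalable != B.Scalable)
    return PreferFixedOnTie ? !A.Scalable : A.Scalable;

  // Same kind and same cost: neither is strictly better.
  return false;
}
```

With `PreferFixedOnTie` set, a fixed candidate beats a scalable one of equal cost, while a strictly cheaper scalable candidate still wins, so the cost model stays in charge whenever it can actually decide.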

https://github.com/llvm/llvm-project/pull/95819