[llvm] [AArch64] Enable maximising vector bandwidth for all AArch64 cores other than the N2 (PR #166748)

Tue Nov 11 00:05:18 PST 2025

https://github.com/davemgreen commented:

> I wasn't aware of this TTI hook. But I agree it has a very promising name, and why wouldn't we want to set this?

I think it's the same as with other kinds of unrolling: bigger loops sometimes don't perform as well, or aren't entered at all (when the trip count is too small to reach the wider vector body). The codegen and cost model also need to be good enough for it to work well, but I believe they are in better shape than they were in the past. I don't think I would expect it to depend on the core, other than some cores having certain instructions that are slow, but that would show up as a difference in individual instruction costs.
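To make the unrolling comparison concrete: roughly speaking, when the hook is enabled the loop vectorizer sizes its largest candidate vectorization factor (VF) from the *smallest* element type in the loop rather than the widest, and the cost model (plus register-pressure checks) then picks among the candidates. The arithmetic below is a simplified sketch under that assumption, for 128-bit NEON registers. The helper names are hypothetical, not LLVM APIs, and register-pressure limits and scalable vectors are ignored.

```c
/* Hypothetical helper (not an LLVM API): the largest power-of-two lane
   count that fits elt_bits-wide elements into a reg_bits-wide register. */
static unsigned lanes_for(unsigned reg_bits, unsigned elt_bits) {
    unsigned n = reg_bits / elt_bits;
    unsigned p = 1;
    while (p * 2 <= n)
        p *= 2;
    return p;
}

/* Simplified upper bound on the VF considered for a loop whose element
   types range from smallest_bits to widest_bits. */
static unsigned max_vf(unsigned reg_bits, unsigned smallest_bits,
                       unsigned widest_bits, int maximize_bandwidth) {
    return maximize_bandwidth ? lanes_for(reg_bits, smallest_bits)
                              : lanes_for(reg_bits, widest_bits);
}

/* Number of reg_bits-wide registers one vector of vf elements of
   elt_bits each occupies: the effective "unroll" of the wide ops. */
static unsigned regs_needed(unsigned vf, unsigned elt_bits, unsigned reg_bits) {
    return (vf * elt_bits + reg_bits - 1) / reg_bits;
}
```

For a loop mixing 8-bit and 32-bit elements, `max_vf(128, 8, 32, 0)` is 4 while `max_vf(128, 8, 32, 1)` is 16. At VF 16 every 32-bit operation is split across `regs_needed(16, 32, 128) == 4` registers, which behaves much like unrolling the VF-4 loop by four, with the same risk that short trip counts never reach the vector body.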

There are still some places where the codegen/cost model is a little off; for example, I have this test for scmp that does worse with this enabled. There might be some other cases, but I believe that for the most part it performs OK.
https://godbolt.org/z/4Ycqz5Grx
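For readers who can't open the link, the kind of loop involved can be sketched as follows. This is a hypothetical example, not the test behind the Compiler Explorer link: a signed three-way compare of 32-bit inputs producing 8-bit results, written as `(a > b) - (a < b)`, an idiom that recent LLVM may canonicalise into the `llvm.scmp` intrinsic. Because the smallest and widest element types differ (8 vs. 32 bits), it is the sort of loop where maximising vector bandwidth raises the largest candidate VF.

```c
#include <stdint.h>

/* Hypothetical loop (not the linked test): out[i] is -1, 0 or 1 depending
   on whether a[i] is less than, equal to, or greater than b[i]. */
void cmp3(const int32_t *a, const int32_t *b, int8_t *out, int n) {
    for (int i = 0; i < n; i++)
        out[i] = (int8_t)((a[i] > b[i]) - (a[i] < b[i]));
}
```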

https://github.com/llvm/llvm-project/pull/166748