[llvm] [AArch64][MachineCombiner] Reassociate long chains of accumulation instructions into a tree to increase ILP (PR #126060)

Mon Feb 10 10:20:38 PST 2025

davemgreen wrote:

I was mostly worried that UABAL isn't generated very often, and that this could turn out not to be profitable because we hadn't tested it on a larger number of cases. Given how many ops are needed to make it profitable (8?), it seems like it should be OK. It is probably worth adding SABAL too though, to keep them symmetric.
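
For reference, a rough one-lane scalar model of why SABAL should be able to share the UABAL handling: both widen an 8-bit absolute difference and add it to a 16-bit accumulator with wrap-around, so the accumulation can be split and re-summed in either case. This is only a sketch; the function names and the two-way split of an 8-op chain are hypothetical and are not taken from the patch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// One lane of UABAL: |a - b| on unsigned 8-bit inputs, widened and added to a
// 16-bit accumulator (the addition wraps modulo 2^16).
inline uint16_t uabal_lane(uint16_t acc, uint8_t a, uint8_t b) {
  int diff = a > b ? a - b : b - a; // always in 0..255
  return static_cast<uint16_t>(acc + diff);
}

// One lane of SABAL: same shape, but the inputs are signed 8-bit values.
inline uint16_t sabal_lane(uint16_t acc, int8_t a, int8_t b) {
  int diff = a > b ? a - b : b - a; // computed in int, still in 0..255
  return static_cast<uint16_t>(acc + diff);
}

// Serial chain: every step depends on the previous accumulator value.
template <typename T, typename Op>
uint16_t accumulate_chain(uint16_t acc, const std::array<T, 8> &a,
                          const std::array<T, 8> &b, Op op) {
  for (std::size_t i = 0; i < 8; ++i)
    acc = op(acc, a[i], b[i]);
  return acc;
}

// Hypothetical tree shape: two independent sub-chains of four ops each, the
// second starting from zero, joined by a single add at the end.
template <typename T, typename Op>
uint16_t accumulate_tree(uint16_t acc, const std::array<T, 8> &a,
                         const std::array<T, 8> &b, Op op) {
  uint16_t lo = acc, hi = 0;
  for (std::size_t i = 0; i < 4; ++i) {
    lo = op(lo, a[i], b[i]);
    hi = op(hi, a[i + 4], b[i + 4]);
  }
  return static_cast<uint16_t>(lo + hi);
}

// Checks that chain and tree agree for both flavours, including wrap-around.
inline void demo_chain_vs_tree() {
  const std::array<uint8_t, 8> ua{250, 3, 17, 0, 255, 90, 12, 200};
  const std::array<uint8_t, 8> ub{1, 200, 17, 255, 0, 91, 100, 7};
  const std::array<int8_t, 8> sa{-128, 127, -1, 0, 5, -7, 100, -100};
  const std::array<int8_t, 8> sb{127, -128, 1, 0, -5, 7, -100, 100};
  assert(accumulate_chain(65000, ua, ub, uabal_lane) ==
         accumulate_tree(65000, ua, ub, uabal_lane));
  assert(accumulate_chain(65000, sa, sb, sabal_lane) ==
         accumulate_tree(65000, sa, sb, sabal_lane));
}
```

Because the accumulate step is a plain modular add in both cases, the signed and unsigned variants differ only in how the 8-bit inputs are interpreted, which is why treating them symmetrically looks reasonable.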

I wasn't sure how the end of the chain was calculated, so I thought it might be worth looking at a looping chain too, as in https://godbolt.org/z/8conT4xaa. It might need a longer chain to be profitable though, possibly longer than is realistic, if it loses the ability to forward the uabal (if that is how it works: https://godbolt.org/z/YWhY98dxj). Perhaps it needs a higher-level transform that could accumulate multiple chains after the loop.
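
One possible reading of "accumulate multiple chains after the loop", sketched in scalar C++ rather than NEON: keep several independent loop-carried accumulators and only combine them once the loop has finished. The function names and the choice of four accumulators are assumptions for illustration, and the sketch assumes both inputs have the same length.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

inline uint32_t abs_diff(uint8_t a, uint8_t b) {
  return static_cast<uint32_t>(a > b ? a - b : b - a);
}

// Single loop-carried accumulator: each iteration waits on the previous one.
inline uint32_t sad_single(const std::vector<uint8_t> &a,
                           const std::vector<uint8_t> &b) {
  uint32_t acc = 0;
  for (std::size_t i = 0; i < a.size(); ++i)
    acc += abs_diff(a[i], b[i]);
  return acc;
}

// Four independent loop-carried accumulators, summed once after the loop.
// Inside the loop each accumulator still forms a simple back-to-back chain.
inline uint32_t sad_split(const std::vector<uint8_t> &a,
                          const std::vector<uint8_t> &b) {
  uint32_t acc[4] = {0, 0, 0, 0};
  std::size_t i = 0;
  for (; i + 4 <= a.size(); i += 4)
    for (std::size_t k = 0; k < 4; ++k)
      acc[k] += abs_diff(a[i + k], b[i + k]);
  for (; i < a.size(); ++i) // leftover elements go to the first accumulator
    acc[0] += abs_diff(a[i], b[i]);
  return (acc[0] + acc[1]) + (acc[2] + acc[3]);
}

// Both forms must agree on a small fixed input.
inline void demo_loop_split() {
  const std::vector<uint8_t> a{0, 255, 10, 20, 30, 40, 50, 60, 70, 80};
  const std::vector<uint8_t> b{255, 0, 20, 10, 30, 45, 40, 60, 75, 80};
  assert(sad_single(a, b) == sad_split(a, b));
}
```

The point of this shape is that the extra adds are paid once per loop rather than once per iteration, so the per-iteration chains stay back-to-back accumulates.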

And it might be more awkward, but is it worth trying to support uabal and uabal2 operations in the same chain? I imagine interleaving the high-half instructions would be quite efficient in well-written vector code that can use them.
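
To illustrate what a mixed chain looks like, here is a small array-based model (not NEON intrinsics) in which uabal reads the low eight byte lanes of its sources and uabal2 reads the high eight, both accumulating into the same eight 16-bit lanes. The type names, helper names, and the particular split into a "low" and a "high" sub-chain are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec16B = std::array<uint8_t, 16>; // models a 128-bit register of bytes
using Vec8H = std::array<uint16_t, 8>;  // models a 128-bit register of halfwords

// Accumulate |a - b| for the eight byte lanes starting at `first`.
inline Vec8H abal_half(Vec8H acc, const Vec16B &a, const Vec16B &b,
                       std::size_t first) {
  for (std::size_t i = 0; i < 8; ++i) {
    int x = a[first + i], y = b[first + i];
    acc[i] = static_cast<uint16_t>(acc[i] + (x > y ? x - y : y - x));
  }
  return acc;
}

// uabal uses the low halves of the sources, uabal2 the high halves.
inline Vec8H uabal(Vec8H acc, const Vec16B &a, const Vec16B &b) {
  return abal_half(acc, a, b, 0);
}
inline Vec8H uabal2(Vec8H acc, const Vec16B &a, const Vec16B &b) {
  return abal_half(acc, a, b, 8);
}

// Lane-wise wrapping add, used to join two sub-chains.
inline Vec8H add(Vec8H x, const Vec8H &y) {
  for (std::size_t i = 0; i < 8; ++i)
    x[i] = static_cast<uint16_t>(x[i] + y[i]);
  return x;
}

// A mixed uabal/uabal2 chain gives the same result as two shorter sub-chains
// (one of low-half ops, one of high-half ops) joined by a single add.
inline void demo_mixed_chain() {
  Vec16B a0{}, b0{}, a1{}, b1{};
  for (std::size_t i = 0; i < 16; ++i) {
    a0[i] = static_cast<uint8_t>(i * 17);
    b0[i] = static_cast<uint8_t>(255 - i * 3);
    a1[i] = static_cast<uint8_t>(i * i);
    b1[i] = static_cast<uint8_t>(200 - i);
  }
  Vec8H acc{};
  acc.fill(65000); // close to the top of the range, to exercise wrap-around

  Vec8H chain = uabal2(uabal(uabal2(uabal(acc, a0, b0), a0, b0), a1, b1), a1, b1);
  Vec8H lows = uabal(uabal(acc, a0, b0), a1, b1);
  Vec8H highs = uabal2(uabal2(Vec8H{}, a0, b0), a1, b1);
  assert(chain == add(lows, highs));
}
```

Since both instructions feed the same accumulator type with a wrapping add, nothing in the arithmetic prevents mixing them in one chain; whether the combiner patterns can express that cleanly is a separate question.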

https://github.com/llvm/llvm-project/pull/126060