[llvm] [SDAG] Add partial_reduce_sumla node (PR #141267)

Wed Oct 1 12:58:45 PDT 2025

quic-akaryaki wrote:

Sorry for the out-of-the-blue questions... Is there a plan to handle partial reductions when the vector sizes do not exactly match a hardware instruction (e.g. UDOT)? Right now, it looks like such cases do not match and fall back to the ladder algorithm. Do you (folks @ARM) think it may be worth lowering them to, e.g., a sequence of UDOTs instead? An example is `@llvm.experimental.vector.partial.reduce.add.v2i32.v16i32`.
How does the upper layer that generates partial reductions (VPlan?) choose the vector sizes? It has to know the exact CPU variant, and the IR becomes tied to that variant, is that right?

https://github.com/llvm/llvm-project/pull/141267