[llvm] Handle VECREDUCE intrinsics in NVPTX backend (PR #136253)

Mon Apr 21 15:14:38 PDT 2025

================
@@ -2128,6 +2152,194 @@ NVPTXTargetLowering::LowerCONCAT_VECTORS(SDValue Op, SelectionDAG &DAG) const {
   return DAG.getBuildVector(Node->getValueType(0), dl, Ops);
 }
 
+/// A generic routine for constructing a tree reduction on a vector operand.
+/// This method differs from iterative splitting in DAGTypeLegalizer by
+/// progressively grouping elements bottom-up.
----------------
Artem-B wrote:

Is that always a win?

When vectors are loaded in chunks, it may be beneficial to reduce the first loaded chunk while the subsequent ones are still in flight. Attempting to use elements from different fragments of the vector will stall on the data dependency until all of them arrive. In that sense, iterative splitting may work better, as it will process elements as they get loaded, without unnecessary stalling.

Tree reduction might work better if it is applied within each loaded fragment first, and the per-fragment values are then reduced together.
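
To make the idea concrete, here is a minimal host-side sketch in plain C++ (not SelectionDAG code, and not what the patch does). The names `reduceTree`, `reduceByFragment`, and `fragSize` are hypothetical, and the fragment size is simply assumed to match the width of one load:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical helper: pairwise (tree) reduction over elems[begin, end).
// Adjacent elements are combined level by level; an odd leftover element
// is carried up to the next level unchanged.
template <typename T, typename Op>
T reduceTree(const std::vector<T> &elems, std::size_t begin, std::size_t end,
             Op op) {
  assert(begin < end && end <= elems.size() && "range must be non-empty");
  std::vector<T> level(elems.begin() + begin, elems.begin() + end);
  while (level.size() > 1) {
    std::vector<T> next;
    std::size_t i = 0;
    for (; i + 1 < level.size(); i += 2)
      next.push_back(op(level[i], level[i + 1]));
    if (i < level.size())
      next.push_back(level[i]); // odd element carried to the next level
    level = std::move(next);
  }
  return level.front();
}

// Hypothetical helper: reduce each fragment of fragSize elements on its own
// (so a fragment only depends on data from one assumed load), then reduce
// the per-fragment partial results with a second tree.
template <typename T, typename Op>
T reduceByFragment(const std::vector<T> &elems, std::size_t fragSize, Op op) {
  assert(fragSize > 0 && !elems.empty());
  std::vector<T> partials;
  for (std::size_t begin = 0; begin < elems.size(); begin += fragSize) {
    std::size_t end = std::min(begin + fragSize, elems.size());
    partials.push_back(reduceTree(elems, begin, end, op));
  }
  return reduceTree(partials, 0, partials.size(), op);
}
```

For example, `reduceByFragment(v, 4, std::plus<int>())` on a 7-element vector reduces elements 0-3 and 4-6 independently before combining the two partial results, so the first fragment's reduction never waits on the second fragment's data.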

That said, we're probably not going to see many oddly-sized vectors, and for power-of-2 sized vectors the algorithm may work fine, though even there, there may be some room for improvement.

https://github.com/llvm/llvm-project/pull/136253