[llvm] [NVPTX] Optimize v16i8 reductions (PR #67322)
Pierre-Andre Saulais via llvm-commits
llvm-commits at lists.llvm.org
Tue Sep 26 09:42:59 PDT 2023
================
@@ -5294,6 +5295,98 @@ static SDValue PerformEXTRACTCombine(SDNode *N,
   return Result;
 }
+static SDValue PerformLOADCombine(SDNode *N,
+                                  TargetLowering::DAGCombinerInfo &DCI) {
+  SelectionDAG &DAG = DCI.DAG;
+  LoadSDNode *LD = cast<LoadSDNode>(N);
+
+  // Lower a v16i8 load into a LoadV4 operation with i32 results instead of
+  // letting ReplaceLoadVector split it into smaller loads during legalization.
+  // This is done at dag-combine1 time, so that vector operations with i8
+  // elements can be optimised away instead of being needlessly split during
+  // legalization, which involves storing to the stack and loading it back.
+  EVT VT = N->getValueType(0);
+  if (VT != MVT::v16i8)
----------------
pasaulais wrote:
I'm not sure that would work as well as it does for v16i8, since I don't think there is a `ld.v2.b32` instruction we could use. It would mean creating two `NVPTXISD::LoadV*` nodes here and duplicating some code from `ReplaceLoadVector`.
By the way, I also tried making this change in `ReplaceLoadVector` instead of adding a DAG combine for `LOAD` nodes, but backed it out because it was creating stack operations. I haven't checked again since your recent commit was merged, so maybe that works better now.
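
For readers unfamiliar with the idea in the code comment above (treating a v16i8 value as four i32 results), here is a minimal standalone sketch in plain C++. It is not the patch's code and uses no LLVM APIs; the helper names `packWords`, `extractByte` and `sumBytes` are hypothetical, and the little-endian byte order within each word is an assumption made for illustration. It only shows that once 16 bytes sit in four 32-bit words, each i8 element can be recovered with a shift and a mask, so an i8 reduction needs no round trip through memory.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: pack 16 bytes into four 32-bit words, standing in for
// a single "4 x i32" vector load. Byte I lands in word I / 4 at bit offset
// 8 * (I % 4) (little-endian order is assumed here for illustration).
std::array<std::uint32_t, 4> packWords(const std::array<std::uint8_t, 16> &Bytes) {
  std::array<std::uint32_t, 4> Words{};
  for (std::size_t I = 0; I < 16; ++I)
    Words[I / 4] |= static_cast<std::uint32_t>(Bytes[I]) << (8 * (I % 4));
  return Words;
}

// Hypothetical helper: recover i8 element I from the packed words with a
// shift and a mask, without going through memory.
std::uint8_t extractByte(const std::array<std::uint32_t, 4> &Words, std::size_t I) {
  return static_cast<std::uint8_t>((Words[I / 4] >> (8 * (I % 4))) & 0xFFu);
}

// Hypothetical helper: an add-reduction over the 16 i8 elements, computed
// directly from the four packed words.
std::uint32_t sumBytes(const std::array<std::uint32_t, 4> &Words) {
  std::uint32_t Sum = 0;
  for (std::size_t I = 0; I < 16; ++I)
    Sum += extractByte(Words, I);
  return Sum;
}
```

With the bytes 1 through 16 as input, `packWords` yields `0x04030201` as the first word, and `sumBytes` returns 136, the same result as summing the bytes individually.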
https://github.com/llvm/llvm-project/pull/67322