[llvm] [NVPTX] Lower 16xi8 and 8xi8 stores efficiently (PR #73646)
Pierre-Andre Saulais via llvm-commits
llvm-commits at lists.llvm.org
Wed Dec 13 06:34:18 PST 2023
================
@@ -5557,6 +5557,51 @@ static SDValue PerformLOADCombine(SDNode *N,
DL);
}
+// Lower a v16i8 (or a v8i8) store into a StoreV4 (or StoreV2) operation with
+// i32 results instead of letting ReplaceLoadVector split it into smaller stores
+// during legalization. This is done at dag-combine time, so that vector
+// operations with i8 elements can be optimised away instead of being needlessly
+// split during legalization, which involves storing to the stack and loading it
----------------
pasaulais wrote:
Note that this comment might be out of date: it looks copied from `PerformLOADCombine`, which was written before stack optimizations were done.
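
For readers without the full patch, here is a minimal standalone sketch of the idea the quoted comment describes: 16 i8 lanes occupy the same bits as 4 i32 lanes, so the store can be expressed as four 32-bit elements instead of being split per byte. This is plain C++, not the PR's code and not LLVM API; `packBytes` and `unpackWords` are hypothetical names used only for illustration.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from the PR): reinterpret 16 bytes as four
// 32-bit words, analogous to bitcasting a v16i8 value to v4i32 before
// emitting one 4-element vector store.
std::array<uint32_t, 4> packBytes(const std::array<uint8_t, 16> &Bytes) {
  static_assert(sizeof(std::array<uint32_t, 4>) ==
                    sizeof(std::array<uint8_t, 16>),
                "both views must cover the same 128 bits");
  std::array<uint32_t, 4> Words{};
  // memcpy keeps the in-memory byte order, so no lane is reordered.
  std::memcpy(Words.data(), Bytes.data(), sizeof(Words));
  return Words;
}

// Hypothetical inverse: recover the 16 bytes from the four words.
std::array<uint8_t, 16> unpackWords(const std::array<uint32_t, 4> &Words) {
  std::array<uint8_t, 16> Bytes{};
  std::memcpy(Bytes.data(), Words.data(), sizeof(Bytes));
  return Bytes;
}
```

The round trip is lossless because only the element type of the view changes, not the underlying bytes.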
https://github.com/llvm/llvm-project/pull/73646