[llvm] [NVPTX] Lower 16xi8 and 8xi8 stores efficiently (PR #73646)

Artem Belevich via llvm-commits llvm-commits at lists.llvm.org
Wed Nov 29 10:25:42 PST 2023


================
@@ -5557,6 +5557,51 @@ static SDValue PerformLOADCombine(SDNode *N,
       DL);
 }
 
+// Lower a v16i8 (or a v8i8) store into a StoreV4 (or StoreV2) operation with
+// i32 results instead of letting ReplaceLoadVector split it into smaller stores
+// during legalization. This is done at dag-combine time, so that vector
+// operations with i8 elements can be optimised away instead of being needlessly
+// split during legalization, which involves storing to the stack and loading it
----------------
Artem-B wrote:

Nice. The legalizer's assumption that stack loads/stores are cheap is indeed a rather bad misoptimization for NVPTX.
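
For illustration only, the idea in the quoted code comment (treating sixteen i8 elements as four 32-bit values so that one vector store can be emitted) can be sketched in plain C++ outside of LLVM. The helper name `packBytesToWords` is hypothetical and not part of the PR, and the little-endian byte order within each word is an assumption of this sketch rather than something taken from the patch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not from the PR: view 16 byte-sized elements as four
// 32-bit words. Byte I lands in word I / 4 at bit offset 8 * (I % 4), i.e.
// little-endian order within each word (an assumption of this sketch).
inline std::array<std::uint32_t, 4>
packBytesToWords(const std::array<std::uint8_t, 16> &Bytes) {
  std::array<std::uint32_t, 4> Words{};
  for (std::size_t I = 0; I < Bytes.size(); ++I)
    Words[I / 4] |= static_cast<std::uint32_t>(Bytes[I]) << (8 * (I % 4));
  return Words;
}
```

With the four words in hand, a single 4 x 32-bit store covers the same 16 bytes that would otherwise need many narrower stores, or a round trip through the stack.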

https://github.com/llvm/llvm-project/pull/73646

