[llvm] [NVPTX] Lower 16xi8 and 8xi8 stores efficiently (PR #73646)

Tue Nov 28 21:10:28 PST 2023

================
@@ -5557,6 +5557,50 @@ static SDValue PerformLOADCombine(SDNode *N,
       DL);
 }
 
+// Lower a v16i8 (or a v8i8) store into a StoreV4 operation with i32 results
+// instead of letting ReplaceLoadVector split it into smaller stores during
+// legalization. This is done at dag-combine1 time, so that vector operations
----------------
bondhugula wrote:

Are you sure about this? This is intended to be done before legalization. Changed to dag-combine.
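
For context, the packing that the quoted comment describes (a v16i8 value stored as four i32 elements) can be sketched in plain C++. This is only an illustration and not code from the patch: the helper name `packV16I8` is hypothetical, no LLVM APIs are used, and little-endian lane order (byte `i` lands in bits `8 * (i % 4)` of word `i / 4`) is an assumption.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of the patch: packs 16 bytes into four
// 32-bit words so that one 4 x i32 store can replace 16 byte-sized stores.
// Assumes little-endian lane order: byte i goes to bits [8*(i%4), 8*(i%4)+7]
// of word i/4.
std::array<std::uint32_t, 4> packV16I8(const std::array<std::uint8_t, 16> &bytes) {
  std::array<std::uint32_t, 4> words{};  // zero-initialized
  for (std::size_t i = 0; i < bytes.size(); ++i)
    words[i / 4] |= static_cast<std::uint32_t>(bytes[i]) << (8 * (i % 4));
  return words;
}
```

A v8i8 value would follow the same pattern with two words instead of four.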

https://github.com/llvm/llvm-project/pull/73646