[llvm] [NVPTX] Optimize v2x16 BUILD_VECTORs to PRMT (PR #116675)

Mon Nov 25 11:13:45 PST 2024

================
@@ -6176,6 +6176,57 @@ static SDValue PerformLOADCombine(SDNode *N,
       DL);
 }
 
+static SDValue
+PerformBUILD_VECTORCombine(SDNode *N, TargetLowering::DAGCombinerInfo &DCI) {
+  auto VT = N->getValueType(0);
+  if (!DCI.isAfterLegalizeDAG() || !Isv2x16VT(VT))
+    return SDValue();
+
+  auto Op0 = N->getOperand(0);
+  auto Op1 = N->getOperand(1);
+
+  // Start out by assuming we want to take the lower 2 bytes of each i32
+  // operand.
+  uint64_t Op0Bytes = 0x10;
+  uint64_t Op1Bytes = 0x54;
+
+  std::pair<SDValue *, uint64_t *> OpData[2] = {{&Op0, &Op0Bytes},
+                                                {&Op1, &Op1Bytes}};
+
+  // Check that each operand is an i16, truncated from an i32 operand. We'll
+  // select individual bytes from those original operands. Optionally, fold in a
+  // shift right of that original operand.
+  for (auto &[Op, OpBytes] : OpData) {
+    // Eat up any bitcast
+    if (Op->getOpcode() == ISD::BITCAST)
+      *Op = Op->getOperand(0);
+
+    if (Op->getValueType() != MVT::i16 || Op->getOpcode() != ISD::TRUNCATE ||
+        Op->getOperand(0).getValueType() != MVT::i32)
+      return SDValue();
+
+    *Op = Op->getOperand(0);
+
+    // Optionally, fold in a shift-right of the original operand and permute
+    // the two higher bytes from the shifted operand
+    if (Op->getOpcode() == ISD::SRL && isa<ConstantSDNode>(Op->getOperand(1))) {
+      if (cast<ConstantSDNode>(Op->getOperand(1))->getZExtValue() == 16) {
+        *OpBytes += 0x22;
+        *Op = Op->getOperand(0);
+      }
+    }
----------------
Artem-B wrote:

The '32-bit source + mask' intermediate value approach would be able to preserve that nice example. Nothing stops us from lowering a pair of `SM(value, 0xffff), SM(value, 0xffff0000)` as `mov.s32 {rs1, rs2}, value` when we need two 16-bit values, while using `prmt` when the destination is 32-bit, e.g. if your example above only swapped those 16-bit parts, without the addition.

https://github.com/llvm/llvm-project/pull/116675