[llvm] [NVPTX] Propagate truncate to operands (PR #98666)

Fri Jul 12 15:41:14 PDT 2024

================
@@ -5541,6 +5541,53 @@ static SDValue PerformREMCombine(SDNode *N,
   return SDValue();
 }
 
+// truncate (logic_op x, y) --> logic_op (truncate x), (truncate y)
+// This will reduce register pressure.
+static SDValue PerformTruncCombine(SDNode *N,
+                                   TargetLowering::DAGCombinerInfo &DCI) {
+  if (!DCI.isBeforeLegalizeOps())
+    return SDValue();
+
+  SDValue LogicalOp = N->getOperand(0);
+  switch (LogicalOp.getOpcode()) {
+  default:
+    break;
+  case ISD::ADD:
+  case ISD::SUB:
+  case ISD::MUL:
+  case ISD::AND:
+  case ISD::OR:
+  case ISD::XOR: {
+    EVT VT = N->getValueType(0);
+    EVT LogicalVT = LogicalOp.getValueType();
+    if (VT != MVT::i32 || LogicalVT != MVT::i64)
+      break;
+    const TargetLowering &TLI = DCI.DAG.getTargetLoweringInfo();
+    if (!VT.isScalarInteger() &&
+        !TLI.isOperationLegal(LogicalOp.getOpcode(), VT))
+      break;
+    if (!all_of(LogicalOp.getNode()->uses(), [](SDNode *U) {
+          return U->isMachineOpcode()
----------------
Artem-B wrote:

It's unfortunate. We've run into a few other cases before where a nominally sensible optimization at the IR level is suboptimal for a specific target, and there's indeed no good way to tweak it at the moment.

DAGCombine is probably our plan B. I guess we could benefit from a TLI hook specifying that register use is more expensive than a few extra register ops.
If registers are expensive, we could truncate the arguments and demote the logical op. If registers are cheap, then keep the current promote-and-truncate-the-result approach.
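
A minimal sketch of that choice, written as plain C++ on integers rather than as SelectionDAG code. `TargetCostModel` and `RegistersAreExpensive` are hypothetical names standing in for the proposed hook; they are not existing LLVM APIs. It only illustrates that, for the opcodes in the patch, both forms yield the same low 32 bits, so the hook would be purely a cost decision:

```cpp
#include <cstdint>

// Hypothetical stand-in for the proposed TLI hook; not an existing LLVM API.
struct TargetCostModel {
  bool RegistersAreExpensive;
};

// The opcodes handled by the patch (ADD, SUB, MUL, AND, OR, XOR).
enum class Op { Add, Sub, Mul, And, Or, Xor };

// Apply one operation in an unsigned type T (arithmetic wraps modulo 2^N).
template <typename T> T apply(Op O, T A, T B) {
  switch (O) {
  case Op::Add: return static_cast<T>(A + B);
  case Op::Sub: return static_cast<T>(A - B);
  case Op::Mul: return static_cast<T>(A * B);
  case Op::And: return static_cast<T>(A & B);
  case Op::Or:  return static_cast<T>(A | B);
  case Op::Xor: return static_cast<T>(A ^ B);
  }
  return T{};
}

// Current approach: do the op in 64 bits, then truncate the result.
inline uint32_t wideThenTrunc(Op O, uint64_t X, uint64_t Y) {
  return static_cast<uint32_t>(apply<uint64_t>(O, X, Y));
}

// Proposed approach: truncate both operands, then do the op in 32 bits.
inline uint32_t truncThenNarrow(Op O, uint64_t X, uint64_t Y) {
  return apply<uint32_t>(O, static_cast<uint32_t>(X),
                         static_cast<uint32_t>(Y));
}

// Pick a form based on the (hypothetical) cost model. Both give the same value.
inline uint32_t lowerTrunc(const TargetCostModel &TCM, Op O, uint64_t X,
                           uint64_t Y) {
  return TCM.RegistersAreExpensive ? truncThenNarrow(O, X, Y)
                                   : wideThenTrunc(O, X, Y);
}
```

The narrow form needs two truncations instead of one but never holds a 64-bit intermediate, which is the trade-off such a hook would have to express.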

If, for some reason, there are still issues with using it in the generic DAG combiner, then we can do it in the NVPTX back-end, but that should probably be the last resort. I think this kind of tweak would also benefit other targets where larger logical ops are not free. I believe the current assumption is that logical ops cost the same regardless of their size, so we only care about the number of truncations around them.

https://github.com/llvm/llvm-project/pull/98666