[PATCH] D123835: AMDGPU/SDAG: Refine the fold to v_mad_[iu]64_[iu]32

Mon May 2 15:39:05 PDT 2022

nhaehnle added inline comments.

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:10702
+    for (auto I = LHS->use_begin(), E = LHS->use_end(); I != E; ++I) {
+      if (I.getUse().getResNo() != 0)
+        continue;
----------------
foad wrote:
> arsenm wrote:
> > I don't understand why you're checking this if you bail on anything that is not ISD::ADD. I guess it would make sense if you were handling the carry-out adds in a separate patch?
> LHS is the MUL here, not an ADD, so there's really no need to check ResNo.
Makes sense, thanks.

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:10748
+  SDValue Shift = DAG.getShiftAmountConstant(32, MVT::i64, SL);
+  SDValue AccumLo = DAG.getNode(ISD::TRUNCATE, SL, MVT::i32, Accum);
+  SDValue AccumHi = DAG.getNode(ISD::SRL, SL, MVT::i64, Accum, Shift);
----------------
arsenm wrote:
> foad wrote:
> > I don't know if it makes any practical difference, but other code like `AMDGPUTargetLowering::LowerUDIVREM64` uses EXTRACT_ELEMENT to split an i64 into a pair of i32s, and BITCAST(BUILD_VECTOR ...) to reassemble them.
> Using the shift adds extra steps. The combine on 64-bit shifts will turn this into the vector build.
I'm rearranging the code slightly to use EXTRACT_ELEMENT and BUILD_VECTOR directly in more places. However, I'm keeping some TRUNCATEs around (instead of extracting the low part) because that turns out to result in better code generation in some tests. While doing so I also noticed a crash and fixed it.
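
For readers following along, here is a minimal standalone sketch of what the two splitting styles compute. It is plain C++ on host integers, not SelectionDAG code, and the names `Halves`, `splitByShift` and `joinHalves` are made up for illustration. Both the TRUNCATE/SRL form and the EXTRACT_ELEMENT form produce the same pair of 32-bit values; they only differ in which nodes later combines get to see.

```cpp
#include <cstdint>

// Hypothetical helper type: the low and high 32-bit halves of an i64.
struct Halves {
  uint32_t lo;
  uint32_t hi;
};

// Value-level analogue of TRUNCATE (low half) and SRL by 32 followed by
// TRUNCATE (high half). EXTRACT_ELEMENT 0 / 1 yields the same two values.
Halves splitByShift(uint64_t v) {
  return {static_cast<uint32_t>(v), static_cast<uint32_t>(v >> 32)};
}

// Value-level analogue of BITCAST(BUILD_VECTOR lo, hi) back to i64.
uint64_t joinHalves(Halves h) {
  return (static_cast<uint64_t>(h.hi) << 32) | h.lo;
}
```

Splitting and rejoining round-trips any 64-bit value, which is why the choice between the two forms is purely a code-generation question.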

================
Comment at: llvm/lib/Target/AMDGPU/SIISelLowering.cpp:10752-10757
+  if (!MulLHSUnsigned32) {
+    SDValue MulLHSHi = DAG.getNode(ISD::SRL, SL, MVT::i64, MulLHS, Shift);
+    MulLHSHi = DAG.getNode(ISD::TRUNCATE, SL, MVT::i32, MulLHSHi);
+    SDValue MulHi = DAG.getNode(ISD::MUL, SL, MVT::i32, MulLHSHi, MulRHSLo);
+    AccumHi = DAG.getNode(ISD::ADD, SL, MVT::i32, MulHi, AccumHi);
+  }
----------------
arsenm wrote:
> A comment with the DAG formed here would be helpful 
Ok.
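
Until that comment lands, the arithmetic behind the quoted hunk can be sketched on plain integers. This is only a model under stated assumptions: `madU64U32` stands in for a 32x32->64 unsigned multiply-add with the carry-out dropped, `mulAddViaMad` is a made-up name, and the right-hand multiplicand is assumed to already fit in 32 unsigned bits (the hunk only shows the correction for the left-hand side).

```cpp
#include <cstdint>

// Model of an unsigned 32x32->64 multiply plus a 64-bit addend.
// The carry-out is ignored, so the result wraps modulo 2^64.
uint64_t madU64U32(uint32_t a, uint32_t b, uint64_t c) {
  return static_cast<uint64_t>(a) * b + c;
}

// Computes mulLHS * mulRHSLo + accum (mod 2^64) when mulLHS is a full
// 64-bit value. Assumption: the right operand is an unsigned 32-bit value.
uint64_t mulAddViaMad(uint64_t mulLHS, uint32_t mulRHSLo, uint64_t accum) {
  uint32_t accumLo = static_cast<uint32_t>(accum);
  uint32_t accumHi = static_cast<uint32_t>(accum >> 32);

  // hi(mulLHS) * mulRHSLo only affects bits 32..63 of the result, so a
  // wrapping 32-bit multiply folded into the high half of the accumulator
  // is sufficient.
  uint32_t mulLHSHi = static_cast<uint32_t>(mulLHS >> 32);
  uint32_t mulHi = mulLHSHi * mulRHSLo;
  accumHi = mulHi + accumHi;

  uint64_t newAccum = (static_cast<uint64_t>(accumHi) << 32) | accumLo;
  return madU64U32(static_cast<uint32_t>(mulLHS), mulRHSLo, newAccum);
}
```

The identity relied on is (lo + hi * 2^32) * r + acc = lo * r + acc + ((hi * r) mod 2^32) * 2^32, taken modulo 2^64.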

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D123835/new/

https://reviews.llvm.org/D123835