[PATCH] D55444: AMDGPU: Fix DPP combiner

Valery Pykhtin via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 15 04:04:34 PST 2019


vpykhtin marked 4 inline comments as done.
vpykhtin added inline comments.


================
Comment at: lib/Target/AMDGPU/GCNDPPCombine.cpp:369
+
+	if (BoundCtrlZero && MaskAllLanes) { // [1]
+      OldOpndVGPR.Reg = AMDGPU::NoRegister;
----------------
cwabbott wrote:
> I don't think this is correct. A lane can be set to 0 for the shift DPP controls (e.g. `DPP_ROW_SL1`) or if EXEC is zero for the lane to be read, in which case the user will do something non-trivial (e.g. OR will pass through the other operand) which we can't emulate in general.
Could you please you show an example of how it wouldn't work? Note that EXEC mask remains the same (by the check above) and no combining is performed when the result of DPP mov is used in any other way than consumed by DPP-capable VALU instruction. I think the result of DPP mov should be the same as the DPP src operand used in the dpp-capable VALU instruction.


================
Comment at: lib/Target/AMDGPU/GCNDPPCombine.cpp:385
+          assert(!BoundCtrlZero); // by check [1]
+		  // undefine old to make regalloc's life easier but this requires
+		  // bound_ctrl:0 set
----------------
cwabbott wrote:
> Indentation is off here, seems like your editor is using hard tabs
Yea, thanks. I reinstalled my editor recently and forgot to turn off tabs. It should be ok with the latest diff.


================
Comment at: lib/Target/AMDGPU/GCNDPPCombine.cpp:412
+    }
+	// case when DPP mov old == DPP scr register
+    OldOpndVGPR.Reg = AMDGPU::NoRegister;
----------------
cwabbott wrote:
> This seems incorrect to me. In the lanes where the source is invalid or the row/bank mask is 0, the DPP move will act as a no-op, here so if that lane is then added, or'd, etc. with something else, we can't emulate that with a single instruction.
mov_dpp v1, v1, ... (v1 of other lane is stored in v1 of this lane)
add_u32 v0, v1, ...

This is the case when DPP src register is stored in the VGPR with the same name of the issuing lane. This way v1 would contain the same value after unsuccessfull DPP mov (no-op) and therefore can be used in the combined VALU op.


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55444/new/

https://reviews.llvm.org/D55444





More information about the llvm-commits mailing list