[PATCH] D55444: AMDGPU: Fix DPP combiner

Connor Abbott via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 15 07:44:14 PST 2019


cwabbott added inline comments.


================
Comment at: lib/Target/AMDGPU/GCNDPPCombine.cpp:369
+
+	if (BoundCtrlZero && MaskAllLanes) { // [1]
+      OldOpndVGPR.Reg = AMDGPU::NoRegister;
----------------
vpykhtin wrote:
> cwabbott wrote:
> > I don't think this is correct. A lane can be set to 0 for the shift DPP controls (e.g. `DPP_ROW_SL1`) or if EXEC is zero for the lane to be read, in which case the user will do something non-trivial (e.g. OR will pass through the other operand) which we can't emulate in general.
> Could you please you show an example of how it wouldn't work? Note that EXEC mask remains the same (by the check above) and no combining is performed when the result of DPP mov is used in any other way than consumed by DPP-capable VALU instruction. I think the result of DPP mov should be the same as the DPP src operand used in the dpp-capable VALU instruction.
Yes, you're right. I was misrembering what bound_ctrl:0 does. Sorry!


================
Comment at: lib/Target/AMDGPU/GCNDPPCombine.cpp:412
+    }
+	// case when DPP mov old == DPP scr register
+    OldOpndVGPR.Reg = AMDGPU::NoRegister;
----------------
vpykhtin wrote:
> cwabbott wrote:
> > This seems incorrect to me. In the lanes where the source is invalid or the row/bank mask is 0, the DPP move will act as a no-op, here so if that lane is then added, or'd, etc. with something else, we can't emulate that with a single instruction.
> mov_dpp v1, v1, ... (v1 of other lane is stored in v1 of this lane)
> add_u32 v0, v1, ...
> 
> This is the case when DPP src register is stored in the VGPR with the same name of the issuing lane. This way v1 would contain the same value after unsuccessfull DPP mov (no-op) and therefore can be used in the combined VALU op.
I still don't see how this can work. For something like:

```
mov_dpp v1, v1, ...
add_u32 v0, v1, v2
```

lanes where the shared data is invalid based on the DPP ctrl or EXEC will return v1 (same lane) + v2, whereas this will transform it to something like

```
add_u32_dpp v0, v1, v2, ...
```

which will give you v0 (undef). What's an example of a transform you're trying to accomplish?


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55444/new/

https://reviews.llvm.org/D55444





More information about the llvm-commits mailing list