[PATCH] D55444: AMDGPU: Fix DPP combiner

Tue Jan 15 08:02:27 PST 2019

vpykhtin marked an inline comment as done.
vpykhtin added inline comments.

================
Comment at: lib/Target/AMDGPU/GCNDPPCombine.cpp:412
+    }
+    // case when DPP mov old == DPP src register
+    OldOpndVGPR.Reg = AMDGPU::NoRegister;
----------------
cwabbott wrote:
> vpykhtin wrote:
> > cwabbott wrote:
> > > This seems incorrect to me. In the lanes where the source is invalid or the row/bank mask is 0, the DPP move acts as a no-op here, so if that lane's result is then added, or'd, etc. with something else, we can't emulate that with a single instruction.
> > mov_dpp v1, v1, ... (v1 of another lane is stored in v1 of this lane)
> > add_u32 v0, v1, ...
> > 
> > This is the case when the DPP mov writes its result to the same VGPR it reads from (v1 above). This way v1 still contains the same value after an unsuccessful DPP mov (no-op) and can therefore be used in the combined VALU op.
> I still don't see how this can work. For something like:
> 
> ```
> mov_dpp v1, v1, ...
> add_u32 v0, v1, v2
> ```
> 
> lanes where the shared data is invalid based on the DPP ctrl or EXEC will return v1 (same lane) + v2, whereas this will transform it to something like
> 
> ```
> add_u32_dpp v0, v1, v2, ...
> ```
> 
> which will give you v0 (undef). What's an example of a transform you're trying to accomplish?
Sorry, this should look like:

```
mov v1, X
mov v2, Y
mov_dpp v1, v1, some_DPP_ctrl
add_u32 v0, v1, v2
```

transformed to

```
mov v1, X
mov v2, Y
add_u32_dpp v0, v1, v2, some_DPP_ctrl
```

v1 should contain this lane's X on an invalid DPP access, or X from the other lane on a valid one.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D55444/new/

https://reviews.llvm.org/D55444