[PATCH] D55444: AMDGPU: Fix DPP combiner
Valery Pykhtin via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Tue Jan 15 08:02:27 PST 2019
vpykhtin marked an inline comment as done.
vpykhtin added inline comments.
================
Comment at: lib/Target/AMDGPU/GCNDPPCombine.cpp:412
+ }
+ // case when DPP mov old == DPP scr register
+ OldOpndVGPR.Reg = AMDGPU::NoRegister;
----------------
cwabbott wrote:
> vpykhtin wrote:
> > cwabbott wrote:
> > > This seems incorrect to me. In the lanes where the source is invalid or the row/bank mask is 0, the DPP move will act as a no-op here, so if that lane is then added, or'd, etc. with something else, we can't emulate that with a single instruction.
> > mov_dpp v1, v1, ... (v1 of other lane is stored in v1 of this lane)
> > add_u32 v0, v1, ...
> >
> > This is the case where the DPP mov reads and writes the same VGPR (v1 here). The issuing lane's v1 therefore holds the same value after an unsuccessful DPP mov (a no-op) and can be used in the combined VALU op.
> I still don't see how this can work. For something like:
>
> ```
> mov_dpp v1, v1, ...
> add_u32 v0, v1, v2
> ```
>
> lanes where the shared data is invalid based on the DPP ctrl or EXEC will return v1 (same lane) + v2, whereas this will transform it to something like
>
> ```
> add_u32_dpp v0, v1, v2, ...
> ```
>
> which will give you v0 (undef). What's an example of a transform you're trying to accomplish?
Sorry, this should look like:
mov v1, X
mov v2, Y
mov_dpp v1, v1, some_DPP_ctrl
add_u32 v0, v1, v2
transformed to
mov v1, X
mov v2, Y
add_u32_dpp v0, v1, v2, some_DPP_ctrl
v1 should contain this lane's own X on an invalid DPP access, or X from the other lane on a valid one.
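
As a rough illustration of the unfused sequence only, here is a toy per-lane model in Python. It assumes an invalid DPP read leaves the destination lane untouched (the no-op behaviour discussed above); the four-lane width, the `shift_right_by_one` control, and the helper names are hypothetical and stand in for some_DPP_ctrl. It does not model the combined add_u32_dpp.

```python
def mov_dpp(old, src, src_lane_of):
    """Toy model of a DPP mov (assumed semantics, not the ISA definition).

    Each lane reads `src` from the lane chosen by `src_lane_of`. If that
    returns None (invalid source lane), the write is skipped and the lane
    keeps its `old` value.
    """
    out = []
    for lane in range(len(src)):
        from_lane = src_lane_of(lane)
        if from_lane is None:
            out.append(old[lane])       # invalid access: acts as a no-op
        else:
            out.append(src[from_lane])  # valid access: value from other lane
    return out


def shift_right_by_one(lane):
    # Hypothetical DPP control: lane i reads lane i-1; lane 0 has no source.
    return lane - 1 if lane > 0 else None


X = [10, 20, 30, 40]  # per-lane values written by "mov v1, X"
Y = [1, 2, 3, 4]      # per-lane values written by "mov v2, Y"

v1 = list(X)
v2 = list(Y)

# mov_dpp v1, v1, some_DPP_ctrl  (old and src are the same register)
v1 = mov_dpp(old=v1, src=v1, src_lane_of=shift_right_by_one)

# add_u32 v0, v1, v2
v0 = [(a + b) & 0xFFFFFFFF for a, b in zip(v1, v2)]

print(v1)
print(v0)
```

In this model lane 0 has no valid source, so it keeps its own X and the add sees X + Y for that lane; the other lanes see the neighbour's X.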
Repository:
rL LLVM
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D55444/new/
https://reviews.llvm.org/D55444