[PATCH] D34716: [AMDGPU] Add pseudo "old" and "wqm_mode" source to all DPP instructions

Wed Jun 28 10:50:44 PDT 2017

cwabbott added a comment.

In https://reviews.llvm.org/D34716#793614, @tpr wrote:

> Hi Connor. We have also been thinking about issues around dpp, wqm and wwm inside AMD. Something we may want to see is a way in machine instructions to express that a dpp operand can be combined into an alu op, and then the write gating (bound_ctrl=1, row and bank masks) affects the result of the alu op, not the result of the dpp move. However I guess that is a more extensive change to the definition of a whole class of instructions. I'd be interested to hear your thoughts.
>
> Being able to combine dpp move, alu op and write gating also affects how to express it at the IR intrinsic level. It seems to me that the write gating needs to be a separate intrinsic, with instruction selection spotting that it can all be combined into a single instruction.

Yeah, I've already added such an intrinsic in https://reviews.llvm.org/D34718. You can also see my implementation of the inclusive scan kernel in Mesa here <https://cgit.freedesktop.org/~cwabbott0/mesa/tree/src/amd/common/ac_llvm_build.c?h=radv-amd-shader-ballot#n437>. Each round generates IR like:

  %1 = call i32 @llvm.amdgcn.update.dpp(i32 0, i32 %0, <dpp_ctrl, etc.>)
  %3 = add i32 %1, %2

The key part here is passing the identity of the operation (0 for an add, as above) as the "old" source, which works regardless of the operation. This lets it be folded to something like:

  V_IADD_B32_dpp  %3, old:%1, src0:%0, src1:%2 <dpp_ctrl, etc.>

This becomes a single instruction with %0 and %1 tied to the same register. It should just be a matter of writing a few ISel patterns, although I haven't done that yet, since I've been concentrating on getting it working first.
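
To make the role of the identity concrete, here is a rough Python sketch of one way to model the scan rounds. It is a simplification and not the Mesa or LLVM implementation: update_dpp and inclusive_scan are hypothetical helpers, the lanes are a flat list rather than rows of 16, and the only DPP control modeled is a plain shift right where lanes without a valid source keep their "old" value.

```python
from operator import add, mul

def update_dpp(old, src, shift):
    """Hypothetical, simplified model of a DPP shift-right.

    Lane i reads src[i - shift]; lanes with no valid source lane
    keep their `old` value instead.
    """
    return [src[i - shift] if i >= shift else old[i] for i in range(len(src))]

def inclusive_scan(values, op, identity):
    """Inclusive scan in log2(n) rounds of (DPP move, ALU op)."""
    lanes = list(values)
    shift = 1
    while shift < len(lanes):
        # Assumption: "old" is the identity of `op` in every lane, so a lane
        # that receives nothing from the shift is left unchanged by `op`.
        old = [identity] * len(lanes)
        shifted = update_dpp(old, lanes, shift)
        lanes = [op(s, v) for s, v in zip(shifted, lanes)]
        shift *= 2
    return lanes

print(inclusive_scan([1, 2, 3, 4], add, 0))
print(inclusive_scan([1, 2, 3, 4], mul, 1))
```

Because the unwritten lanes hold the identity, applying the ALU op to them is a no-op, which is why the same pattern works for add, mul, min, max and so on without any operation-specific handling.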

https://reviews.llvm.org/D34716