[PATCH] D53762: AMDGPU: Combine DPP mov with use instructions (VOP1/2/3)

Fri Nov 2 10:42:13 PDT 2018

vpykhtin added inline comments.

================
Comment at: lib/Target/AMDGPU/GCNDPPCombine.cpp:228-229
+  case AMDGPU::V_MAX_U32_e32:
+    if (OldOpndValue.getImm() == std::numeric_limits<unsigned>::max())
+      return OldOpndVGPR;
+    break;
----------------
arsenm wrote:
> Why do you need to handle these cases? I would also use uint32_t instead of unsigned
These cases are for the situation when bound_ctrl = 0 (result write disable), meaning that the mov result would be the value of 'old' for inactive lanes. If we know the immediate for old, we can calculate (in some cases) an old value for the VALU operation, which isn't the same as the one for the mov. Otherwise the combining would fail.

For example:

v1 = ...
v10 = ... // other lane reg

v0 = v_mov_b32 1
v0 = v_mov_b32_dpp v10, ..., 0 // bound_ctrl == write disable
v2 = v_mul_u32_u24_e32 v0, v1

In this case we know v0 for inactive lanes would be 1 (identity for mul). This makes it possible to use the v1 value as the result of the mul for inactive lanes:

v1 = v_mul_u32_u24_dpp v10, v1, ..., 0 // bound_ctrl == write disable
v2 = v_mov_b32 v1

Otherwise the combining isn't possible for bound_ctrl == write disable.

================
Comment at: lib/Target/AMDGPU/GCNDPPCombine.cpp:408
+
+  std::vector<MachineInstr*> DPPMoves;
+  for (auto &MBB : MF) {
----------------
arsenm wrote:
> Why do you need to collect a separate list of the moves in the whole function? Can you just use the dfs iterator to avoid this?
ok

Repository:
  rL LLVM

https://reviews.llvm.org/D53762