[PATCH] D34716: [AMDGPU] Add pseudo "old" and "wqm_mode" source to all DPP instructions

Wed Jun 28 15:19:23 PDT 2017

nhaehnle added a comment.

So, one thing that's not clear to me is the semantics of how the update.dpp intrinsic is supposed to enable WQM or WWM. In your sequence of instructions, if you just put a WQM/WWM flag on the update.dpp intrinsic, how does LLVM know whether the regular ALU intrinsics in between should run in WQM/WWM or not?

Tim had an interesting proposal for that, which involved a pair of intrinsics:

llvm.amdgcn.helpervalue(src, helpervalue) --> returns src for active lanes and helpervalue for other lanes

llvm.amdgcn.wwm(src) --> returns src for active lanes and undefined/poison (my choice of words, not Tim's) for other lanes, but guarantees that the computations leading to src are executed "as-if" in WWM.

llvm.amdgcn.wqm(src) --> analogous to .wwm, but for WQM
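To make the lane-level semantics concrete, here is a minimal Python model. It assumes a toy 4-lane wave (real GCN waves have 64 lanes), an exec mask given as a list of booleans, and None standing in for undefined/poison. The function names only mirror the proposed intrinsics and are illustrative. wave_sum_all_lanes is a hypothetical stand-in for a cross-lane reduction (e.g. a chain of DPP ops) that reads every lane.

```python
from typing import List, Optional

# Toy model: a wave is a list of per-lane values, exec_mask marks active lanes,
# and None stands in for undefined/poison.

def helpervalue(src: List[int], helper: int, exec_mask: List[bool]) -> List[int]:
    """Active lanes keep src; inactive lanes get the helper value."""
    return [s if active else helper for s, active in zip(src, exec_mask)]

def wave_sum_all_lanes(x: List[int]) -> List[int]:
    """Hypothetical cross-lane reduction run as-if in WWM: reads every lane."""
    total = sum(x)
    return [total] * len(x)

def wwm(src: List[int], exec_mask: List[bool]) -> List[Optional[int]]:
    """Only active lanes of the result are defined; the rest are poison."""
    return [s if active else None for s, active in zip(src, exec_mask)]

exec_mask = [True, False, True, False]
val = [1, 2, 3, 4]  # values in the inactive lanes (2 and 4) are garbage
filled = helpervalue(val, 0, exec_mask)  # [1, 0, 3, 0]
result = wwm(wave_sum_all_lanes(filled), exec_mask)
print(result)  # [4, None, 4, None]
```

Filling the inactive lanes with the identity 0 via helpervalue is what makes the all-lane sum equal the sum over the active lanes only.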

I'm writing "as-if", because not **all** computations leading up to src actually need to be in WWM: llvm.amdgcn.helpervalue can act as a "barrier" to the propagation of WWM. So if you think of the graph of WWM computations, .helpervalue acts as a source, and .wwm acts as a sink.
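One way to read the source/sink description is as a backwards dataflow walk from each .wwm. The sketch below assumes a hypothetical toy IR: a dict from value name to opcode and operand names, with made-up opcode strings. It stops propagation at the src operand of .helpervalue, since src is only read in active lanes. It still follows the helper operand, because that operand is read in the inactive lanes.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy IR: value name -> (opcode, operand names).
Graph = Dict[str, Tuple[str, List[str]]]

def wwm_nodes(graph: Graph) -> Set[str]:
    """Return the values that must be computed in WWM.

    Seeds: the operands of every "wwm" sink. A "helpervalue" node itself
    needs WWM (it writes inactive lanes), but its first operand (src) is
    only read in active lanes, so the walk does not continue into it.
    Its second operand (the helper value) is read in inactive lanes, so
    the walk does follow it.
    """
    needed: Set[str] = set()
    worklist: List[str] = []
    for _, (op, operands) in graph.items():
        if op == "wwm":
            worklist.extend(operands)
    while worklist:
        name = worklist.pop()
        if name in needed:
            continue
        needed.add(name)
        op, operands = graph[name]
        if op == "helpervalue":
            worklist.append(operands[1])  # helper operand only
        else:
            worklist.extend(operands)
    return needed

g: Graph = {
    "a": ("load", []),
    "b": ("mul", ["a", "a"]),           # only needed in active lanes
    "zero": ("const", []),
    "h": ("helpervalue", ["b", "zero"]),
    "s": ("dpp_add", ["h", "h"]),       # cross-lane op, needs all lanes
    "r": ("wwm", ["s"]),
}
print(sorted(wwm_nodes(g)))  # ['h', 's', 'zero']
```

Here only the cross-lane op, the .helpervalue and its constant need WWM. The load and multiply that feed src can stay in normal exec-masked mode.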

I think this proposal goes a long way towards clarifying which operations actually need WQM/WWM. One issue that occurred to me today is that the semantics are unclear when control flow is involved. Two basic examples to think about:

  v = some computation
  if (cond) {
     t1 = f(v)
     r1 = wwm(t1)
  } else {
     t2 = f(v)
     r2 = wwm(t2)
  }

I believe the desirable semantics here are clear, though they may require some compiler work. Basically, you want the entire vector v to be the same at the start of both blocks. This requires ensuring that no part of it gets overwritten during whichever block executes first.
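The hazard can be shown with a toy register-file model. It assumes v lives in a VGPR named r0 and that a naive allocator may reuse v's register for t1 in the first block, because v looks dead there. An ordinary exec-masked write only touches the then-lanes, so the else-lanes still see v. A WWM write touches every lane and clobbers it. The register names, f, and the lane values are all hypothetical.

```python
from typing import List

def masked_write(reg: List[int], value: List[int], mask: List[bool]) -> None:
    """Ordinary VALU write: only lanes enabled in mask are updated."""
    for i, on in enumerate(mask):
        if on:
            reg[i] = value[i]

def wwm_write(reg: List[int], value: List[int]) -> None:
    """WWM write: every lane is updated, regardless of exec."""
    reg[:] = value

def f(v: List[int]) -> List[int]:
    return [x * 10 for x in v]  # stand-in for any per-lane computation

def else_block_sees(mode: str) -> List[int]:
    cond = [True, False, True, False]                 # lanes taking the then-block
    regs = {"r0": [1, 2, 3, 4], "r1": [0, 0, 0, 0]}  # v lives in r0
    t1 = f(regs["r0"])
    if mode == "masked_reuse":    # normal code, t1 reuses v's register
        masked_write(regs["r0"], t1, cond)
    elif mode == "wwm_reuse":     # WWM code, t1 reuses v's register
        wwm_write(regs["r0"], t1)
    else:                         # WWM code, t1 gets a fresh register
        wwm_write(regs["r1"], t1)
    # The else-block lanes (cond == False) now read v from r0.
    return [regs["r0"][i] for i, c in enumerate(cond) if not c]

print(else_block_sees("masked_reuse"))  # [2, 4]
print(else_block_sees("wwm_reuse"))     # [20, 40]  (v clobbered)
print(else_block_sees("wwm_fresh"))     # [2, 4]
```

So along the first block, v has to stay live in all lanes, not just in the lanes that are active there.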

The much more problematic case is:

  if (cond) {
    v1 = ...
  } else {
    v2 = ...
  }
  v = wwm(phi(v1, v2))

What does v look like? Specifically, what's in the inactive lanes? Perhaps the best thing we can do is say that the active lanes come from the predecessor block they went through, and all the other lanes come from one of the two blocks, though it is undefined which one.
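Under the semantics suggested above, the result is not a single vector but one of a small set of allowed vectors. Here is a sketch that assumes a toy 4-lane wave, with exec_mask being the mask in effect at the wwm() call. It reads "one of the two blocks" as a single choice shared by all inactive lanes. A per-lane choice would be another possible reading.

```python
from typing import List

def allowed_results(v1: List[int], v2: List[int], cond: List[bool],
                    exec_mask: List[bool]) -> List[List[int]]:
    """Enumerate the vectors wwm(phi(v1, v2)) may produce.

    Active lanes take the value from the predecessor they went through
    (v1 if cond, else v2). All inactive lanes take their value from the
    same block, but which block that is stays unspecified.
    """
    results = []
    for fallback in (v1, v2):
        lanes = []
        for i, active in enumerate(exec_mask):
            if active:
                lanes.append(v1[i] if cond[i] else v2[i])
            else:
                lanes.append(fallback[i])  # cond is irrelevant here
        results.append(lanes)
    return results

v1 = [10, 11, 12, 13]
v2 = [20, 21, 22, 23]
cond = [True, False, True, False]
exec_mask = [True, True, False, False]
print(allowed_results(v1, v2, cond, exec_mask))
# [[10, 21, 12, 13], [10, 21, 22, 23]]
```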

https://reviews.llvm.org/D34716