[PATCH] D17518: [ifcnv] Add comment explaining why it's OK to duplicate convergent MIs in ifcnv.

Thu Mar 31 14:01:08 PDT 2016

jlebar added a comment.

In http://reviews.llvm.org/D17518#387824, @rnk wrote:

> I'm not sure I understand the comment. This transform is basically turning
>  ifs into predicated instructions, right? And we're basically saying that a
>  predicated convergent operation (threadsync or what have you) is the same as
>  having the operation in a conditional basic block?

Not quite.  The simple case of converting

  if (pred) __syncthreads();
  ==>
  (predicated on pred) __syncthreads();

is trivially safe, as I think you're saying.  But ifconv can also perform
transformations that duplicate the newly predicated instruction.  The question
is whether it's safe to duplicate a convergent instruction during
if-conversion.

A simple example is

     BB0
    /   \
   BB1  BB2
   |\_ _/
   | | |
   | TBB --> exit
   |
  FBB

TBB forms ifconv's "simple" shape with its predecessors.  Can we move TBB's
contents into BB1 and BB2 (as predicated instructions) if TBB contains a
convergent op?

I'd argued in http://reviews.llvm.org/D17430 that this was safe, but now I think it is not.  If we
think concretely in terms of CUDA, a necessary condition for correctness is
that

- if all threads in the warp executed TBB "together" (i.e., convergently) before the transformation,
- then all threads in the warp must execute *the same copy* of TBB's instructions after the transformation.

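But clearly that's not necessarily true in this example: threads that arrive
via BB1 execute the copy in BB1, while threads that arrive via BB2 execute the
copy in BB2.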
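
To make that concrete, here is a minimal Python sketch.  It is a toy model, not
anything from LLVM: the helper name copies_executed and the 4-thread warp are
made up for illustration, and it assumes "cond" is true for every thread that
goes through BB1.  It groups threads by which copy of the convergent op they
execute, before and after TBB is duplicated into its predecessors.

```python
from collections import defaultdict


def copies_executed(thread_paths, duplicated):
    """Map each copy of the convergent op to the set of threads that run it.

    thread_paths: dict of thread id -> "BB1" or "BB2", the predecessor the
    thread took on its way to TBB (hypothetical input format).
    duplicated=False models the original CFG, with one copy of the op in TBB.
    duplicated=True models the CFG after TBB is folded into both predecessors.
    """
    groups = defaultdict(set)
    for tid, pred in thread_paths.items():
        site = f"{pred}.copy" if duplicated else "TBB"
        groups[site].add(tid)
    return dict(groups)


# Hypothetical warp: threads 0-1 come through BB1 (with cond true),
# threads 2-3 come through BB2.
warp = {0: "BB1", 1: "BB1", 2: "BB2", 3: "BB2"}

before = copies_executed(warp, duplicated=False)
after = copies_executed(warp, duplicated=True)

print(before)
print(after)
```

Before the transformation all four threads meet at the single op in TBB;
afterwards they are split across two copies, which is what the condition above
forbids.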

In terms of the LLVM spec, we say that you're not allowed to "add" a
control-flow dependency to a convergent op.  In this case, if BB1 branches on
"cond", then the original CFG dependency expression for TBB is

  (BB1 && cond) || BB2

But after predication, we have two copies of the convergent operation:

  BB1 && cond
  BB2

I think my mistake earlier was concluding from this that we "removed" CFG
dependencies, rather than adding them.  But let me write this in an equivalent
way:

  BB1 && cond && !BB2
  BB2 && !BB1

Now it's suddenly clear that we added a CFG dependency to each of these new
instructions.

Put another way, there's a set of control flows which result in us reaching an
instruction.  If I add a CFG dependency, I make that set *smaller*.  That's
what we're not allowed to do with convergent operations, and that's what we're
doing here.
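
The "set of control flows" view can be checked by brute force.  The following
is a minimal Python sketch under one assumption read off the CFG above: BB0
branches to exactly one of BB1 and BB2, so "went through BB2" is just the
negation of "went through BB1".  The names flows and reaching are made up for
illustration.

```python
from itertools import product

# Each control flow is modelled as (took_bb1, cond).  Assumption: BB0 branches
# to exactly one of BB1/BB2, so took_bb2 is always `not took_bb1`.
flows = list(product([True, False], repeat=2))


def reaching(expr):
    """Return the set of control flows under which `expr` holds."""
    return {(bb1, cond) for bb1, cond in flows if expr(bb1, not bb1, cond)}


# Original dependency expression for the single op in TBB.
original = reaching(lambda bb1, bb2, cond: (bb1 and cond) or bb2)

# Dependency expressions for the two copies after predication.
copy_in_bb1 = reaching(lambda bb1, bb2, cond: bb1 and cond)
copy_in_bb2 = reaching(lambda bb1, bb2, cond: bb2)

# The forms with explicit negations describe exactly the same sets.
assert copy_in_bb1 == reaching(lambda bb1, bb2, cond: bb1 and cond and not bb2)
assert copy_in_bb2 == reaching(lambda bb1, bb2, cond: bb2 and not bb1)

# Together the copies cover the original flows, but each one alone is reached
# by strictly fewer flows than the original op was.
assert (copy_in_bb1 | copy_in_bb2) == original
assert copy_in_bb1 < original
assert copy_in_bb2 < original
```

Each copy is reached by a strict subset of the flows that reached the original
op, which is the "smaller set" described above.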

@resistor, I'm going to revert this back to what we originally had in http://reviews.llvm.org/D17430.
Reid, thank you for being confused by this.  :)

http://reviews.llvm.org/D17518