[PATCH] D26348: Allow convergent attribute for function arguments

Sun Feb 12 14:23:56 PST 2017

dark_sylinc added a comment.

I've given it a little more thought during my coffee break.

divergent_branch, divergent_index, convergent_branch & convergent_index aren't enough.
At least one more modifier is needed: cannot_diverge (or a similar name).

By default, most variables would be convergent_branch convergent_index, while texture variables would be divergent_branch convergent_index. These defaults could be changed on demand (e.g. to accommodate the needs of a particular GPU architecture).
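To make the proposed defaults concrete, here is a minimal Python sketch. The `Qualifiers` class, the `DEFAULTS` table and `default_qualifiers()` are hypothetical names used only for illustration; nothing like them exists in LLVM. Each type gets a pair of flags, and a target can override the table for its own architecture.

```python
from dataclasses import dataclass

# Hypothetical model of the qualifiers proposed in this comment.
# None of these names exist in LLVM.
@dataclass(frozen=True)
class Qualifiers:
    divergent_branch: bool  # False means convergent_branch
    divergent_index: bool   # False means convergent_index

# Defaults described above: samplers are divergent_branch convergent_index...
DEFAULTS = {
    "sampler2D": Qualifiers(divergent_branch=True, divergent_index=False),
}
# ...and every other type is convergent_branch convergent_index.
FALLBACK = Qualifiers(divergent_branch=False, divergent_index=False)

def default_qualifiers(type_name, overrides=None):
    """Return the default qualifiers for a type; a target may override them."""
    table = dict(DEFAULTS)  # copy so overrides never mutate the global table
    if overrides:
        table.update(overrides)
    return table.get(type_name, FALLBACK)
```

For example, a hypothetical architecture whose samplers may safely diverge could call `default_qualifiers("sampler2D", {"sampler2D": FALLBACK})`.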

Consider the following examples:

Example 1:

  float4 texture( cannot_diverge_branch sampler2D texSampler, float2 uv ); //Internally implemented by LLVM to map to one or more GPU instructions.

  divergent_branch convergent_index sampler2D myTexture0;
  divergent_branch convergent_index sampler2D myTexture1;
  convergent_branch convergent_index int condValue;
  convergent_branch convergent_index float2 uv;

  void main()
  {
      if( condValue > 1 )
          outColour = texture( myTexture0, uv );
      else
          outColour = texture( myTexture1, uv );
  }

In this case, myTexture0 & myTexture1 are both marked as divergent_branch. Because we enter a branch and call texture(), which has cannot_diverge on its argument, the following should happen:

1. LLVM detects that an argument marked divergent_branch was passed to a function that requires this argument not to diverge.
2. The branch can only be flattened by performing both texture samples and then doing a branchless select on the results. It must not do the branchless select on the samplers first and then perform a single sample.
3. myTexture0 cannot be factored out of the branch to do a branchless select with myTexture1.
4. myTexture1 cannot be factored out of the branch to do a branchless select with myTexture0.
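The difference between the two flattening strategies in point 2 can be illustrated with a toy SIMT model in Python. It assumes, purely for illustration, that one sample instruction uses a single sampler for the whole wave, taken here from the first lane; `tex0`, `tex1` and `sample_wave` are made-up names. When the condition differs between lanes, sampling both textures and then selecting matches what the source program means, while selecting the sampler first and sampling once returns tex0 results for lanes that asked for tex1.

```python
# Toy SIMT model. Each "texture" is a function of uv; a wave is a list of lanes.
# Assumption: one sample instruction uses ONE sampler for every lane, taken from
# the first lane. This is why the sampler argument must not diverge.

def tex0(uv):
    return ("tex0", uv)

def tex1(uv):
    return ("tex1", uv)

def sample_wave(samplers, uvs):
    """One hardware sample: the first lane's sampler is used for every lane."""
    s = samplers[0]
    return [s(uv) for uv in uvs]

def flatten_sample_then_select(conds, uvs):
    # Correct: perform both samples, then a per-lane branchless select.
    a = sample_wave([tex0] * len(uvs), uvs)
    b = sample_wave([tex1] * len(uvs), uvs)
    return [x if c else y for c, x, y in zip(conds, a, b)]

def flatten_select_then_sample(conds, uvs):
    # Wrong: select the sampler per lane first, then perform a single sample.
    samplers = [tex0 if c else tex1 for c in conds]
    return sample_wave(samplers, uvs)

def reference(conds, uvs):
    # What the source means: each lane samples its own texture.
    return [tex0(uv) if c else tex1(uv) for c, uv in zip(conds, uvs)]
```

If every lane takes the same path, both strategies agree; the bug only shows up when lanes disagree.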

If myTexture0 & myTexture1 were both to be tagged as convergent_branch instead, then there would be no problem.

Example 2:

  divergent_branch convergent_index sampler2D myTexture0;
  divergent_branch convergent_index sampler2D myTexture1;
  convergent_branch convergent_index int condValue;
  convergent_branch convergent_index float2 uv;

  float4 doSomething( sampler2D texSampler, float2 uv )
  {
      return uv.xyxy;
  }

  void main()
  {
      if( condValue > 1 )
          outColour = doSomething( myTexture0, uv );
      else
          outColour = doSomething( myTexture1, uv );
  }

Here, doSomething() is called, but its argument does not have cannot_diverge; therefore the branch can still be flattened in any way. myTexture0 is not actually used, but if it could be used in some form (arithmetic operations? Not possible, but just for the sake of the example), the optimizer could move it outside the branch to do a branchless select against myTexture1.

Example 3:

  divergent_branch convergent_index sampler2D myTexture0;
  divergent_branch convergent_index sampler2D myTexture1;
  convergent_branch convergent_index int condValue;
  convergent_branch convergent_index float2 uv;

  float4 doSomething( sampler2D texSampler, float2 uv )
  {
      return texture( texSampler, uv.xy );
  }

  void main()
  {
      if( condValue > 1 )
          outColour = doSomething( myTexture0, uv );
      else
          outColour = doSomething( myTexture1, uv );
  }

Here, doSomething() does not have the cannot_diverge attribute on its argument, but it calls texture(), which does. Therefore the outcome is the same as in Example 1.
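Example 3 implies that cannot_diverge has to propagate through calls: a parameter that is forwarded to a cannot_diverge parameter effectively becomes cannot_diverge itself. A minimal fixed-point sketch in Python, assuming a hypothetical `forwards` map that records which parameters are passed straight through to which callee parameters (a real implementation would have to derive this from the IR, or could rely on inlining):

```python
def propagate_cannot_diverge(roots, forwards):
    """Return every (function, param) pair that must not diverge.

    roots:    (function, param) pairs annotated cannot_diverge directly,
              e.g. {("texture", "texSampler")}.
    forwards: hypothetical map (function, param) -> list of
              (callee, callee_param) the parameter is passed to unchanged.
    """
    result = set(roots)
    changed = True
    # Iterate until no new parameter is added (a simple fixed point).
    while changed:
        changed = False
        for (fn, param), targets in forwards.items():
            if (fn, param) not in result and any(t in result for t in targets):
                result.add((fn, param))
                changed = True
    return result
```

For Example 3, `forwards` would be `{("doSomething", "texSampler"): [("texture", "texSampler")]}`, so doSomething's parameter is inferred as cannot_diverge; for Example 2 it is empty and nothing is inferred.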

Example 4:

  divergent_branch convergent_index sampler2D myTexture0[4];
  divergent_branch convergent_index sampler2D myTexture1;
  convergent_branch convergent_index int condValue;
  convergent_branch convergent_index int index;
  convergent_branch convergent_index float2 uv;

  float4 doSomething( sampler2D texSampler, float2 uv )
  {
      return texture( texSampler, uv.xy );
  }

  void main()
  {
      if( condValue > 1 )
          outColour = doSomething( myTexture0[index], uv );
      else
          outColour = doSomething( myTexture1, uv );
  }

Here myTexture0 is indexed. Since the sampler is marked as convergent_index, the cannot_diverge_branch && divergent_branch check is ignored. This means myTexture0 can be factored out of the branch, the branch can be flattened in any way, etc.

Example 5:

  divergent_branch convergent_index sampler2D myTexture0[4];
  divergent_branch convergent_index sampler2D myTexture1;
  convergent_branch convergent_index int condValue;
  convergent_branch convergent_index int index;
  convergent_branch convergent_index float2 uv;

  float4 doSomething( sampler2D texSampler, float2 uv )
  {
      return texture( texSampler, uv.xy );
  }

  void main()
  {
      if( condValue > 1 )
          outColour = doSomething( myTexture0[divergent_index(index)], uv );
      else
          outColour = doSomething( myTexture1, uv );
  }

This is like Example 4, but indexing with divergent_index(index) causes texSampler = myTexture0[divergent_index(index)] to become divergent_index as well.
This means the compiler now behaves as in Example 1.
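Examples 4 and 5 differ only in how the array is indexed. The hypothetical `index_array()` helper below sketches that rule in Python: a plain index marks the element as converge-indexed, while divergent_index(index) makes the element divergent_index. `Var` and its fields mirror the names used in the summary logic that follows; they are not an existing LLVM API.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Var:
    name: str
    divergent_branch: bool
    divergent_index: bool
    # True if read from an array WITHOUT divergent_index(index).
    was_converge_indexed: bool = False

def index_array(array, index_is_divergent):
    """Model reading array[index] (Example 4) or array[divergent_index(index)] (Example 5)."""
    if index_is_divergent:
        # Example 5: the element becomes divergent_index as well.
        return replace(array, name=array.name + "[divergent_index(index)]",
                       divergent_index=True, was_converge_indexed=False)
    # Example 4: a convergent index marks the element as converge-indexed.
    return replace(array, name=array.name + "[index]", was_converge_indexed=True)
```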

In summary, the final logic inside LLVM would be something like this:

  if( function.argument.cannot_diverge_branch &&
      (variable.divergent_branch || variable.divergent_index) &&
      !variable.was_converge_indexed /*set to true if it was read from an array without divergent_index(index)*/ )
  {
      beCarefulWithFlattening( variable );
      cannotMoveOutOfBranch( variable );
  }
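The same condition can be transliterated into Python and checked against Examples 1, 2, 4 and 5. `Var` and `must_stay_in_branch()` are hypothetical names; the boolean fields stand in for the proposed qualifiers, and the function returns True exactly when the pseudocode above would apply its two restrictions (careful flattening, no hoisting out of the branch).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    divergent_branch: bool
    divergent_index: bool
    # Set when the value was read from an array WITHOUT divergent_index(index).
    was_converge_indexed: bool = False

def must_stay_in_branch(param_cannot_diverge_branch, var):
    """True if `var` must not be hoisted out of the branch and flattening
    must keep one sample per path (the two restrictions in the pseudocode)."""
    return (param_cannot_diverge_branch
            and (var.divergent_branch or var.divergent_index)
            and not var.was_converge_indexed)
```

For instance, Example 1 corresponds to `must_stay_in_branch(True, Var(True, False))`, which is True, while Example 4 corresponds to `Var(True, False, was_converge_indexed=True)`, which yields False.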

I don't know how easy or hard it would be to implement all of this, but it is clear that these use cases cannot easily be handled with a single attribute if the goal is to produce correct output AND optimize as much as possible.

Of course, for the time being, implementing divergent_branch & cannot_diverge is the most urgent matter. The rest of the keywords could be left reserved.

**I'd like to point out that this bug does not only produce graphical artifacts. It can also cause GPU hangs:**

  int nCount = 0;
  if( someCond )
      nCount = int( texture( myTex0, uv ).x );
  else
      nCount = int( texture( myTex1, uv ).x );

  for( int i=0; i<nCount; ++i )
  {
      ...
  }

What happens if nCount is read from the wrong path and ends up being a very large value? That's right: a GPU hang. (In case anyone is lost here: a GPU hang is when the GPU stops responding to graphics commands such as updating the display. It can be caused by a hardware malfunction, a GPU crash, or the GPU being stuck on a very long computation.)
Considering there are a lot of GPU hang tickets in the radeonsi bug tracker, I wouldn't be surprised if one of these hangs is actually caused by this. **I cannot stress enough that this should be a priority bug.**
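Why a single wrong value is enough: in SIMT execution a wave keeps running a loop while any of its lanes is still active, so the loop costs the maximum trip count across lanes, not the average. A toy Python model of the loop above (`wave_loop_iterations` is a made-up name, and the counts are illustrative, not measured):

```python
def wave_loop_iterations(n_counts):
    """Iterations a wave executes for `for( int i=0; i<nCount; ++i )`,
    given each lane's nCount. The wave loops while ANY lane is active."""
    iterations = 0
    active = [n > 0 for n in n_counts]
    while any(active):
        iterations += 1
        # A lane stays active while its own i < nCount.
        active = [iterations < n for n in n_counts]
    return iterations
```

With per-lane counts like `[4, 4, 2**31 - 1, 4]`, one lane that read a garbage value from the wrong path would make the whole wave run roughly two billion iterations.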

https://reviews.llvm.org/D26348