[PATCH] D26348: Allow convergent attribute for function arguments

Tue Mar 7 13:10:23 PST 2017

dark_sylinc added a comment.

In https://reviews.llvm.org/D26348#694722, @mehdi_amini wrote:

> > //Example 1: Safe to fold.
> >  void foo(int v, bool cond) {
> > 
> >   if (cond) {
> >     // token1 = statepoint()
> >     // bar(v, token1); // v needs to be convergent
> >   } else {
> >     // token2 = statepoint()
> >     // bar(v, token2);  // v needs to be convergent
> >   }
> >   token_merged = statepoint()
> >   bar(v, token_merged); // v needs to be convergent
> > 
> > }
>
> My understanding was that this is *not* OK to fold, we don't know that two different "thread" have the same value for v. Can you elaborate, it is very possible that I'm misunderstanding something here.

My mistake was not being explicit enough in the examples; my apologies. GPUs distinguish between dispatches, threadgroups and threads: a dispatch is made of threadgroups, and a threadgroup is made of threads.
Uniform values are read-only and have the same value across the whole dispatch, while convergent values have the same value across the whole threadgroup.
I assumed v came from a uniform. If v is a uniform, this example is safe to fold, because both T0 and T1 are guaranteed to see the same value of v.
**If v is neither uniform nor convergent, then you're right, it's not safe to fold.**
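To make the three levels concrete, here is a minimal Python sketch. It is not part of the patch, and the names `Uniformity`, `join` and `same_within_threadgroup` are hypothetical. It orders the levels from most to least shared, so a value derived from several inputs is only as shared as its least-shared input:

```python
from enum import IntEnum


class Uniformity(IntEnum):
    """Hypothetical model of how widely a value is shared, most shared first.

    A UNIFORM value is also CONVERGENT, and any value may be treated as VARYING.
    """
    UNIFORM = 0     # same value for the whole dispatch (read-only)
    CONVERGENT = 1  # same value for every thread of a threadgroup
    VARYING = 2     # may differ between threads of the same threadgroup


def join(*levels: Uniformity) -> Uniformity:
    """Level of a value computed from inputs with the given levels.

    The result is the weakest (largest) input level; no inputs means a constant.
    """
    return max(levels, default=Uniformity.UNIFORM)


def same_within_threadgroup(level: Uniformity) -> bool:
    """True when all threads of one threadgroup are guaranteed to agree."""
    return level <= Uniformity.CONVERGENT
```

Under this model, the v above is UNIFORM, so `same_within_threadgroup` holds for it and the fold is safe. For a VARYING v, the check fails.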

In the example that selects between `v0` and `v1`, it's safe to fold if the condition comes from a uniform (or a convergent value). If the condition is neither uniform nor convergent, it's not safe.
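The reason the condition matters is that the selected value, `phi[v0, v1]`, can be no more shared than the condition that picks it. Here is a hypothetical Python sketch of that rule. The function name and the `same_value` flag are made up for illustration:

```python
from enum import IntEnum


class Uniformity(IntEnum):
    UNIFORM = 0     # same value for the whole dispatch
    CONVERGENT = 1  # same value for every thread of a threadgroup
    VARYING = 2     # may differ between threads of a threadgroup


def select_uniformity(cond: Uniformity, v0: Uniformity, v1: Uniformity,
                      same_value: bool = False) -> Uniformity:
    """Level of the value chosen by `cond ? v0 : v1`, i.e. phi[v0, v1].

    If both sides pass the very same value, cond does not affect the result.
    Otherwise threads that disagree on cond pick different operands, so the
    result is only as shared as the least-shared of cond, v0 and v1.
    """
    if same_value:
        return v0
    return max(cond, v0, v1)
```

For example, selecting between two UNIFORM samplers with a VARYING condition yields a VARYING result. This is why the condition, rather than the samplers, decides whether folding is safe.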

  //OK to fold. Notice v is uniform; v never changes within the threadgroup.
  //"token" can be different because it's not required to be convergent.
  void foo( uniform sampler v, bool cond ) {
    if (cond) {
      // token1 = statepoint()
      // bar(v, token1); // v needs to be convergent
    } else {
      // token2 = statepoint()
      // bar(v, token2);  // v needs to be convergent
    }
    token_merged = statepoint();
    bar(v, token_merged); // v is guaranteed to be the same for all threads within the threadgroup.
  }

  //Unsafe code, but it can be folded. This code is unsafe no matter what we do: v may be
  //different between threads in the same threadgroup, but all threads in a threadgroup follow
  //the same path (cond is true for all threads within the threadgroup, or false for all threads
  //within the threadgroup).
  void foo( sampler v, uniform bool cond ) {
    if (cond) {
      // token1 = statepoint()
      // bar(v, token1);
    } else {
      // token2 = statepoint()
      // bar(v, token2);
    }
    token_merged = statepoint();
    bar(v, token_merged); // v may differ within the same threadgroup, but there's nothing the compiler can do.
  }

  //Unsafe to fold. Must not be folded. v and cond are neither uniform nor convergent.
  void foo( sampler v, bool cond ) {
    if (cond) {
      // token1 = statepoint()
      // bar(v, token1);
    } else {
      // token2 = statepoint()
      // bar(v, token2);
    }
    token_merged = statepoint();
    bar(v, token_merged);
  }

  //OK to fold. Notice v0, v1, and cond are all uniform.
  void foo( uniform sampler v0, uniform sampler v1, uniform bool cond ) {
    token = statepoint();
    if (cond) {
      //bar( v0, token );
    } else {
      //bar( v1, token );
    }
    v2 = phi[ v0, v1 ];
    bar( v2, token );
  }

  //Unsafe to fold. Must not be folded. cond is not uniform, thus threads
  //within the same threadgroup may follow different paths.
  void foo( uniform sampler v0, uniform sampler v1, bool cond ) {
    token = statepoint();
    if (cond) {
      //bar( v0, token );
    } else {
      //bar( v1, token );
    }
    v2 = phi[ v0, v1 ];
    bar( v2, token );
  }
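The five cases above can be condensed into one decision rule. The following Python sketch is my reading of them, not something the patch implements. `may_merge_calls` and its parameters are hypothetical, and the second case counts as foldable because folding does not make that code any less safe:

```python
from enum import IntEnum


class Uniformity(IntEnum):
    UNIFORM = 0     # same value for the whole dispatch
    CONVERGENT = 1  # same value for every thread of a threadgroup
    VARYING = 2     # may differ between threads of a threadgroup


def may_merge_calls(cond: Uniformity, then_arg: Uniformity,
                    else_arg: Uniformity, same_arg: bool) -> bool:
    """May the bar() calls in both arms of `if (cond)` become one call after it?

    Tokens are ignored: as noted in the examples, they need not be convergent.
    """
    if cond <= Uniformity.CONVERGENT:
        # Every thread of a threadgroup takes the same arm, so merging does not
        # change which threads execute bar() together.
        return True
    # cond may differ inside a threadgroup. The merged argument becomes
    # phi[then_arg, else_arg]: if both arms pass the same value it keeps that
    # value's level, otherwise it is only as shared as cond and both operands.
    merged = then_arg if same_arg else max(cond, then_arg, else_arg)
    return merged <= Uniformity.CONVERGENT
```

Feeding it the levels from the five examples, in order, gives True, True, False, True, False. This matches the comments on each snippet.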

Of course things aren't that simple. If I write:

  uniform sampler mySampler[2];
  uniform bool cond;

  void foo( sampler v0, sampler v1, bool cond );

  void main()
  {
      foo( mySampler[0], mySampler[1], cond );
  }

Then the LLVM optimizer needs to see that all the arguments to foo are uniform; if they are, it's safe to fold foo (once foo is inlined). If instead I were to write:

  uniform sampler mySampler[2];
  uniform sampler anotherSampler;

  void foo( sampler v0, sampler v1, bool cond );

  void main()
  {
      bool cond = anotherSampler.sample( uv = 0 ); //source of non-convergence.
      foo( mySampler[0], mySampler[1], cond );
  }

Now foo cannot be folded because cond is not convergent.

If I were to write:

  uniform sampler mySampler[2];
  uniform sampler anotherSampler;
  uniform bool someCond;

  void foo( sampler v0, sampler v1, bool cond );

  void main()
  {
      foo( mySampler[0], mySampler[1], someCond ); // inline expansion can be folded
      bool anotherCond = anotherSampler.sample( uv = 0 ); //source of non-convergence.
      foo( mySampler[0], mySampler[1], anotherCond ); // inline expansion cannot be folded
  }

then the first foo() call can be folded, but the second one cannot.
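One way to picture this per-call-site decision is to give every actual argument a level and evaluate each inlined copy of foo separately. The Python sketch below is hypothetical. It uses the conservative rule stated above, folding only when every argument is at least convergent. The expression names are copied from the last example, and treating sample() results as VARYING follows the "source of non-convergence" comments:

```python
from enum import IntEnum


class Uniformity(IntEnum):
    UNIFORM = 0     # same value for the whole dispatch
    CONVERGENT = 1  # same value for every thread of a threadgroup
    VARYING = 2     # may differ between threads of a threadgroup


def foldable_call_sites(levels, call_sites):
    """For each call site of foo, decide whether its inlined copy may be folded.

    `levels` maps each actual-argument expression to its Uniformity, and
    `call_sites` lists the argument expressions passed at each call.
    Conservative rule: fold only if every argument is at least CONVERGENT.
    """
    return [all(levels[arg] <= Uniformity.CONVERGENT for arg in args)
            for args in call_sites]


# Mirrors the last main() above: someCond is a uniform, while anotherCond comes
# from a texture sample and is therefore treated as VARYING.
levels = {
    "mySampler[0]": Uniformity.UNIFORM,
    "mySampler[1]": Uniformity.UNIFORM,
    "someCond": Uniformity.UNIFORM,
    "anotherCond": Uniformity.VARYING,
}
calls = [
    ("mySampler[0]", "mySampler[1]", "someCond"),
    ("mySampler[0]", "mySampler[1]", "anotherCond"),
]
verdicts = foldable_call_sites(levels, calls)  # [True, False]
```

The point of the design is that foldability belongs to each inlined copy rather than to foo itself. The same declaration can yield different verdicts at different call sites.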

It gets even funnier:

  uniform sampler mySampler[2];
  uniform sampler anotherSampler;

  void foo( sampler v0, sampler v1, bool cond );

  void main()
  {
      bool cond = anotherSampler.sample( uv = 0 ); //source of non-convergence.
      int idx0 = anotherSampler.sample( uv = 1 ); //source of non-convergence.
      // mySampler[idx0] is neither uniform nor convergent anymore, and cond isn't convergent either. Cannot fold.
      foo( mySampler[idx0], mySampler[1], cond );
  }

In other words, the optimizer has to track whether the convergence property has been lost.
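Tracking this amounts to a forward dataflow pass in which a value is only as shared as its least-shared input. The sketch below is a hypothetical Python model of the last example, not LLVM code. The instruction encoding and op names are made up for illustration, and texture samples are treated as VARYING per the comments above:

```python
from enum import IntEnum


class Uniformity(IntEnum):
    UNIFORM = 0     # same value for the whole dispatch
    CONVERGENT = 1  # same value for every thread of a threadgroup
    VARYING = 2     # may differ between threads of a threadgroup


def propagate(program):
    """Forward pass over straight-line code, computing each value's Uniformity.

    Each instruction is (dest, op, operands). Hypothetical op set:
      "uniform": reads a shader uniform -> UNIFORM
      "const":   a literal such as 1 -> UNIFORM
      "sample":  a texture sample -> VARYING (source of non-convergence)
      "index":   array[idx] -> the weakest level among its operands
    """
    level = {}
    for dest, op, operands in program:
        if op in ("uniform", "const"):
            level[dest] = Uniformity.UNIFORM
        elif op == "sample":
            level[dest] = Uniformity.VARYING
        elif op == "index":
            level[dest] = max(level[name] for name in operands)
        else:
            raise ValueError(f"unknown op: {op}")
    return level


# The last main() above, written in this hypothetical form.
program = [
    ("mySampler", "uniform", ()),
    ("anotherSampler", "uniform", ()),
    ("cond", "sample", ("anotherSampler",)),
    ("idx0", "sample", ("anotherSampler",)),
    ("one", "const", ()),
    ("arg0", "index", ("mySampler", "idx0")),  # mySampler[idx0]
    ("arg1", "index", ("mySampler", "one")),   # mySampler[1]
]
levels = propagate(program)
# arg0 loses its uniformity through the varying index; arg1 keeps it.
```

A real pass would also have to handle control flow (phis under divergent branches). Straight-line code is enough to show how the property is lost through `mySampler[idx0]`.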

https://reviews.llvm.org/D26348