[PATCH] D25343: [OpenCL] Mark group functions as convergent in opencl-c.h

Mon Oct 24 10:08:00 PDT 2016

Anastasia added a comment.

In https://reviews.llvm.org/D25343#567374, @tstellarAMD wrote:

> In https://reviews.llvm.org/D25343#565288, @Anastasia wrote:
>
> > Do you have any code example where Clang/LLVM performs wrong optimizations with respect to the control flow of SPMD execution?
> >
> > My understanding from our earlier discussion (https://www.mail-archive.com/cfe-commits@lists.llvm.org/msg22643.html) is that noduplicate is essentially enough for the frontend to prevent erroneous optimizations, because in general the compiler can't do much with unknown function calls.
>
>
> noduplicate is enough for correctness, but it prevents legal optimizations, like unrolling loops with barriers.  The convergent attribute was added specifically for these kinds of builtins, so we should be using it here instead of noduplicate.
>
> > For LLVM intrinsics it is slightly different, as I can deduce from this discussion: http://lists.llvm.org/pipermail/llvm-dev/2015-May/085558.html. It seems like by default an intrinsic is assumed to be side-effect free and can be optimized in various ways.

Tom, as far as I understand a valid generalized use case with a barrier is as follows:

  a = compute() 
  barrier()
  use(a)

Regarding unrolling, I don't see how a loop containing a barrier can be unrolled without breaking the correctness of OpenCL programs, because all compute() calls of one work-group (WG) have to complete before use(a) is invoked for the first time. I am not an expert in LLVM transformations, but as I read the descriptions of noduplicate and convergent (http://llvm.org/docs/LangRef.html), neither seems to make sense for the barrier: both allow reordering, within the same function or basic block (BB) respectively. The only valid way seems to be to mark the barrier as an intrinsic with side effects, which prevents any reordering of the barrier itself. I understand that within compute() or use() independently there are no limitations on the ordering of instructions.
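
To make the ordering constraint concrete, here is a minimal host-side C sketch. It is not OpenCL and says nothing about what LLVM actually does; it only emulates one work-group sequentially. The names WG_SIZE, neighbour, run_with_barrier and run_without_barrier are hypothetical, compute() is a stand-in, and use(a) is assumed to read the value produced by the neighbouring work-item.

```c
#include <assert.h>
#include <stddef.h>

#define WG_SIZE 4 /* hypothetical work-group size */

/* Hypothetical stand-in for compute(): one value per work-item. */
int compute(size_t lid) { return (int)(lid + 1) * 10; }

/* Index of the slot that use() reads: the next work-item's value. */
size_t neighbour(size_t lid) {
    assert(lid < WG_SIZE);
    return (lid + 1) % WG_SIZE;
}

/* Emulates: a = compute(); barrier(); use(a).
   The boundary between the two loops plays the role of the barrier:
   every work-item finishes compute() before any work-item runs use(). */
void run_with_barrier(int out[WG_SIZE]) {
    int a[WG_SIZE] = {0}; /* models memory shared by the work-group */
    for (size_t lid = 0; lid < WG_SIZE; ++lid)
        a[lid] = compute(lid);
    for (size_t lid = 0; lid < WG_SIZE; ++lid)
        out[lid] = a[neighbour(lid)];
}

/* Emulates the same code with the barrier dropped or moved: each
   work-item runs compute() and use() back to back, so it may read a
   neighbour's slot before that neighbour has written it. */
void run_without_barrier(int out[WG_SIZE]) {
    int a[WG_SIZE] = {0};
    for (size_t lid = 0; lid < WG_SIZE; ++lid) {
        a[lid] = compute(lid);
        out[lid] = a[neighbour(lid)];
    }
}
```

Under these assumptions, run_with_barrier gives every work-item its neighbour's computed value, while run_without_barrier lets the first work-items read slots that are still zero. That is the kind of breakage I would expect from any transformation that lets use(a) run before all compute() calls of the work-group have completed.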

If you have a specific example in mind where unrolling would work, could you please elaborate on it? I think it's essential for the community to understand the use cases for noduplicate/convergent/sideeffect so that we can make better use of them across similar models, i.e. OpenCL, CUDA, etc.

https://reviews.llvm.org/D25343