[PATCH] D26437: Use -fno-unit-at-a-time and -funit-at-a-time

Fri Nov 11 16:59:17 PST 2016

jlebar added a comment.

> This statement confuses me, that is not what I observe:

Sorry, you're right.  This is what clang does in CUDA mode specifically.

You may want to do the same thing in OpenCL mode.  In CUDA, if you have

  void wrapper() { my_convergent_barrier_fn(); }

we treat wrapper() as convergent.  This matters because otherwise you could not meaningfully write wrappers around convergent functions (short of using __attribute__((convergent)), I guess, but that is a clang extension rather than part of CUDA).  I don't know whether you want the same behavior in OpenCL.
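A toy model of the CUDA-mode behavior described above, as a rough sketch only. It assumes that every function starts out convergent, that a defined function loses the attribute only when none of its callees is convergent, and that external declarations (such as the barrier) keep it. `Func`, `infer_convergent` and `is_convergent` are hypothetical names; this is not the code in this patch or in LLVM, and it is deliberately conservative (functions in a recursive cycle keep the attribute).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_CALLEES 4

/* Toy call-graph node; a hypothetical model, not LLVM's data structures. */
typedef struct {
    const char *name;
    bool has_body;            /* false: external declaration, treated as convergent */
    int callees[MAX_CALLEES]; /* indices into the function table */
    int num_callees;
    bool convergent;          /* output of infer_convergent() */
} Func;

static void infer_convergent(Func *fs, int n) {
    /* CUDA-style default: assume every function is convergent. */
    for (int i = 0; i < n; i++)
        fs[i].convergent = true;

    /* Drop the attribute from defined functions that call nothing
       convergent, repeating until nothing changes. */
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < n; i++) {
            if (!fs[i].convergent || !fs[i].has_body)
                continue;
            bool calls_convergent = false;
            for (int j = 0; j < fs[i].num_callees; j++) {
                int callee = fs[i].callees[j];
                assert(callee >= 0 && callee < n);
                if (fs[callee].convergent)
                    calls_convergent = true;
            }
            if (!calls_convergent) {
                fs[i].convergent = false;
                changed = true;
            }
        }
    }
}

/* Returns 1 if the named function is convergent, 0 if not, -1 if absent. */
static int is_convergent(const Func *fs, int n, const char *name) {
    for (int i = 0; i < n; i++)
        if (strcmp(fs[i].name, name) == 0)
            return fs[i].convergent ? 1 : 0;
    return -1;
}
```

With my_convergent_barrier_fn as an external declaration, wrapper() keeps the attribute because its only callee is convergent, while a function whose callees are all non-convergent loses it.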

> But since that is a subgroup barrier and we have SIMT we do not need to issue any instructions to make sure execution will come to the convergence point simultaneously, so barrier is omitted.

OK.  Then what you want is to call an LLVM intrinsic that is a no-op but is convergent.  This pass will then do the right thing.

Perhaps the OpenCL semantics are that clang must insert a call to this special intrinsic at the beginning of every function annotated with __attribute__((convergent)).
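The proposal above can be sketched with a similar toy model, under loudly stated assumptions. The convergent no-op intrinsic is modeled as an external declaration at the hypothetical index `NOP_MARKER`; `insert_markers` stands in for the proposed clang step; and the inference ignores the source annotation, so an annotated function whose subgroup barrier was omitted would otherwise lose the attribute. None of these names refer to a real LLVM intrinsic or clang API.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_CALLEES 4
#define NOP_MARKER 0 /* hypothetical: index of the convergent no-op intrinsic */

/* Toy call-graph node; a hypothetical model, not LLVM's data structures. */
typedef struct {
    bool has_body;            /* false: external declaration, treated as convergent */
    bool user_convergent;     /* source carried __attribute__((convergent)) */
    int callees[MAX_CALLEES]; /* indices into the function table */
    int num_callees;
    bool convergent;          /* output of infer_convergent() */
} Func;

/* Proposed frontend step: prepend a call to the no-op marker in every
   defined function the user annotated as convergent. */
static void insert_markers(Func *fs, int n) {
    for (int i = 0; i < n; i++) {
        if (!fs[i].has_body || !fs[i].user_convergent)
            continue;
        assert(fs[i].num_callees < MAX_CALLEES);
        for (int j = fs[i].num_callees; j > 0; j--)
            fs[i].callees[j] = fs[i].callees[j - 1];
        fs[i].callees[0] = NOP_MARKER;
        fs[i].num_callees++;
    }
}

/* Conservative inference: assume everything is convergent, then drop the
   attribute from defined functions that call nothing convergent. */
static void infer_convergent(Func *fs, int n) {
    for (int i = 0; i < n; i++)
        fs[i].convergent = true;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < n; i++) {
            if (!fs[i].convergent || !fs[i].has_body)
                continue;
            bool calls_convergent = false;
            for (int j = 0; j < fs[i].num_callees; j++)
                if (fs[fs[i].callees[j]].convergent)
                    calls_convergent = true;
            if (!calls_convergent) {
                fs[i].convergent = false;
                changed = true;
            }
        }
    }
}
```

Without the marker, an annotated function with an empty body is stripped of the attribute. With the marker prepended, it has a convergent callee and keeps it, while unannotated functions are unaffected.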

Repository:
  rL LLVM

https://reviews.llvm.org/D26437