[PATCH] Adding support for NoDuplicate function attribute in CLang

Thu Nov 21 00:54:44 PST 2013

Hi all,

On 11/20/2013 03:54 PM, David Tweed wrote:
> Note that in the OpenCL use case, the name precisely describes the intent:
> you can move barrier calls around, inline them, etc, if you can show that
> the semantics of OpenCL code is the same but (on some particular
> architectures) you aren't allowed to duplicate a given barrier call (due to
> implementation restrictions) even if otherwise the semantics was ok. This
> relates to the LLVM function attribute named noduplicate (as visible in the
> patch, obviously)

While I think 'noduplicate' is a fine workaround for the problem at hand,
I get a feeling it also "throws the baby out with the bath water" a bit,
disallowing some legal optimizations on SPMD programs in the process.

AFAIU, in the specific OpenCL case one is safe if one can prove that the
location to which the barrier is copied (or at which a completely new one is
injected) is non-diverging.
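
By "non-diverging" I mean the usual OpenCL rule: either every work-item of
the work-group reaches the barrier call site or none of them does. As a
rough sketch in plain C (not OpenCL; the function name and the reaches[]
array are made up for illustration, a real compiler would derive this from a
divergence/uniformity analysis rather than from a table):

```c
#include <stdbool.h>
#include <stddef.h>

/* reaches[i] is true if work-item i would execute the candidate barrier
 * call site. Hypothetical input, standing in for a uniformity analysis. */
static bool site_is_non_diverging(const bool *reaches, size_t n_items) {
    for (size_t i = 1; i < n_items; ++i) {
        if (reaches[i] != reaches[0])
            return false;   /* some work-items arrive, others do not */
    }
    return true;            /* all work-items arrive, or none do */
}
```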

Can you give an example of a case where one cannot duplicate
a barrier call even though the control dependencies at the duplicated
barrier call site do not differ between work-items?

Why and how could some architecture restrict that? AFAIU, it should not be
able to differentiate the copy from any other (user written) barrier,
as "additional synchronization" should be safe in that case.

E.g.:

for (uniform_loop) {
   if (uniform_cond) {
      do_something;
   } else {
      do_something_else;
   }
   barrier();
}

Loop unswitching here (assuming uniform_cond is also loop-invariant) might
produce:

if (uniform_cond) {
   for (uniform_loop) {
      do_something;
      barrier();
   }
} else {
   for (uniform_loop) {
      do_something_else;
      barrier();
   }
}

These two new loops might be more easily horizontally or
vertically parallelized (e.g. vectorized) and the kernel
semantics is still correct, right?
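
To make the claim a bit more concrete, here is a small plain-C model of the
two forms (again not OpenCL: barrier() is only counted, not executed, and
WORK_ITEMS, TRIP_COUNT, item_trace and the arithmetic standing in for
do_something / do_something_else are all made up for illustration). It
checks that every work-item hits the same number of barriers and computes
the same result in both forms:

```c
#include <stdbool.h>

#define WORK_ITEMS 4   /* hypothetical work-group size */
#define TRIP_COUNT 3   /* hypothetical uniform loop trip count */

typedef struct {
    int barriers;      /* number of barrier() calls this work-item reached */
    int work;          /* stand-in for the side effects of the loop body */
} item_trace;

/* Original form: branch inside the loop, one barrier call site. */
static item_trace run_original(int item, bool uniform_cond) {
    item_trace t = {0, 0};
    for (int i = 0; i < TRIP_COUNT; ++i) {
        if (uniform_cond)
            t.work += item + i;   /* do_something */
        else
            t.work -= item + i;   /* do_something_else */
        t.barriers++;             /* barrier() */
    }
    return t;
}

/* Unswitched form: branch hoisted out, barrier call site duplicated. */
static item_trace run_unswitched(int item, bool uniform_cond) {
    item_trace t = {0, 0};
    if (uniform_cond) {
        for (int i = 0; i < TRIP_COUNT; ++i) {
            t.work += item + i;   /* do_something */
            t.barriers++;         /* barrier(), first copy */
        }
    } else {
        for (int i = 0; i < TRIP_COUNT; ++i) {
            t.work -= item + i;   /* do_something_else */
            t.barriers++;         /* barrier(), second copy */
        }
    }
    return t;
}

/* True if all work-items reach the same number of barriers in both forms
 * and the per-work-item results are identical. */
static bool forms_agree(bool uniform_cond) {
    int expected = run_original(0, uniform_cond).barriers;
    for (int item = 0; item < WORK_ITEMS; ++item) {
        item_trace a = run_original(item, uniform_cond);
        item_trace b = run_unswitched(item, uniform_cond);
        if (a.barriers != expected || b.barriers != expected ||
            a.work != b.work)
            return false;
    }
    return true;
}
```

Because uniform_cond has the same value for every work-item, the whole
work-group takes the same copy of the loop, so forms_agree() holds for both
values of the condition. The model says nothing about hardware that tracks
barriers by call site, which is exactly the restriction I am asking about.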

-- 
Pekka