[LLVMdev] Loads moving across barriers

Mon Nov 11 19:29:34 PST 2013

On Nov 11, 2013, at 3:13 PM, Andrew Trick <atrick at apple.com> wrote:

> 
> On Nov 9, 2013, at 1:39 PM, Matt Arsenault <arsenm2 at gmail.com> wrote:
> 
>> On Nov 9, 2013, at 3:14 AM, Chandler Carruth <chandlerc at google.com> wrote:
>> 
>>> 
>>> Perhaps you're instead trying to say that with certain address spaces "noalias" (and by inference, "restrict" at the language level) has a different semantic model than other address spaces? While it's less worrisome than the first interpretation, I still don't really like it.
>>> 
>> 
>> This sounds right. With the constant address space, anything you do is OK since it’s constant. Private address space is supposed to be totally inaccessible from other workitems, so parallel modifications aren’t a concern. The others require explicit synchronization which noalias would need to be aware of.
> 
> FWIW, it seems generally useful to me to have a nomemfence function attribute and intrinsic property. We should avoid memory optimization (and possibly other optimization) across these regardless of alias analysis.

There are at least two other kinds of optimizations that I know of that are either invalid or can result in (sometimes significantly) slower code when running OpenCL-style SPMD kernels.

The first is tail duplication that takes something like:
  if (x) {
    …
  } else {
    …
  }
  barrier()
and duplicates the barrier into both sides. This can cause hangs, and perhaps other symptoms, depending on exactly how this is compiled and on the particular architecture. This isn’t unique to barrier(). Any intrinsic which is effectively “horizontal”, working across work-items, can potentially result in problems and/or different behavior if duplicated or even moved. It looks like a function attribute “noduplicate” exists to block duplication, but I don’t see anything specifically built to block movement, although perhaps that isn’t happening in practice.
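
To make the hang concrete, here is a minimal Python sketch of one possible barrier model. It assumes (this is an assumption for illustration, not a statement about any particular GPU) that the hardware releases a barrier only once every work-item in the work-group is waiting at the same barrier call site. The names barrier_completes, original_kernel, tail_duplicated_kernel and run are hypothetical.

```python
from collections import Counter


def barrier_completes(arrival_sites, group_size):
    """Return True if all work-items are waiting at one and the same site.

    Assumed model: a barrier is released only when `group_size` work-items
    have arrived at the same barrier call site.
    """
    counts = Counter(arrival_sites)
    return any(n == group_size for n in counts.values())


def original_kernel(x):
    # if (x) {...} else {...}  barrier()
    # Both branches fall through to the single barrier after the if/else.
    return "barrier_after_if"


def tail_duplicated_kernel(x):
    # if (x) {...; barrier()} else {...; barrier()}
    # After tail duplication each branch has its own barrier call site.
    return "barrier_in_then" if x else "barrier_in_else"


def run(kernel, xs):
    """Run one work-group; xs holds the per-work-item value of x."""
    sites = [kernel(x) for x in xs]
    return barrier_completes(sites, len(xs))
```

Under this model, run(original_kernel, [True, False, True, False]) completes, while run(tail_duplicated_kernel, [True, False, True, False]) never releases either barrier. If x is uniform across the work-group, the duplicated version completes as well, which is one reason the problem can go unnoticed in testing.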

The second is loop unswitching:
  for (…) {
    ...
    if (x) { // x is some expression that is loop invariant
      …
    } else {
      …
    }
    ...
  }
where the resulting code has the if condition outside, and a copy of the loop (with possibly some code elided) in each of the ‘then’ and ‘else’ sides. When ‘x’ differs between work-items that execute together, this can result in running the loop twice, once for work-items where ‘x’ is true, and once for work-items where ‘x’ is false.
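
A toy cost model illustrates the slowdown. This is only a sketch: it assumes lockstep execution in which divergent paths are serialized, it counts abstract "issue slots" rather than real cycles, and it ignores the cost of the branch itself. The function names and the cost parameters (common, then_cost, else_cost) are hypothetical.

```python
def original_cost(xs, n, common, then_cost, else_cost):
    """Issue slots for n iterations with the branch kept inside the loop.

    xs holds the per-work-item value of the loop-invariant condition x.
    The common code runs once per iteration; each side of the branch runs
    only if at least one work-item takes it.
    """
    branch = 0
    if any(xs):
        branch += then_cost
    if not all(xs):
        branch += else_cost
    return n * (common + branch)


def unswitched_cost(xs, n, common, then_cost, else_cost):
    """Issue slots after unswitching: one full loop copy per side taken.

    With divergent x, the group runs the 'then' loop and the 'else' loop
    back to back, so the common code is executed twice.
    """
    total = 0
    if any(xs):
        total += n * (common + then_cost)
    if not all(xs):
        total += n * (common + else_cost)
    return total
```

For example, with xs = [True, False], n = 10, common = 8 and then_cost = else_cost = 1, the original loop costs 100 slots and the unswitched version costs 180. With uniform x, both come out at 90 in this model, so the penalty appears only when the condition diverges within a group.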

This is a fairly special case. Offhand, I cannot think of other transformations that would have a similar effect, so perhaps the answer here is just “don’t do that” if you’re compiling SPMD kernels.

Mark

> 
> -Andy