[PATCH] D12246: [NVPTX] change threading intrinsics from noduplicate to convergent

Mon Aug 31 17:57:08 PDT 2015

> On Aug 31, 2015, at 4:55 PM, Owen Anderson via llvm-commits <llvm-commits at lists.llvm.org> wrote:
> 
>> 
>> On Aug 31, 2015, at 4:08 PM, Bjarke Roune <broune at google.com> wrote:
>> 
>> 
>> On Mon, Aug 31, 2015 at 10:42 AM, Owen Anderson <resistor at mac.com> wrote:
>> 
>>> On Aug 31, 2015, at 10:01 AM, Bjarke Roune <broune at google.com> wrote:
>>> 
>>> I'm not sure if you're saying that there should be no limitations as the use cases you had mind for convergent are always OK with such duplication, or that we should be figuring this stuff out on a case-by-case basis, or something else?
>> 
>> I designed the semantics of convergent to meet the needs of texture and gradient operations, without a lot of consideration for barriers.  IMO, it would be a nice end result if it turns out to be just what barriers need as well, but I don’t want to accidentally hose the use cases I intended it for in pursuit of making it work for barriers.
>> 
>> Everything you’ve identified re: full vs partial unrolling, unswitching, etc. seems fine to me so far.
>> 
>> So texture and gradient operations also require these restrictions for unswitching and partial unrolling, or is it just that imposing such restrictions even if unnecessary is fine?
>> 
>> Would you say that we need the langref to say something about when convergent operations can be duplicated? I think it's hard to reason about whether unrolling, loop unswitching and other passes are doing the right thing without that.
> 
> I really want to avoid talking about duplication as part of the semantics of convergent, because I fundamentally believe the focus on duplication is a red herring.  It’s not the duplication that’s the problem, it’s the introduction of conditionals.
> 
> Within a SPMD model, the constraint that textures and gradients, and to some extent barriers, must preserve is that the set of executing threads must be the same at the location where they end up as the place where the user wrote them.  

What is the last “them” of your sentence referring to? I read it as referring to “the set of executing threads”, but then I can’t make sense of the sentence.

The definition in the LangRef (i.e. "A dominates B and B post-dominates A") makes sense for moving such an instruction, but I’m having a hard time applying it to duplication.

> To phrase that without talking about SPMD models or threads, it suffices to say that they must be executed under the same set of execution predicates as where the user wrote them, aka in a control-equivalent location.

I’m not sure if you mean to allow or disallow the loop unswitching transformation mentioned by Bjarke before, i.e.:

if (b) {
  for (int i = 0; i < 100; ++i) {
    // do something
    __syncthreads();
  }
} else {
  for (int i = 0; i < 100; ++i) {
    __syncthreads();
  }
}

I think it would match your constraint, i.e. "they must be executed under the same set of execution predicates as where the user wrote them". I think that because you exclude SPMD/threads from your definition, you lose the fact that b can be divergent within the thread block.
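
To make the divergence concern concrete, here is a toy host-side C++ model. It is only a sketch, not CUDA and not anything in LLVM: the names `Site`, `arrivals` and `all_threads_meet` are made up for illustration, and it assumes that a barrier only works when every thread of the block arrives at the same call site.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical labels for the barrier call sites: the single one the user
// wrote, and the two copies produced by unswitching on `b`.
enum Site { kOriginal = 0, kThenCopy = 1, kElseCopy = 2 };

// For one loop iteration, count how many threads arrive at each call site.
// `b` holds the per-thread value of the unswitched condition.
std::map<Site, int> arrivals(const std::vector<bool>& b, bool unswitched) {
    assert(!b.empty());
    std::map<Site, int> count;
    for (bool flag : b) {
        Site s = !unswitched ? kOriginal : (flag ? kThenCopy : kElseCopy);
        ++count[s];
    }
    return count;
}

// Assumption of this toy model: the barrier is satisfied only if all `n`
// threads of the block meet at one and the same call site.
bool all_threads_meet(const std::map<Site, int>& count, std::size_t n) {
    return count.size() == 1 && count.begin()->second == static_cast<int>(n);
}
```

In this model the original loop is fine for any `b`, and the unswitched form is fine when `b` is uniform across the block, but with a divergent `b` the threads split between the two copies and never all meet at one barrier.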

It looks like defining what we need for the “convergent” attribute to work without introducing threads is not easy.

-- 
Mehdi
