[PATCH] D12246: [NVPTX] change threading intrinsics from noduplicate to convergent

Mon Aug 31 21:11:14 PDT 2015

Ah, OK, you were referring to the fact that if we can prove a condition uniform, the “control-equivalent locations” category actually broadens to more basic blocks than can be inferred from the shape of the CFG alone.
That is certainly true, and without at least mentioning the concept of “threads” there is probably no way to define it clearly.
Exposing the SPMD model, though, looks like an implementation detail.
Would it be enough to say that the location should be control-equivalent WRT the threads that would originally have executed the unmodified statement? Or do you think that would still be overly restrictive?
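
To make the uniform vs. non-uniform distinction concrete, here is a minimal host-side C++ sketch (not CUDA, and not anything from the patch). It assumes a hypothetical lock-step model with 4 threads and a trip count of 3, and records which threads reach each dynamic execution of the convergent operation. All names (Mask, run_original, run_unswitched, all_threads_converged) are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Hypothetical model: one bit per thread; a trace entry is the set of
// threads that reach one dynamic execution of the convergent operation.
using Mask = std::uint32_t;
constexpr int kThreads = 4;    // assumed group size, for illustration only
constexpr int kTripCount = 3;  // assumed loop trip count

constexpr Mask full_mask() { return (Mask{1} << kThreads) - 1; }

// Original loop: the convergent operation sits outside the `if (b)`, so
// every thread reaches it together in every iteration.
std::vector<Mask> run_original() {
    return std::vector<Mask>(kTripCount, full_mask());
}

// Unswitched loop: each thread enters the copy of the loop selected by its
// own value of b, so each copy only sees the threads that chose that side.
std::vector<Mask> run_unswitched(const std::vector<bool>& b) {
    Mask taken = 0, not_taken = 0;
    for (int t = 0; t < kThreads; ++t)
        (b[t] ? taken : not_taken) |= Mask{1} << t;

    std::vector<Mask> trace;
    for (Mask side : {taken, not_taken}) {
        if (side == 0) continue;  // no thread enters this copy of the loop
        for (int i = 0; i < kTripCount; ++i) trace.push_back(side);
    }
    return trace;
}

// True when every execution of the operation saw the full set of threads.
bool all_threads_converged(const std::vector<Mask>& trace) {
    for (Mask m : trace)
        if (m != full_mask()) return false;
    return true;
}
```

In this model, a uniform b (the same value in every thread) gives the unswitched loop exactly the same trace as the original, while a b that differs between threads splits every execution into two partial sets. That is the case where the gradient operation would read values from threads that are not executing alongside it.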

Marcello
> On 31 Aug 2015, at 20:55, Mehdi Amini <mehdi.amini at apple.com> wrote:
> 
> 
>> On Aug 31, 2015, at 8:14 PM, Marcello Maggioni <mmaggioni at apple.com> wrote:
>> 
>> I see, thanks.
>> For the use case Owen is talking about, I don’t think it would be allowed. If I understood the use case correctly, the original was:
>> 
>> for (int i = 0; i < 100; ++i) {
>>   k0 = 0;
>>   if (b)
>>     k0 = 15;
>>   gradient_operation(k0);
>> }
>> 
>> transforming it to
>> 
>> if (b) {
>>   k0 = 15;
>>   for (int i = 0; i < 100; ++i) {
>>     gradient_operation(k0);
>>   }
>> } else {
>>   k0 = 0;
>>   for (int i = 0; i < 100; ++i) {
>>     gradient_operation(k0);
>>   }
>> }
>> 
>> In the first case, when gradient_operation is executed, all the registers the operation depends on are set to a correct value. In the second case, in the “true” part of the if, the values for the gradient operation wouldn’t be correctly set, because the threads that take the “else” part wouldn’t have set their values.
> 
> I know the problem quite well ;) from the hardware and SPMD model point of view. This is why I said that it is invalid if the condition (b) is not uniform.
> But the question here is whether you can specify the convergent attribute in an abstract manner at the IR level without introducing SPMD or threads at all.
> 
> The Language Reference only mentions *moving* a convergent instruction. Maybe it could say that the same constraint applies in case of duplication, i.e. “the duplicated basic block must be dominated by the original one, and post-dominate it”.
> This would allow unrolling:
> 
>   for (int i = 0; i < 100; ++i) {
>     // do stuff
>     gradient_operation(k0);
>     ++i;
>     // do stuff
>     gradient_operation(k0);
>   }
> 
> Or, more broadly, it can probably be extended to “the duplicated *CFG subset* is dominated by the original one and post-dominates it”, to allow:
> 
>   for (int i = 0; i < 100; ++i) {
>     // do stuff
>     if (cond)
>       gradient_operation(k0);
>     else
>       // other stuff
>     ++i;
>     // do stuff
>     if (cond)
>       gradient_operation(k0);
>     else
>       // other stuff
>   }
> 
> 
> This would still prevent the invalid unswitching. However, it is overly conservative, since it would also forbid loop unswitching when the condition can be proven uniform.
> This is where I don’t see how you can get away without introducing the SPMD/threading model.
> 
> 
> — 
> Mehdi
> 
> 
> 
> 
> 
>> 
>> Marcello
>> 
>>> On 31 Aug 2015, at 19:51, Mehdi Amini <mehdi.amini at apple.com> wrote:
>>> 
>>> 
>>> 
>>> On Aug 31, 2015, at 7:29 PM, Marcello Maggioni <mmaggioni at apple.com> wrote:
>>> 
>>>> 
>>>>> On 31 Aug 2015, at 17:57, Mehdi Amini via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>>>>> 
>>>>>> 
>>>>>> On Aug 31, 2015, at 4:55 PM, Owen Anderson via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>>>>>> 
>>>>>>> 
>>>>>>> On Aug 31, 2015, at 4:08 PM, Bjarke Roune <broune at google.com> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> On Mon, Aug 31, 2015 at 10:42 AM, Owen Anderson <resistor at mac.com> wrote:
>>>>>>> 
>>>>>>>> On Aug 31, 2015, at 10:01 AM, Bjarke Roune <broune at google.com> wrote:
>>>>>>>> 
>>>>>>>> I'm not sure if you're saying that there should be no limitations because the use cases you had in mind for convergent are always OK with such duplication, or that we should be figuring this stuff out on a case-by-case basis, or something else?
>>>>>>> 
>>>>>>> I designed the semantics of convergent to meet the needs of texture and gradient operations, without a lot of consideration for barriers.  IMO, it would be a nice end result if it turns out to be just what barriers need as well, but I don’t want to accidentally hose the use cases I intended it for in pursuit of making it work for barriers.
>>>>>>> 
>>>>>>> Everything you’ve identified re: full vs partial unrolling, unswitching, etc. seems fine to me so far.
>>>>>>> 
>>>>>>> So do texture and gradient operations also require these restrictions for unswitching and partial unrolling, or is it just that imposing such restrictions, even if unnecessary, is fine?
>>>>>>> 
>>>>>>> Would you say that we need the langref to say something about when convergent operations can be duplicated? I think it's hard to reason about whether unrolling, loop unswitching and other passes are doing the right thing without that.
>>>>>> 
>>>>>> I really want to avoid talking about duplication as part of the semantics of convergent, because I fundamentally believe the focus on duplication is a red herring.  It’s not the duplication that’s the problem, it’s the introduction of conditionals.
>>>>>> 
>>>>>> Within a SPMD model, the constraint that textures and gradients, and to some extent barriers, must preserve is that the set of executing threads must be the same at the location where they end up as the place where the user wrote them.  
>>>>> 
>>>>> What is the last “them” of your sentence referring to? I read it as referring to “the set of executing threads”, but then I can’t make sense of the sentence.
>>>>> 
>>>>> The definition in the Lang.Ref. (i.e. “A dominates B and B post-dominates A”) makes sense for moving such an instruction, but I’m having a hard time applying it to duplication.
>>>>> 
>>>>>> To phrase that without talking about SPMD models or threads, it suffices to say that they must be executed under the same set of execution predicates as where the user wrote them, aka in a control-equivalent location.
>>>>> 
>>>>> I’m not sure if you mean to allow or disallow the loop unswitching transformation mentioned by Bjarke before, i.e.:
>>>>> 
>>>>> if (b) {
>>>>>   for (int i = 0; i < 100; ++i) {
>>>>>     // do something
>>>>>     __syncthreads();
>>>>>   }
>>>>> } else {
>>>>>   for (int i = 0; i < 100; ++i) {
>>>>>     __syncthreads();
>>>>>   }
>>>>> }
>>>>> 
>>>>> I think it would match your constraint, i.e. that they must be executed “under the same set of execution predicates as where the user wrote them”. I think that because you exclude SPMD/threads from your definition, you lose the fact that b can be divergent in the thread block.
>>>>> 
>>>> 
>>>> What do you mean by “B can be divergent in the thread block” exactly?
>>> 
>>> Maybe not the right word, apologies.
>>> I mean b can be loop-invariant but not the same for all threads in a block. Maybe “non-uniform condition” is a more accurate term?
>>> 
>>> 
>>> -- 
>>> Mehdi 
>>> 
>>> 
>>>>> It looks like defining what we need for the “convergent” attribute to work without introducing threads is not easy.
>>>>> 
>>>>> — 
>>>>> Mehdi
>>>>> 
> 
