[PATCH] Divergence analysis for GPU programs

Fri Apr 3 19:10:59 PDT 2015

Hi Jingyue,

>> 3) For non-structured codes, is it not possible that your method
>> exploreSyncDependency may cause the same node to be visited more than
>> once? Imagine a butterfly-shaped CFG, such as the one in the PDF
>> I sent you. See line 290. You could avoid this by interrupting the
>> search once you visit a divergent instruction.
>>
>
> I don't get this. Both exploreSyncDependency and exploreDataDependency
> check the visited flag of a value before adding it to the work list. I
> think this is enough for ensuring each value is explored at most once.

Ok, good! I had not noticed that you check whether a node has already
been visited. So, forget this one.
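
To make the visited-flag argument concrete, here is a minimal C++
sketch. It is hypothetical: `Graph`, `propagateDivergence` and the
def-use edge list are illustrative names, not the patch's
exploreSyncDependency/exploreDataDependency, and it follows only data
dependences. A value is marked visited when it is pushed, so even in a
butterfly-shaped graph where two paths reconverge, each value is popped
from the worklist once.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical def-use graph: users[v] lists the values that use v.
struct Graph {
  std::vector<std::vector<std::size_t>> users;
};

// Marks every value reachable from `seeds` as divergent and returns how
// many times each value was popped from the worklist.
std::vector<int> propagateDivergence(const Graph &g,
                                     const std::vector<std::size_t> &seeds,
                                     std::vector<bool> &divergent) {
  const std::size_t n = g.users.size();
  divergent.assign(n, false);
  std::vector<bool> visited(n, false);
  std::vector<int> pops(n, 0);
  std::deque<std::size_t> worklist;
  for (std::size_t s : seeds) {
    assert(s < n);
    if (!visited[s]) {
      visited[s] = true;
      worklist.push_back(s);
    }
  }
  while (!worklist.empty()) {
    std::size_t v = worklist.front();
    worklist.pop_front();
    ++pops[v];
    divergent[v] = true;
    for (std::size_t u : g.users[v]) {
      // Checking the flag before pushing bounds the work: each value
      // enters the worklist at most once.
      if (!visited[u]) {
        visited[u] = true;
        worklist.push_back(u);
      }
    }
  }
  return pops;
}
```

With the butterfly edges 0 to {1, 2}, both 1 and 2 to {3, 4}, and both
3 and 4 to 5, seeding value 0 pops every value exactly once, although
values 3, 4 and 5 are reachable along several paths.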

>> 4) I think you are flagging a variable that is divergent outside a
>> loop as divergent wherever it is alive, even if it is uniform inside
>> the loop. That is not wrong, but it may lead to less precise results.
>> Imagine, for instance:
>>
>> int i = 0;
>> while (i < tid) {
>>   i++;
>>   if (i % 2) {      // i is uniform for every active thread
>>     int x = 2 * i;  // x is uniform for every active thread
>>     v[tid] = x;
>>   }
>> }
>> v[tid] = i; // i is divergent for every thread
>>
>> A more precise analysis would split the live range of i right after
>> the loop, and you would end up with something like:
>>
>> int i = 0;
>> while (i < tid) {
>>   i++;
>>   if (i % 2) {
>>     int x = 2 * i;
>>     v[tid] = x;
>>   }
>> }
>> i0 = phi(i); // i0 is divergent, for it is used outside the influence region.
>> v[tid] = i0;
>>
>>
> This is a very good point, but I need to mention two things.
> 1. My implementation does not mark the i in your example as divergent; it
> only marks its out-of-the-loop users as divergent. Therefore, if you are
> concerned about whether x is considered divergent, the answer is no.
> 2. Divergence analysis is supposed to be read-only and shouldn't modify the
> program. However, if we want to split the live range of i for a more
> precise model, we can run LoopSimplify+LCSSA before the divergence
> analysis. LCSSA (http://llvm.org/docs/doxygen/html/LCSSA_8cpp_source.html)
> in particular will rewrite all out-of-loop users to a PHI node with a
> single incoming value (see the header comments in LCSSA.cpp). I believe
> when run on a program in LCSSA form, my current implementation can distinguish
> the two live ranges. I'll double check that.
> P.S. LCSSA only transforms natural loops.

Nice. Again, I had not noticed that you separate the uses inside and
outside the loop, so that is very good. As for using LCSSA, that would
be a cool experiment. Indeed, if you ever use divergence analysis for
register allocation, it is useful to be able to split a variable into
a uniform part (which you can spill to a shared-memory location if
necessary) and a divergent part. But your code looks very nice as it
is now. I would not change it to perform live-range splitting, given
that LCSSA will most likely solve that problem. I would not worry
about non-structured loops: they are likely to be rare in practice.
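
The claim in point 4 (i is uniform inside the loop but divergent after
it) can be checked with a small lockstep simulation. This is a minimal
C++ sketch under stated assumptions: one warp executing in lockstep
with a per-thread active mask, a hypothetical `simulateWarp` helper,
and only the update of i modelled (the inner `if` is left out). It
checks the semantics of the example, not what the patch's analysis
computes.

```cpp
#include <cassert>
#include <vector>

struct WarpTrace {
  bool uniformInsideLoop = true;  // active threads always agreed on i
  std::vector<int> iAfterLoop;    // i as seen by each thread after the loop
};

// Hypothetical helper: runs `int i = 0; while (i < tid) { i++; }` for one
// warp in lockstep, masking off each thread once its loop condition fails.
WarpTrace simulateWarp(int warpSize) {
  assert(warpSize > 0);
  WarpTrace trace;
  std::vector<int> i(warpSize, 0);
  std::vector<bool> active(warpSize, true);
  while (true) {
    // Evaluate `i < tid` per thread; threads that fail it leave the loop.
    bool anyActive = false;
    for (int tid = 0; tid < warpSize; ++tid) {
      if (active[tid] && !(i[tid] < tid)) active[tid] = false;
      if (active[tid]) anyActive = true;
    }
    if (!anyActive) break;
    // Loop body `i++`, executed only by the active threads; record whether
    // all active threads hold the same value of i afterwards.
    bool seen = false;
    int common = 0;
    for (int tid = 0; tid < warpSize; ++tid) {
      if (!active[tid]) continue;
      ++i[tid];
      if (!seen) {
        seen = true;
        common = i[tid];
      } else if (i[tid] != common) {
        trace.uniformInsideLoop = false;
      }
    }
  }
  trace.iAfterLoop = i;
  return trace;
}
```

For a 4-thread warp, the active threads agree on i in every iteration,
yet after the loop thread tid holds i == tid. This is why only the
out-of-loop use of i needs to be treated as divergent.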

Regards,

Fernando