[PATCH] Divergence analysis for GPU programs

Fri Apr 3 10:36:37 PDT 2015

On Fri, Apr 3, 2015 at 6:28 AM Fernando Magno Quintao Pereira <fernando at dcc.ufmg.br> wrote:

> Hi Jingyue.
>
>     I went over your code today. It looks very nice to me. I believe
> the programming style is very clear, and the analysis seems to be
> correct. I am sending you a few observations below.
>
> Regards,
>
> Fernando
>
> ---
>
> 1) I think that the more recent TOPLAS paper, "Divergence Analysis -
> Sampaio, Souza, Collange, Pereira, 2013", is a better source of
> information than our original PACT publication (for the comment in
> lines 31-33).
>

Ack'ed. Will fix.

>
> 2) Would it be possible to have a function such as "bool
> isUniformValue(const Value *V) const" or (equivalently) "bool
> isDivergentValue(const Value *V) const" in the public interface of the
> analysis? In fact, this is more general than "isDivergentBranch"
> (lines 107-109), and it could help the register allocator, for
> instance.
>

Agreed. My only concern is that we need to keep all divergent
instructions (as opposed to only the divergent branches) around for the
users of the divergence analysis. I think that's fine, given that the
analysis already spends this much space when computing divergent values.
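For illustration, once every divergent value is kept, the two queries
Fernando proposes (isUniformValue / isDivergentValue) reduce to a set
lookup. The sketch below is hypothetical and independent of LLVM: `Value`,
`DivergenceInfo` and `markDivergent` are stand-in names, not the interface
in D8576.

```cpp
#include <cassert>
#include <unordered_set>

// Hypothetical stand-in for llvm::Value; only its address matters here.
struct Value {
  int Id;
};

// Hypothetical result holder: it keeps every divergent value, not only the
// divergent branches, which is the extra space discussed above.
class DivergenceInfo {
public:
  // Called by the (not shown) propagation whenever V is found divergent.
  void markDivergent(const Value *V) { Divergent.insert(V); }

  // True if V may hold different values in different threads.
  bool isDivergentValue(const Value *V) const {
    return Divergent.count(V) != 0;
  }

  // Uniform is simply the complement of divergent.
  bool isUniformValue(const Value *V) const { return !isDivergentValue(V); }

private:
  std::unordered_set<const Value *> Divergent;
};
```

A client such as a register allocator would only call the const queries;
the set is filled once while the analysis runs.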

>
> 3) For unstructured code, isn't it possible that your method
> exploreSyncDependency visits the same node more than once? Imagine
> a CFG shaped like a butterfly, like the one in the PDF I sent you.
> See line 290. You could avoid this by stopping the search once you
> visit a divergent instruction.
>

I don't get this. Both exploreSyncDependency and exploreDataDependency
check the visited flag of a value before adding it to the work list. I
think this is enough for ensuring each value is explored at most once.
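The argument can be illustrated with a generic worklist sketch in which a
value is pushed only the first time it is reached. This is hypothetical
code, not exploreSyncDependency or exploreDataDependency themselves: the
graph edges stand for data or sync dependences, and `propagateDivergence`
and its per-node push counter are made up for the illustration. On a
butterfly-shaped graph, where two divergent sources reach the same join
nodes, every node still enters the worklist at most once.

```cpp
#include <cassert>
#include <vector>

// Edges[N] lists the nodes that become divergent when N is divergent
// (through a data or sync dependence).
using Graph = std::vector<std::vector<int>>;

// Marks everything reachable from Seeds as divergent. Pushes[N] counts how
// often N entered the worklist; the visited check keeps it at most 1.
std::vector<bool> propagateDivergence(const Graph &Edges,
                                      const std::vector<int> &Seeds,
                                      std::vector<int> &Pushes) {
  std::vector<bool> Visited(Edges.size(), false);
  Pushes.assign(Edges.size(), 0);
  std::vector<int> Worklist;
  for (int S : Seeds) {
    if (!Visited[S]) {
      Visited[S] = true;
      ++Pushes[S];
      Worklist.push_back(S);
    }
  }
  while (!Worklist.empty()) {
    int N = Worklist.back();
    Worklist.pop_back();
    for (int Succ : Edges[N]) {
      if (Visited[Succ])
        continue; // already explored or queued: never pushed twice
      Visited[Succ] = true;
      ++Pushes[Succ];
      Worklist.push_back(Succ);
    }
  }
  return Visited;
}
```

For example, with edges 0->2, 0->3, 1->2, 1->3, 2->4, 3->4 and seeds
{0, 1}, node 4 is reachable along four paths but is pushed exactly once.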

>
> 4) I think you are flagging a variable that is divergent outside a
> loop as divergent wherever it is live, even if it is uniform inside
> the loop. That is not wrong, but it may lead to less precise results.
> Imagine, for instance:
>
> int i = 0;
> while (i < tid) {
>   i++;
>   if (i % 2) {      // i is uniform for every active thread
>     int x = 2 * i;  // x is uniform for every active thread
>     v[tid] = x;
>   }
> }
> v[tid] = i;         // i is divergent here: it differs across threads
>
> A more precise analysis would split the live range of i right after
> the loop, and you would end up with something like:
>
> int i = 0;
> while (i < tid) {
>   i++;
>   if (i % 2) {
>     int x = 2 * i;
>     v[tid] = x;
>   }
> }
> int i0 = phi(i);    // i0 is divergent, for it is used outside the
>                     // influence region.
> v[tid] = i0;
>
>
This is a very good point, but I need to mention two things.
1. My implementation does not mark the i in your example as divergent; it
only marks its out-of-loop users as divergent. Therefore, if you are
worried about whether x is considered divergent, the answer is no.
2. Divergence analysis is supposed to be read-only and shouldn't modify the
program. However, if we want to split the live range of i for a more
precise model, we can run LoopSimplify+LCSSA before the divergence
analysis. LCSSA (http://llvm.org/docs/doxygen/html/LCSSA_8cpp_source.html)
in particular rewrites all out-of-loop users to use a PHI node in the loop's
exit block with a single incoming value (see the header comments in
LCSSA.cpp). I believe that, when running on LCSSA form, my current
implementation can distinguish the two live ranges. I'll double-check that.
P.S. LCSSA only transforms natural loops.
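A small lockstep model of Fernando's loop (without the if body) illustrates
point 1. It is a hypothetical sketch, not part of the patch: `simulateWarp`
and `LoopTrace` are made-up names, and it models only how SIMT hardware
masks off threads that have already left the loop. On every trip, all
still-active threads hold the same i, so treating i (and x = 2 * i) as
uniform inside the loop is safe; after the loop each thread has i == tid,
so only the out-of-loop use is divergent.

```cpp
#include <cassert>
#include <vector>

// Result of simulating one warp in lockstep.
struct LoopTrace {
  bool UniformAmongActive = true; // i agreed across active threads each trip
  std::vector<int> IAfterLoop;    // per-thread value of i after the loop
};

// Simulates `int i = 0; while (i < tid) i++;` for threads 0..NumThreads-1.
// A thread stays active only while its own condition i < tid holds.
LoopTrace simulateWarp(int NumThreads) {
  std::vector<int> I(NumThreads, 0);
  LoopTrace T;
  while (true) {
    // Build the active mask for this trip through the loop.
    std::vector<int> Active;
    for (int Tid = 0; Tid < NumThreads; ++Tid)
      if (I[Tid] < Tid)
        Active.push_back(Tid);
    if (Active.empty())
      break; // every thread has exited
    for (int Tid : Active)
      ++I[Tid]; // i++ executed by every active thread
    for (int Tid : Active)
      if (I[Tid] != I[Active.front()])
        T.UniformAmongActive = false;
  }
  T.IAfterLoop = I;
  return T;
}
```

With four threads, the model reports that i stayed uniform among the
active threads on every trip and ends with i = {0, 1, 2, 3} after the loop.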

>
> On 4/3/15, Jingyue Wu <jingyue at google.com> wrote:
> > Hi Fernando,
> >
> > I fixed the sync dependency computation in this update. I used to
> > consider only the if-then-else case; this version accounts for the
> > loop case.
> >
> > Please take a look when you have time. Thanks a lot for your help!
> >
> > Jingyue
> >
> > ---------- Forwarded message ---------
> > From: Jingyue Wu <jingyue at google.com>
> > Date: Thu, Apr 2, 2015 at 10:48 PM
> > Subject: Re: [PATCH] Divergence analysis for GPU programs
> > To: <jingyue at google.com>, <resistor at mac.com>, <hfinkel at anl.gov>,
> > <eliben at google.com>, <meheff at google.com>, <justin.holewinski at gmail.com>
> > Cc: <bjarke.roune at gmail.com>, <madhur13490 at gmail.com>,
> > <thomas.stellard at amd.com>, <dberlin at dberlin.org>, <echristo at gmail.com>,
> > <llvm-commits at cs.uiuc.edu>
> >
> >
> > This update fixes the sync dependency computation. If a value is used
> > outside of the loop in which it is defined, the user is sync dependent
> > on the exit condition of the loop.
> >
> >
> > http://reviews.llvm.org/D8576
> >
> > Files:
> >   include/llvm/Analysis/Passes.h
> >   include/llvm/Analysis/TargetTransformInfo.h
> >   include/llvm/Analysis/TargetTransformInfoImpl.h
> >   include/llvm/CodeGen/BasicTTIImpl.h
> >   include/llvm/InitializePasses.h
> >   include/llvm/LinkAllPasses.h
> >   lib/Analysis/Analysis.cpp
> >   lib/Analysis/CMakeLists.txt
> >   lib/Analysis/DivergenceAnalysis.cpp
> >   lib/Analysis/TargetTransformInfo.cpp
> >   lib/Target/NVPTX/NVPTXTargetTransformInfo.cpp
> >   lib/Target/NVPTX/NVPTXTargetTransformInfo.h
> >   test/Analysis/DivergenceAnalysis/NVPTX/diverge.ll
> >   test/Analysis/DivergenceAnalysis/NVPTX/lit.local.cfg
> >
>
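The loop rule from the forwarded update (a use outside the loop that
defines a value is sync dependent on that loop's exit condition) can be
illustrated with a toy model. The sketch below is hypothetical and not the
code in D8576: `Instr` and `computeDivergence` are made-up names, loops are
assumed not to be nested, loop-header PHIs are not modelled, and operands
must appear before their users. Encoding Fernando's example, i and
x = 2 * i stay uniform inside the loop, while the use of i after the loop
becomes divergent because the exit condition i < tid is divergent.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified IR: each instruction records the loop that
// contains it (-1 for none) and the indices of the instructions it uses.
struct Instr {
  int Loop;
  std::vector<int> Operands;
  bool DivergentSource; // e.g. a read of the thread id
};

// LoopExitDivergent[L] says whether loop L's exit condition is divergent.
// Assumes no loop nesting and that operands precede their users.
std::vector<bool> computeDivergence(const std::vector<Instr> &Prog,
                                    const std::vector<bool> &LoopExitDivergent) {
  std::vector<bool> Div(Prog.size(), false);
  for (std::size_t U = 0; U < Prog.size(); ++U) {
    const Instr &User = Prog[U];
    bool D = User.DivergentSource;
    for (int Op : User.Operands) {
      // Data dependence: a divergent operand makes the user divergent.
      if (Div[Op])
        D = true;
      // Sync dependence: the operand is defined in a loop the user is not
      // in, and threads may leave that loop after different iterations.
      int L = Prog[Op].Loop;
      if (L >= 0 && L != User.Loop && LoopExitDivergent[L])
        D = true;
    }
    Div[U] = D;
  }
  return Div;
}
```

Because the rule fires only when the user lies outside the defining loop,
in-loop users such as x keep the operand's own (uniform) status.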