[polly] r246161 - Do not detect Scops with only one loop.

Johannes Doerfert via llvm-commits llvm-commits at lists.llvm.org
Fri Aug 28 04:29:23 PDT 2015


On 08/28, Tobias Grosser wrote:
> On 08/28/2015 09:10 AM, Johannes Doerfert wrote:
> >On 08/28, Tobias Grosser wrote:
> >>On 08/28/2015 08:53 AM, Johannes Doerfert wrote:
> >>>On 08/28, Tobias Grosser via llvm-commits wrote:
> >>>>On 08/27/2015 06:55 PM, Tobias Grosser via llvm-commits wrote:
> >>>>>Author: grosser
> >>>>>Date: Thu Aug 27 11:55:18 2015
> >>>>>New Revision: 246161
> >>>>>
> >>>>>URL: http://llvm.org/viewvc/llvm-project?rev=246161&view=rev
> >>>>>Log:
> >>>>>Do not detect Scops with only one loop.
> >>>>>
> >>>>>If a region does not have more than one loop, we do not identify it as
> >>>>>a Scop in ScopDetection. The main optimizations Polly is currently performing
> >>>>>(tiling, preparation for outer-loop vectorization and loop fusion) are unlikely
> >>>>>to have a positive impact on individual loops. In some cases, Polly's run-time
> >>>>>alias checks or conditional hoisting may still have a positive impact, but those
> >>>>>are mostly enabling transformations which LLVM already performs for individual
> >>>>>loops. As we do not focus on individual loops, we leave them untouched to not
> >>>>>introduce compile time regressions and execution time noise. This results in
> >>>>>good compile time reduction (oourafft: -73.99%, smg2000: -56.25%).
> >>>>>
> >>>>>Contributed-by: Pratik Bhatu <cs12b1010 at iith.ac.in>
> >>>>>
> >>>>>Reviewers: grosser
> >>>>>
> >>>>>Differential Revision: http://reviews.llvm.org/D12268
> >>>>
> >>>>This change significantly improves compile-time performance on LNT without
> >>>>causing any execution-time regressions:
> >>>>
> >>>>clang -O3 -polly - Before vs. After:
> >>>>	
> >>>>http://llvm.org/perf/db_default/v4/nts/38436
> >>>>
> >>>>clang -O3 vs. clang -O3 -polly (before):
> >>>>
> >>>>http://llvm.org/perf/db_default/v4/nts/38423?compare_to=38469
> >>>>
> >>>>clang -O3 vs. clang -O3 -polly (after):
> >>>>
> >>>>http://llvm.org/perf/db_default/v4/nts/38436?compare_to=38469
> >>>>
> >>>>We see 78 compile time improvements with 14 of them showing more than 25%
> >>>>compile time reduction. Overall, the number of benchmarks which show an
> >>>>increase in compile time due to Polly is reduced from 140 to 111.
> >>>While this sounds good, I think we do miss too many SCoPs with this
> >>>patch. Especially ones that have two or more affine loops but the region
> >>>entry is just not part of them.
> >>
> >>Yes, I also realized we block too many loops here. Polybench's 3mm kernel
> >>is e.g. split into three kernels now. :(
> >>
> >>>To this end I think we need to change
> >>>the way we count loops. My proposal:
> >>>
> >>>   Instead of a boolean Context.hasAffineLoop we keep an integer that we
> >>>   increase for each affine loop. In the end we do not check if
> >>>   hasAffineLoop is true or false but whether the integer is bigger than 1.
> >>>   This will catch all affine loops in the region.
> >>
> >>>@Tobias, @Pratik:
> >>>   Do you agree that we might miss SCoPs with multiple loops?
> >>>   Do you think my proposal is OK or do you see any potential for
> >>>   improvement?
> >>
> >>
> >>Right. That would definitely work. However, we only have this number after
> >>verifying that the full scop is indeed a scop. My hope would be that we
> >>could bail out even before running scop-detection over the offending region.
> >>
> >>Maybe by iterating over the outermost loops and checking if more of them
> >>are contained in the region?
> >It's not as easy in general I think. Maybe building a mapping from
> >regions to loops when we iterate over __all__ loops. That should work
> >but outermost loops will probably not be precise.
> 
> What about the following:
> 
> Algorithm:
> 
>   loopNum = 0;
> 
>   entryNodeLoop = LI->getLoopFor(R->getEntry())
> 
>   if entryNodeLoop == null // no loop at all
>     children = LI->getoutermostloops
>   else if entryNodeLoop in R
>     children = entryNodeLoop->getParent()->children
>   else // entryNodeLoop not in R
>     children = entryNodeLoop->children
> 
>   for child in children
>     if (R->contains(child)) {
>       loopNum++;
>       if (child->subloops > 0)
>         loopNum++;
> 
>       if (loopNum >= 2)
>         return true;
>     }
>   return false;

I am not convinced that the loops we see and count should be related in
any way to the entry block of the region. What about code like this:

  if (...)
    LoopNest

Can't that be a region? Even if not, I think there is code with many
loops where none of them contains the entry block. Do you disagree?
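
To make the concern concrete, here is a minimal sketch of counting loops
per region without looking at the entry block at all. It is only an
illustration: ToyLoop, ToyRegion and hasAtLeastTwoLoops are hypothetical
stand-ins, not the LLVM Region/Loop/LoopInfo API, and basic blocks are
modelled as plain integers.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical toy model, NOT the LLVM API: a loop is the set of
// basic-block ids in its body (including blocks of nested loops)
// plus its directly nested loops.
struct ToyLoop {
  std::set<int> Blocks;
  std::vector<ToyLoop> SubLoops;
};

// Hypothetical toy model of a region: an entry block and the set of
// blocks the region covers.
struct ToyRegion {
  int Entry;
  std::set<int> Blocks;

  // A loop is in the region only if all of its blocks are.
  bool contains(const ToyLoop &L) const {
    for (int B : L.Blocks)
      if (!Blocks.count(B))
        return false;
    return true;
  }
};

// Walk all loops at every depth and count those fully contained in R.
// The region entry is never consulted. Stops once Limit is reached.
static void countLoopsInRegion(const ToyRegion &R,
                               const std::vector<ToyLoop> &Loops,
                               unsigned Limit, unsigned &Count) {
  for (const ToyLoop &L : Loops) {
    if (Count >= Limit)
      return;
    if (R.contains(L))
      ++Count;
    // Recurse even if L is not contained: a subloop still may be.
    countLoopsInRegion(R, L.SubLoops, Limit, Count);
  }
}

// True if R contains two or more loops, counting nested loops too.
bool hasAtLeastTwoLoops(const ToyRegion &R,
                        const std::vector<ToyLoop> &TopLevelLoops) {
  assert(R.Blocks.count(R.Entry) && "entry must belong to the region");
  unsigned Count = 0;
  countLoopsInRegion(R, TopLevelLoops, 2, Count);
  return Count >= 2;
}
```

For the "if (...) LoopNest" case, take block 0 as the condition (the
region entry), blocks {1, 2, 3} as the outer loop and block {2} as the
inner loop. The region {0, 1, 2, 3} then contains two loops although
its entry block is in none of them, so the check above accepts it.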