[polly] r246161 - Do not detect Scops with only one loop.

Tobias Grosser via llvm-commits llvm-commits at lists.llvm.org
Fri Aug 28 04:37:59 PDT 2015


On 08/28/2015 01:29 PM, Johannes Doerfert wrote:
> On 08/28, Tobias Grosser wrote:
>> On 08/28/2015 09:10 AM, Johannes Doerfert wrote:
>>> On 08/28, Tobias Grosser wrote:
>>>> On 08/28/2015 08:53 AM, Johannes Doerfert wrote:
>>>>> On 08/28, Tobias Grosser via llvm-commits wrote:
>>>>>> On 08/27/2015 06:55 PM, Tobias Grosser via llvm-commits wrote:
>>>>>>> Author: grosser
>>>>>>> Date: Thu Aug 27 11:55:18 2015
>>>>>>> New Revision: 246161
>>>>>>>
>>>>>>> URL: http://llvm.org/viewvc/llvm-project?rev=246161&view=rev
>>>>>>> Log:
>>>>>>> Do not detect Scops with only one loop.
>>>>>>>
>>>>>>> If a region does not have more than one loop, we do not identify it as
>>>>>>> a Scop in ScopDetection. The main optimizations Polly is currently performing
>>>>>>> (tiling, preparation for outer-loop vectorization and loop fusion) are unlikely
>>>>>>> to have a positive impact on individual loops. In some cases, Polly's run-time
>>>>>>> alias checks or conditional hoisting may still have a positive impact, but those
>>>>>>> are mostly enabling transformations which LLVM already performs for individual
>>>>>>> loops. As we do not focus on individual loops, we leave them untouched to not
>>>>>>> introduce compile time regressions and execution time noise. This results in
>>>>>>> good compile time reduction (oourafft: -73.99%, smg2000: -56.25%).
>>>>>>>
>>>>>>> Contributed-by: Pratik Bhatu <cs12b1010 at iith.ac.in>
>>>>>>>
>>>>>>> Reviewers: grosser
>>>>>>>
>>>>>>> Differential Revision: http://reviews.llvm.org/D12268
>>>>>>
>>>>>> This change significantly improves compile-time performance on LNT without
>>>>>> causing any execution-time regressions:
>>>>>>
>>>>>> clang -O3 -polly - Before vs. After:
>>>>>>
>>>>>> http://llvm.org/perf/db_default/v4/nts/38436
>>>>>>
>>>>>> clang -O3 vs. clang -O3 -polly (before):
>>>>>>
>>>>>> http://llvm.org/perf/db_default/v4/nts/38423?compare_to=38469
>>>>>>
>>>>>> clang -O3 vs. clang -O3 -polly (after):
>>>>>>
>>>>>> http://llvm.org/perf/db_default/v4/nts/38436?compare_to=38469
>>>>>>
>>>>>> We see 78 compile-time improvements, 14 of them showing a compile-time
>>>>>> reduction of more than 25%. Overall, the number of benchmarks that show
>>>>>> an increase in compile time due to Polly drops from 140 to 111.
>>>>> While this sounds good, I think we do miss too many SCoPs with this
>>>>> patch. Especially ones that have two or more affine loops but the region
>>>>> entry is just not part of them.
>>>>
>>>> Yes, I also realized we block too many loops here. Polybench's 3mm kernel
>>>> is e.g. split into three kernels now. :(
>>>>
>>>>> To this end I think we need to change
>>>>> the way we count loops. My proposal:
>>>>>
>>>>>    Instead of a boolean Context.hasAffineLoop we keep an integer that we
>>>>>    increase for each affine loop. In the end we do not check whether
>>>>>    hasAffineLoop is true or false but whether the integer is bigger
>>>>>    than 1. This will catch all affine loops in the region.
>>>>
>>>>> @Tobias, @Pratik:
>>>>>    Do you agree that we might miss SCoPs with multiple loops?
>>>>>    Do you think my proposal is OK or do you see any potential for
>>>>>    improvement?
>>>>
>>>>
>>>> Right. That would definitely work. However, we only have this number after
>>>> verifying that the full scop is indeed a scop. My hope would be that we
>>>> could bail out even before running scop-detection over the offending region.
>>>>
>>>> Maybe by iterating over the outermost loops and checking if more of them
>>>> are contained in the region?
>>> It's not as easy in general I think. Maybe building a mapping from
>>> regions to loops when we iterate over __all__ loops. That should work
>>> but outermost loops will probably not be precise.
>>
>> What about the following:
>>
>> Algorithm:
>>
>>    loopNum = 0;
>>
>>    entryNodeLoop = LI->getLoopFor(R->getEntry())
>>
>>    if entryNodeLoop in R
>>      children = entryNodeLoop->getParent()->children
>>    else if entryNodeLoop not in R
>>      children = entryNodeLoop->children
>>    else // no loop at all
>>      children = LI->getoutermostloops
>>
>>    for child in children
>>      if (R->contains(child)) {
>>        loopNum++;
>>        if (child->subloops > 0)
>>          loopNum++;
>>
>>        if (loopNum >= 2)
>>          return true;
>>      }
>>    return false;
>
> I am not convinced that the loops we see and count should be related in
> any way to the entry block of the region. What about code like this:
>
>    if (...)
>      LoopNest
>
> can't that be a region? Even if not, I think there is some kind of code
> with many loops but none of them contains the entry block. Do you
> disagree?

That is covered by the second or the third case of the code above: the second
if the scop is part of a larger loop, the third if no loop surrounds the scop.
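
To make the three cases concrete, here is a minimal, self-contained C++
sketch of the counting pre-check. It does not use the real LLVM classes:
ToyLoop, ToyRegion and mayHaveAtLeastTwoLoops are hypothetical stand-ins
invented for illustration, and EntryLoop plays the role of what
LI->getLoopFor(R->getEntry()) would return. The fallback to the top-level
loops when the entry loop has no parent is an assumption that the pseudocode
does not spell out.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-in for a loop: only the nesting structure is modeled.
struct ToyLoop {
  ToyLoop *Parent = nullptr;
  std::vector<ToyLoop *> SubLoops;
};

// Hypothetical stand-in for a region: the loops it fully contains, plus the
// innermost loop surrounding its entry block (nullptr if there is none).
struct ToyRegion {
  std::set<const ToyLoop *> ContainedLoops;
  ToyLoop *EntryLoop = nullptr;

  bool contains(const ToyLoop *L) const {
    return ContainedLoops.count(L) != 0;
  }
};

// Returns true if the region contains at least two loops according to the
// cheap count from the pseudocode: each contained candidate counts once, and
// once more if it has any subloop.
bool mayHaveAtLeastTwoLoops(const ToyRegion &R,
                            const std::vector<ToyLoop *> &TopLevelLoops) {
  // Case 3 (default): no loop surrounds the entry, scan the top-level loops.
  const std::vector<ToyLoop *> *Candidates = &TopLevelLoops;

  ToyLoop *L = R.EntryLoop;
  if (L && R.contains(L)) {
    // Case 1: the entry loop is inside the region, scan its siblings.
    // Assumption: a top-level entry loop keeps the top-level list.
    if (L->Parent)
      Candidates = &L->Parent->SubLoops;
  } else if (L) {
    // Case 2: the region sits inside a larger loop, scan that loop's subloops.
    Candidates = &L->SubLoops;
  }

  unsigned LoopNum = 0;
  for (const ToyLoop *Child : *Candidates) {
    if (!R.contains(Child))
      continue;
    ++LoopNum;
    if (!Child->SubLoops.empty())
      ++LoopNum;
    if (LoopNum >= 2)
      return true;
  }
  return false;
}
```

For the "if (...) LoopNest" example with no surrounding loop, EntryLoop is
null, so the third case scans the top-level loops and the nest counts as two
loops. A region holding a single innermost loop inside a larger loop goes
through the second case and is rejected.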

Tobias

