[polly] r246161 - Do not detect Scops with only one loop.

Fri Aug 28 00:13:09 PDT 2015

On 08/28/2015 08:53 AM, Johannes Doerfert wrote:
> On 08/28, Tobias Grosser via llvm-commits wrote:
>> On 08/27/2015 06:55 PM, Tobias Grosser via llvm-commits wrote:
>>> Author: grosser
>>> Date: Thu Aug 27 11:55:18 2015
>>> New Revision: 246161
>>>
>>> URL: http://llvm.org/viewvc/llvm-project?rev=246161&view=rev
>>> Log:
>>> Do not detect Scops with only one loop.
>>>
>>> If a region does not have more than one loop, we do not identify it as
>>> a Scop in ScopDetection. The main optimizations Polly is currently performing
>>> (tiling, preparation for outer-loop vectorization and loop fusion) are unlikely
>>> to have a positive impact on individual loops. In some cases, Polly's run-time
>>> alias checks or conditional hoisting may still have a positive impact, but those
>>> are mostly enabling transformations which LLVM already performs for individual
>>> loops. As we do not focus on individual loops, we leave them untouched to not
>>> introduce compile time regressions and execution time noise. This results in
>>> good compile time reduction (oourafft: -73.99%, smg2000: -56.25%).
>>>
>>> Contributed-by: Pratik Bhatu <cs12b1010 at iith.ac.in>
>>>
>>> Reviewers: grosser
>>>
>>> Differential Revision: http://reviews.llvm.org/D12268
>>
>> This change significantly improves compile-time performance on LNT without
>> causing any execution-time regressions:
>>
>> clang -O3 -polly - Before vs. After:
>>
>> http://llvm.org/perf/db_default/v4/nts/38436
>>
>> clang -O3 vs. clang -O3 -polly (before):
>>
>> http://llvm.org/perf/db_default/v4/nts/38423?compare_to=38469
>>
>> clang -O3 vs. clang -O3 -polly (after):
>>
>> http://llvm.org/perf/db_default/v4/nts/38436?compare_to=38469
>>
>> We see 78 compile time improvements, with 14 of them showing more than 25%
>> compile time reduction. Overall, the number of benchmarks that show an
>> increase in compile time due to Polly drops from 140 to 111.
> While this sounds good, I think we do miss too many SCoPs with this
> patch. Especially ones that have two or more affine loops but the region
> entry is just not part of them.

Yes, I also realized we block too many loops here. For example, Polybench's
3mm kernel is now split into three kernels. :(

> To this end I think we need to change
> the way we count loops. My proposal:
>
>    Instead of a boolean Context.hasAffineLoop we keep an integer that we
>    increase for each affine loop. In the end we do not check whether
>    hasAffineLoop is true or false but whether the integer is greater than 1.
>    This will catch all affine loops in the region.

> @Tobias, @Pratik:
>    Do you agree that we might miss SCoPs with multiple loops?
>    Do you think my proposal is OK or do you see any potential for
>    improvement?

Right. That would definitely work. However, we only have this number after
verifying that the full region is indeed a scop. My hope would be that we
could bail out even before running scop-detection over the offending region.
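
To make the counting proposal concrete, here is a rough sketch. It does not
use the real Polly or LLVM classes: ToyLoop, ToyDetectionContext,
NumAffineLoops and isProfitableRegion are hypothetical stand-ins, and treating
a non-affine loop as an immediate rejection is a simplifying assumption.

```cpp
#include <vector>

// Toy stand-in for a loop; not llvm::Loop.
struct ToyLoop {
  bool IsAffine;                  // assumed result of the usual affine checks
  std::vector<ToyLoop> SubLoops;  // loops nested directly inside this one
};

// Toy stand-in for the detection context; not Polly's DetectionContext.
struct ToyDetectionContext {
  int NumAffineLoops = 0;  // hypothetical replacement for hasAffineLoop
};

// Visit L and all loops nested in it, counting affine loops.
// Returns false as soon as a non-affine loop is found (toy simplification).
static bool countAffineLoops(const ToyLoop &L, ToyDetectionContext &Context) {
  if (!L.IsAffine)
    return false;
  ++Context.NumAffineLoops;
  for (const ToyLoop &Sub : L.SubLoops)
    if (!countAffineLoops(Sub, Context))
      return false;
  return true;
}

// Accept the region only if all loops are affine and there are at least two.
static bool isProfitableRegion(const std::vector<ToyLoop> &TopLevelLoops) {
  ToyDetectionContext Context;
  for (const ToyLoop &L : TopLevelLoops)
    if (!countAffineLoops(L, Context))
      return false;
  return Context.NumAffineLoops > 1;
}
```

Note that the final comparison only becomes meaningful once every loop in the
region has been visited, which is the cost mentioned above.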

Maybe by iterating over the outermost loops and checking if more of them
are contained in the region?
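
One possible reading of this early bail-out, again only as a sketch with toy
types: walk the loop forest starting from the outermost loops and stop as soon
as two loops inside the region have been seen. ToyLoop, ToyRegion and
hasAtLeastTwoLoops are hypothetical names, and "contained" is approximated
here by the loop header being a block of the region.

```cpp
#include <set>
#include <vector>

// Toy stand-in for a loop; not llvm::Loop.
struct ToyLoop {
  int Header;                     // id of the loop header block
  std::vector<ToyLoop> SubLoops;  // loops nested directly inside this one
};

// Toy stand-in for a region; not llvm::Region.
struct ToyRegion {
  std::set<int> Blocks;  // ids of the blocks that belong to the region
  // Assumption: a loop counts as contained if its header is in the region.
  bool contains(const ToyLoop &L) const { return Blocks.count(L.Header) != 0; }
};

// Count loops of the nest rooted at L that lie in R, giving up once Limit
// loops have been found. Limit must be positive.
static int countLoopsInRegion(const ToyLoop &L, const ToyRegion &R, int Limit) {
  int Count = R.contains(L) ? 1 : 0;
  for (const ToyLoop &Sub : L.SubLoops) {
    if (Count >= Limit)
      break;
    Count += countLoopsInRegion(Sub, R, Limit - Count);
  }
  return Count;
}

// Cheap pre-check that could run before the expensive detection.
static bool hasAtLeastTwoLoops(const std::vector<ToyLoop> &TopLevelLoops,
                               const ToyRegion &R) {
  int Count = 0;
  for (const ToyLoop &L : TopLevelLoops) {
    Count += countLoopsInRegion(L, R, 2 - Count);
    if (Count >= 2)
      return true;
  }
  return false;
}
```

This check needs no affinity information, so it can reject single-loop regions
early, but it may still let through regions whose loops later turn out not to
be affine.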

> @Pratik:
>    If we need to change this again, will you implement the new patch?

I leave this to Pratik?

Tobias