[llvm-dev] Loop Distribution pass

Thu Sep 20 14:59:24 PDT 2018

Hi,

On 20/09/2018 17:11, Jonas Paulsson via llvm-dev wrote:
> Hi Adam,
> 
> 
> On 2018-09-19 19:26, Adam Nemet wrote:
>>
>>> On Sep 13, 2018, at 1:21 AM, Jonas Paulsson 
>>> <paulsson at linux.vnet.ibm.com> wrote:
>>>
>>> Hi,
>>>
>>> With the help of the optimization remarks, I found a loop that could 
>>> not be vectorized as written, but can be once loop distribution is 
>>> enabled. Enabling it did in fact vectorize the loop, with a very 
>>> significant benchmark improvement (~25%).
>>>
>>> I tried (on SystemZ) to enable this pass, and found that it only 
>>> affected a handful of files on SPEC. This means I could enable this 
>>> without worrying about any regressions on SystemZ at least currently.
>>>
>>> I wonder if there is something more to know about this. It seems that 
>>> no other target has enabled this, presumably because of generally 
>>> mixed results; is that right? Is it triggering much more often on 
>>> other targets, and if so, why?
>> The main thing that is missing from the pass right now is a serious 
>> analysis of profitability as it affects instruction- and memory-level 
>> parallelism (ILP and MLP).  The easiest way to see this is that LD is 
>> the reverse transformation of loop fusion (LF), so where LF helps, LD 
>> may regress.  MLP is the big one in my opinion: a loss there would 
>> totally reverse any gains from vectorization.
>>
>> We would probably have to do something similar to what the SW 
>> prefetch insertion pass does in order to analyze accesses that are 
>> likely to be skipped by the HW prefetcher.  Needless to say this is a 
>> very micro-architecture specific analysis/cost model.  If we can 
>> establish, even if only in the simplest cases, that ILP/MLP is 
>> unaffected and vectorization is enabled, we could enable the 
>> transformation by default (in addition to the pragma-driven approach 
>> we have now).
> Thanks for the reply.
> 
> Since the pass is extremely conservative today and almost never 
> triggers, at least on SystemZ, while still being very beneficial when 
> it does, it seems that it could be used as-is on SystemZ, with a new 
> TTI hook to enable it selectively per target.
> 
> The question now is whether this is a wise idea. Do you think the Loop 
> Distribution pass will change significantly, so that it triggers much 
> more often, which could then cause regressions on SystemZ? If that is 
> the case, perhaps the idea now is that nobody enables it by default 
> until some reasonable initial cost modeling has been done?
> 
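
To make the trade-off discussed above concrete, here is a minimal sketch.
It is a hypothetical loop, not the one from the benchmark, and the names
fused, distributed and check_equivalent are made up for illustration. It
assumes the arrays do not overlap. The first statement carries a
dependence between iterations and blocks vectorization of the whole loop;
after distribution the second loop has no such dependence, but a[] is now
traversed twice, which is the kind of memory-side cost a profitability
model would have to weigh.

```c
#include <assert.h>
#include <stddef.h>

/* One loop, two statements. The first statement reads the value written
   in the previous iteration (a[i - 1]), so the loop as a whole cannot be
   vectorized. The pragma asks Clang to try distributing this loop; the
   #ifdef keeps other compilers from seeing it. */
void fused(int *a, const int *b, int *c, const int *d, size_t n)
{
#ifdef __clang__
#pragma clang loop distribute(enable)
#endif
    for (size_t i = 1; i < n; i++) {
        a[i] = a[i - 1] + b[i];   /* loop-carried dependence */
        c[i] = a[i] * d[i];       /* independent across iterations */
    }
}

/* Hand-distributed form: the recurrence stays scalar in the first loop,
   while the second loop is a candidate for vectorization. Note that a[]
   is read again in the second loop. */
void distributed(int *a, const int *b, int *c, const int *d, size_t n)
{
    for (size_t i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];
    for (size_t i = 1; i < n; i++)
        c[i] = a[i] * d[i];
}

/* Small self-check: both forms must produce identical arrays. */
void check_equivalent(void)
{
    enum { N = 8 };
    int b[N], d[N];
    int a1[N] = {0}, a2[N] = {0}, c1[N] = {0}, c2[N] = {0};

    for (int i = 0; i < N; i++) {
        b[i] = i + 1;
        d[i] = 2 * i - 3;
    }
    a1[0] = a2[0] = 5;

    fused(a1, b, c1, d, N);
    distributed(a2, b, c2, d, N);

    for (int i = 0; i < N; i++) {
        assert(a1[i] == a2[i]);
        assert(c1[i] == c2[i]);
    }
}
```

Whether the compiler actually distributes and then vectorizes such a loop
depends on the target and on the legality checks it can prove or insert
at run time; the optimization remarks are the way to confirm it.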

I think the loop interchange pass is in a similar situation: it gives 
substantial speedups on a few benchmarks without regressions (at least 
for the benchmarks I run, and once the patch to turn it into a loop 
pass lands). It would also definitely benefit from a better way to 
check whether interchanging the loops would let us vectorize them.

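As a rough illustration of the interchange case, here is a hypothetical
sketch (the names ROWS, COLS, scale_strided, scale_interchanged and
check_interchange are invented for this example). In the first function
the inner loop walks down a column of a row-major array, so consecutive
accesses are COLS elements apart; after interchange the inner loop is
unit-stride, which is the form a vectorizer generally prefers. Whether
the interchanged loop really gets vectorized is exactly the check that
is hard to make today.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 4, COLS = 6 };

/* Inner loop iterates over rows: stride of COLS ints between accesses. */
void scale_strided(int m[ROWS][COLS], int k)
{
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            m[i][j] *= k;
}

/* Interchanged form: inner loop iterates over columns, unit stride. */
void scale_interchanged(int m[ROWS][COLS], int k)
{
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            m[i][j] *= k;
}

/* Small self-check: both loop orders must give the same matrix. */
void check_interchange(void)
{
    int x[ROWS][COLS], y[ROWS][COLS];

    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            x[i][j] = y[i][j] = (int)(i * COLS + j);

    scale_strided(x, 3);
    scale_interchanged(y, 3);

    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            assert(x[i][j] == y[i][j]);
}
```
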
Cheers,
Florian