[llvm-dev] Loop Distribution pass

Wed Sep 19 10:26:09 PDT 2018

> On Sep 13, 2018, at 1:21 AM, Jonas Paulsson <paulsson at linux.vnet.ibm.com> wrote:
> 
> Hi,
> 
> With the help of the optimization remarks I found a loop that could not be vectorized, but that might be if loop distribution was enabled. Enabling it did in fact vectorize the loop, with a very significant benchmark improvement (~25%).
> 
> I tried (on SystemZ) to enable this pass, and found that it only affected a handful of files on SPEC. This means I could enable this without worrying about any regressions on SystemZ at least currently.
> 
> I wonder if there is something more to know about this. It seems that no other target has enabled this, presumably due to generally mixed results? Is this triggering much more on other targets, and if so, why?

The main thing that is missing from the pass right now is a serious analysis of profitability as it affects instruction- and memory-level parallelism (ILP and MLP). The easiest way to see this is that loop distribution (LD) is the reverse transformation of loop fusion (LF), so where LF helps, LD may regress. MLP is the big one in my opinion: a loss there could totally reverse any gains from vectorization.
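
To make the fusion/distribution relationship concrete, here is a minimal sketch in C. The function and array names are made up for illustration and are not taken from the benchmark discussed in this thread. The fused loop mixes a loop-carried recurrence with an independent statement; the hand-distributed version separates them, so the second loop becomes a candidate for vectorization, at the cost of no longer overlapping the memory streams of the two statements in a single loop.

```c
#include <stddef.h>

/* Fused form: the recurrence on a[] is loop-carried, which keeps the
   whole body (including the independent c[] statement) from being
   vectorized as one loop. */
void fused(int *a, const int *b, int *c, const int *d, size_t n) {
    for (size_t i = 1; i < n; ++i) {
        a[i] = a[i - 1] + b[i]; /* loop-carried dependence */
        c[i] = d[i] * 2;        /* independent of the recurrence */
    }
}

/* Hand-distributed form: same result when the arrays do not overlap.
   The first loop stays scalar; the second has no loop-carried
   dependence and is a vectorization candidate. */
void distributed(int *a, const int *b, int *c, const int *d, size_t n) {
    for (size_t i = 1; i < n; ++i)
        a[i] = a[i - 1] + b[i];
    for (size_t i = 1; i < n; ++i)
        c[i] = d[i] * 2;
}
```

Fusing the two loops in distributed() gives back fused(), which is the sense in which one transformation undoes the other.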

We would probably have to do something similar to the SW prefetch insertion pass in order to analyze accesses that are likely to be skipped by the HW prefetcher. Needless to say, this is a very microarchitecture-specific analysis/cost model. If we can establish that ILP/MLP is unaffected, even if only in the simplest cases, and that distribution enables vectorization, we could enable the transformation by default (in addition to the pragma-driven approach we have now).
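
For reference, the pragma-driven approach looks roughly like the sketch below. The function name and loop body are hypothetical; the pragma itself is Clang's `#pragma clang loop distribute(enable)`, which asks for distribution on the loop that follows. Whether the loop is actually split still depends on the pass finding a legal partition, and other compilers will simply ignore the unknown pragma.

```c
#include <stddef.h>

/* Hypothetical example: request loop distribution for one loop only,
   leaving the default (off) in place everywhere else. */
void scan_and_scale(int *a, const int *b, int *c, const int *d, size_t n) {
#pragma clang loop distribute(enable)
    for (size_t i = 1; i < n; ++i) {
        a[i] = a[i - 1] + b[i]; /* recurrence: stays scalar */
        c[i] = d[i] * 2;        /* independent: may be split off and vectorized */
    }
}
```

The pragma changes only how the loop may be compiled, not what it computes, so it is a low-risk way to try distribution on a single hot loop.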

Adam

> 
> /Jonas
> 
>