[llvm-dev] Loop Distribution pass
Florian Hahn via llvm-dev
llvm-dev at lists.llvm.org
Thu Sep 20 14:59:24 PDT 2018
Hi,
On 20/09/2018 17:11, Jonas Paulsson via llvm-dev wrote:
> Hi Adam,
>
>
> On 2018-09-19 19:26, Adam Nemet wrote:
>>
>>> On Sep 13, 2018, at 1:21 AM, Jonas Paulsson
>>> <paulsson at linux.vnet.ibm.com> wrote:
>>>
>>> Hi,
>>>
>>> With the help of the optimization remarks, I found a loop that could
>>> not be vectorized as-is, but could be once loop distribution was
>>> enabled. Enabling it did in fact vectorize the loop, with a very
>>> significant benchmark improvement (~25%).
>>>
>>> I tried (on SystemZ) to enable this pass, and found that it only
>>> affected a handful of files on SPEC. This means I could enable this
>>> without worrying about any regressions on SystemZ at least currently.
>>>
>>> I wonder if there is something more to know about this. It seems that
>>> no other target has enabled this, presumably due to generally mixed
>>> results? Is this triggering much more often on other targets, and if
>>> so, why?
>> The main thing that is missing from the pass right now is a serious
>> analysis of profitability as it affects instruction- and memory-level
>> parallelism. The easiest way to see this is that LD is the reverse
>> transformation of loop fusion (LF), so where LF helps, LD may regress.
>> MLP is the big one in my opinion, which would totally reverse any
>> gains from vectorization.
>>
>> We would probably have to do similar things to the SW prefetch
>> insertion pass in order to analyze accesses that are likely to be
>> skipped by the HW prefetcher. Needless to say this is a very
>> micro-architecture specific analysis/cost model. If we can establish
>> that ILP/MLP is unaffected even in the simplest cases and vectorization is
>> enabled we could enable the transformation by default (in addition to
>> the pragma-driven approach we have now).
> Thanks for the reply.
>
> Since the pass is extremely conservative today and almost never
> triggers, at least on SystemZ, while still being very beneficial when it
> does, it seems that it could be used as-is on SystemZ now, with a new
> TTI hook to enable it selectively per target.
>
> The question now is whether this is a wise idea. Do you think the Loop
> Distribution pass will change significantly so that it triggers much
> more often, which may then cause regressions on SystemZ? If that is the
> case, perhaps the idea now is that nobody activates it by default until
> some initial, reasonable cost modeling has been done?
>
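For readers following along, here is a minimal sketch in C of the kind of
loop discussed above. It is not the loop from the benchmark; the function
names are made up for illustration, and the arrays are assumed not to
alias. In the fused form, the recurrence on a[] keeps the whole body from
being vectorized. After distribution, the second loop has no loop-carried
dependence and becomes a candidate for vectorization.

```c
#include <stddef.h>

/* Hypothetical example: a[i] depends on a[i - 1], so the loop body as a
   whole cannot be vectorized, even though the c[] update is independent.
   With Clang, distribution can be requested per loop by placing
   "#pragma clang loop distribute(enable)" directly before the for. */
void fused(float *a, const float *b, float *c, const float *d, size_t n) {
    for (size_t i = 1; i < n; ++i) {
        a[i] = a[i - 1] * b[i];   /* loop-carried dependence */
        c[i] = c[i] + d[i];       /* independent of the line above */
    }
}

/* Hand-distributed form, valid only if a, b, c and d do not overlap. */
void distributed(float *a, const float *b, float *c, const float *d,
                 size_t n) {
    /* Stays scalar because of the recurrence. */
    for (size_t i = 1; i < n; ++i)
        a[i] = a[i - 1] * b[i];
    /* No loop-carried dependence: a candidate for vectorization. */
    for (size_t i = 1; i < n; ++i)
        c[i] = c[i] + d[i];
}
```

Note that the distributed form walks the iteration space twice, which is
where the ILP/MLP concerns raised above come from.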
I think the loop interchange pass is in a similar situation: it gives
substantial speedup on a few benchmarks without regressions (at least
once the patch to turn it into a loop pass lands and for the benchmarks
I run). It would also definitely benefit from a better way to check
whether interchanging the loops would let us vectorize.
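As a rough illustration of why interchange and vectorization are linked,
consider the following sketch in C. The function names and the flat
row-major layout are assumptions made for the example; they are not taken
from any benchmark. Before interchange the inner loop strides by n
elements; afterwards it is unit-stride, which is generally the easier case
for the vectorizer.

```c
#include <stddef.h>

/* Hypothetical example: row-major n x n matrices stored as flat arrays.
   The inner loop walks down a column, so consecutive iterations touch
   elements that are n apart in memory. */
void column_walk(double *a, const double *b, size_t n) {
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < n; ++i)
            a[i * n + j] += b[i * n + j];
}

/* Interchanged form: the same updates, but the inner loop is now
   unit-stride. Legal here because each element is updated independently. */
void interchanged(double *a, const double *b, size_t n) {
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j)
            a[i * n + j] += b[i * n + j];
}
```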
Cheers,
Florian