[PATCH] D115261: [LV] Disable runtime unrolling for vectorized loops.

Fri Dec 10 08:28:22 PST 2021

fhahn added a comment.

In D115261#3182932 <https://reviews.llvm.org/D115261#3182932>, @nikic wrote:

> In D115261#3177417 <https://reviews.llvm.org/D115261#3177417>, @lebedev.ri wrote:
>
>> Even if both of the unrollers are right as per their model
>> (LU duplicates whole loop body; while LV duplicates each instruction,
>> increasing live ranges, i believe), i'm mainly just worried
>> that two unroll strategies disagree in the end.
>>
>> Which one is actually right? LV?
>> Is there some analysis that can be extracted from LV that LU could use
>> to deduce better unroll factor? (which would be 1x (no further unroll) after LV)
>>
>> All that being said, i don't have any concrete examples that regress with this.
>
> Runtime loop unrolling doesn't really have anything that deserves the name of "cost model", at least if there is no profile data. It basically just unrolls the loop as many times as fits into a threshold. I don't know what kind of cost modelling LV does in this area, but I can only assume it's better than that ;)

LV at least tries to limit interleaving based on the number of execution units, so in that respect it should be a more realistic heuristic than the purely size-based one in the unroller. I guess one reason why the size-based thresholds for unrolling are still in place is that one of the main benefits of aggressive unrolling in LLVM is increasing the context for later local optimizations. This point shouldn't really apply to vectorized loops in most cases.
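
To make the contrast concrete, here is a rough toy model of the two styles of heuristic. This is only a sketch: the function names, parameters and clamping rules are all hypothetical and are not taken from LoopUnrollPass or LoopVectorize; the real implementations consider many more factors.

```cpp
#include <algorithm>

// Hypothetical toy model, NOT LLVM's actual code.
// Size-based heuristic: unroll as many times as fits into a size threshold,
// capped at MaxCount. Assumes MaxCount >= 1.
unsigned sizeBasedUnrollCount(unsigned LoopSize, unsigned Threshold,
                              unsigned MaxCount) {
  if (LoopSize == 0)
    return 1; // Nothing to measure; do not unroll.
  return std::max(1u, std::min(Threshold / LoopSize, MaxCount));
}

// Hypothetical toy model, NOT LLVM's actual code.
// Resource-based heuristic: interleave no more than the number of execution
// units, and no more than the vector register file can hold without spilling.
unsigned resourceBasedInterleaveCount(unsigned ExecUnits, unsigned VectorRegs,
                                      unsigned RegsPerIteration) {
  if (RegsPerIteration == 0)
    return std::max(1u, ExecUnits); // No register pressure to account for.
  unsigned RegLimit = VectorRegs / RegsPerIteration;
  return std::max(1u, std::min(ExecUnits, RegLimit));
}
```

With these made-up rules, a small loop body always gets the maximum size-based count regardless of the hardware, while the resource-based count stays bounded by what the target can execute in parallel.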

One interesting point that @lebedev.ri raised is that in some cases interleaving by LV won't happen because it would cause spills of vector registers, whereas this isn't a problem with runtime unrolling. But I'd assume in practice such loops should already be 'large enough'.

> I believe many targets already disable runtime unrolling for loops that contain vector instructions. For example AArch64 does that, though X86 currently does not. This is the principal alternative I would see, to move that logic up into the generic unroll preferences. It would be the difference between not unrolling loops that LLVM vectorized and not unrolling vector loops in general -- I assume the preference would be the former, as this patch does?

IIRC AArch64 only enables runtime unrolling for in-order targets at the moment. Disabling unrolling for loops with vector instructions in general seems like a workaround.

In D115261#3185543 <https://reviews.llvm.org/D115261#3185543>, @lebedev.ri wrote:

> I would still want to see //some// numbers from a run on an affected arch/cpu, where there previously would be unrolling and now there won't be.
> Lack of change will be great, presence will mainly be a canary test only, not a blocker.

Agreed & fair point! I ran SPEC2006 on X86 and the only notable change is in `sphinx3`, but interestingly enough, in the function with the biggest runtime difference there are no codegen changes. I'm still taking a closer look, but it may be down to code alignment changes.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115261/new/

https://reviews.llvm.org/D115261