[llvm-dev] How to best deal with undesirable Induction Variable Simplification?

Danila Malyutin via llvm-dev llvm-dev at lists.llvm.org
Fri Aug 9 05:32:21 PDT 2019


> Since r139579, IndVarSimplify (the pass) should not normalize induction variables without a reason anymore (a reason would be that the loop can be deleted). Could you file a bug report, attach a minimal .ll file and mention what output you would expect?

The IV is removed there by replaceCongruentIVs. That is what I'd probably expect when looking at the IR alone, but, as I've mentioned, it prevents latency masking later down the line, since certain ops now share a single common register.
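
To illustrate the shape of the problem, here is a minimal, made-up C sketch (not the actual kernel) with two IV phis that have the same SCEV:

void scale(int *a, int *b, int n) {
  int i = 0, j = 0;               /* deliberately duplicated counters */
  for (; i < n; ++i, ++j) {
    /* a is addressed off i and b off j, so the two address chains
       live in separate registers and the memory ops can overlap.   */
    b[j] = a[i] * 2;
  }
  /* After replaceCongruentIVs every use of j is rewritten in terms of
     i, so both address chains feed off a single register.           */
}

In our case there were registers to spare, so keeping the duplicate IV was the better trade-off.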

> Since after unswitching only one of the resulting loops is executed, the register usage should be the maximum of those loops, which ideally is at most the register usage of the pre-unswitched loop. In your case, p could be in the same register in all unswitched loops.
However, other optimizations might increase register pressure again and the register allocation is not optimal in all cases.

It looks like, for some reason, when IndVars rewrote all the loop exit values (which were just pointers incremented in the loop body) from simple single-value phis to GEPs with a recomputed offset (backedge-taken count * increment inside the loop), it expanded this offset computation in the main outermost loop (pre?)header even when the value was used only in the exit of one of the unswitched loops. Later passes failed to sink these computations for whatever reason, so in the end, instead of max(unswitched loop regs), the pressure became max(unswitched loop regs) + Const * number of loops (for the offsets, even though many were shared).
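
A rough before/after C sketch of the rewrite, with made-up names (in the real case the loops are nested inside an outer loop and the hoisted computation lands in that outer loop's header):

/* Before: the exit value is simply whatever is left in the pointer phi. */
int *exit_before(int *p, int step, int n) {
  for (int i = 0; i < n; ++i)
    p += step;
  return p;                        /* single-value phi in the exit block */
}

/* After rewriteLoopExitValues: the exit value is recomputed as
   base + backedge-taken-count * step, and that computation is
   expanded outside the loop.                                     */
int *exit_after(int *p, int step, int n) {
  int *exit_val = p + (long)n * step;   /* hoisted offset computation */
  for (int i = 0; i < n; ++i)
    p += step;
  return exit_val;
}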

I'll see if I can come up with a minimal reproducer for some in-tree target.

--
Danila

-----Original Message-----
From: Michael Kruse [mailto:llvmdev at meinersbur.de] 
Sent: Friday, August 9, 2019 02:22
To: Danila Malyutin <Danila.Malyutin at synopsys.com>
Cc: llvm-dev at lists.llvm.org
Subject: Re: [llvm-dev] How to best deal with undesirable Induction Variable Simplification?

On Thu, Aug 8, 2019 at 12:37, Danila Malyutin via llvm-dev
<llvm-dev at lists.llvm.org> wrote:
>
> Hello,
> Recently I’ve come across two instances where Induction Variable Simplification led to noticeable performance regressions.
>
> In one case, the removal of an extra IV led to the inability to reschedule instructions in a tight loop to reduce stalls. In that case, there were enough registers to spare, so using an extra register for the extra induction variable was preferable, since it reduced dependencies in the loop.

Since r139579, IndVarSimplify (the pass) should not normalize induction variables without a reason anymore (a reason would be that the loop can be deleted). Could you file a bug report, attach a minimal .ll file and mention what output you would expect?



> Due to unswitching there were several such loops, each with a different number of p += n ops, so when the IndVars pass rewrote all exit values, it added a lot of slightly different offsets to the main loop header that couldn’t fit in the available registers, which led to unnecessary spills/reloads.

Since after unswitching only one of the resulting loops is executed, the register usage should be the maximum of those loops, which ideally is at most the register usage of the pre-unswitched loop. In your case, p could be in the same register in all unswitched loops.
However, other optimizations might increase register pressure again and the register allocation is not optimal in all cases.
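
For illustration, a simplified sketch of what unswitching produces (not your code, just the general shape):

void step(int *p, int n, int cond) {
  if (cond) {
    for (int i = 0; i < n; ++i)
      p[i] += 1;                   /* copy taken when cond is true  */
  } else {
    for (int i = 0; i < n; ++i)
      p[i] += 2;                   /* copy taken when cond is false */
  }
  /* Only one copy executes per invocation, so p and the IVs can share
     the same registers across the copies; the peak pressure should be
     the max over the copies, not their sum.                          */
}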

Again, could you file a bug report, include a minimal reproducer and what output you expect?


> I am wondering what the usual strategy is for dealing with such “pessimizations”? Is it possible to somehow modify the IndVarSimplify pass to take those issues into account (for example, tell it that adding an offset computation + GEP is potentially more expensive than simply reusing the last value from the loop), or should it be recovered in some later pass? If so, is there an easy way to revert IV elimination? Has anyone dealt with similar issues before?

Ideally, we prefer such pessimizations not to occur in the first place, which is what r139579 addressed. However, the transformation might also be an IR normalization that enables other transformations. In that case, another pass further down the pipeline would transform the normalized form into an optimized one. For instance, LoopSimplify inserts a loop preheader that CFGSimplify would remove again. What is considered normalization depends on the case. If you can show that a change generally improves performance (not just for your code) and has at most minor regressions, then any approach is worth considering.

Michael

