[all-commits] [llvm/llvm-project] 905083: [LTO] Ensure LICM hoists expensive fdiv instructio...

Fri Jul 7 04:06:43 PDT 2023

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 905083f3c1a6c8dada42ad1c68553fa639f22859
      https://github.com/llvm/llvm-project/commit/905083f3c1a6c8dada42ad1c68553fa639f22859
  Author: David Sherwood <david.sherwood at arm.com>
  Date:   2023-07-07 (Fri, 07 Jul 2023)

  Changed paths:
    M llvm/lib/Passes/PassBuilderPipelines.cpp
    M llvm/test/Other/new-pm-lto-defaults.ll
    M llvm/test/Transforms/PhaseOrdering/lto-licm.ll

  Log Message:
  -----------
  [LTO] Ensure LICM hoists expensive fdiv instructions introduced by InstCombine

In the LTO pipeline we run InstCombine after LICM, which is
different to what we normally do without LTO. This undoes
the work LICM did to reduce the cost of the loop by hoisting
the fdiv out and replacing it with an fmul. When InstCombine
runs after LICM it sinks the fdiv straight back into the
loop which, on AArch64 at least, is expensive. You can
observe this problem in the SPEC2017 benchmark parest if you
build with "-Ofast -flto" and the loop vectoriser uses an
unroll factor of 1, which often happens when tail-folding is
enabled.

This is also a problem for scalar loops, or indeed any loop
in which the result of the preheader fdiv has only one use.
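
As a rough source-level sketch of the two loop shapes described above
(the function names are hypothetical, and the real transformation
happens on LLVM IR under fast-math flags rather than on C++ source):

```cpp
#include <cstddef>
#include <vector>

// Shape InstCombine leaves behind when it sinks the fdiv:
// one floating-point divide per iteration, with a loop-invariant divisor.
void scale_div(std::vector<double> &v, double d) {
  for (std::size_t i = 0; i < v.size(); ++i)
    v[i] = v[i] / d;
}

// Shape LICM produces when reciprocal math is allowed:
// a single divide in the preheader and a multiply inside the loop.
void scale_recip(std::vector<double> &v, double d) {
  const double r = 1.0 / d; // the hoisted "preheader fdiv"
  for (std::size_t i = 0; i < v.size(); ++i)
    v[i] = v[i] * r;        // the only use of r inside the loop
}
```

The two forms only agree exactly when 1.0 / d is representable without
rounding, which is why the rewrite needs fast-math style flags such as
those implied by -Ofast.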

See InstCombinerImpl::visitFMul for the code that sinks the fdiv.

I've attempted to fix this by adding another LICM pass for Full
LTO after InstCombine. The alternative is to stop InstCombine
from sinking the fdiv into loops. See D87479 for a previous
discussion on this issue.

Differential Revision: https://reviews.llvm.org/D143631