[LLVMdev] IR Passes and TargetTransformInfo: Straw Man

Tue Jul 16 21:38:24 PDT 2013

Since introducing the new TargetTransformInfo analysis, there has been some confusion over the role of target heuristics in IR passes. A few patches have led to interesting discussions.

To centralize the discussion until we get some documentation and better APIs in place, let me throw out an oversimplified Straw Man for a new pass pipeline. It serves two purposes: (1) an overdue reorganization of the pass pipeline, and (2) a formalization of the role of TargetTransformInfo.

---
Canonicalization passes are designed to normalize the IR in order to expose opportunities to subsequent machine independent passes. This simplifies writing machine independent optimizations and improves the quality of the compiler.

An important property of these passes is that they are repeatable. They may be invoked multiple times after inlining and should converge to a canonical form. They should not destructively transform the IR in a way that defeats subsequent analysis.
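The repeatability property can be illustrated with a toy model. This is only a sketch in Python, not LLVM code: the "IR" is a tuple of strings, and the two passes (drop_nops, sort_commutative_operands) and the canonicalize driver are hypothetical names made up for this example.

```python
# Toy model: "IR" is a tuple of strings and each "pass" is a pure function
# from IR to IR. None of this is LLVM API; all names are hypothetical.

def drop_nops(ir):
    """Remove no-op instructions."""
    return tuple(inst for inst in ir if inst != "nop")

def sort_commutative_operands(ir):
    """Put the operands of 'add' into a fixed (sorted) order."""
    out = []
    for inst in ir:
        parts = inst.split()
        if parts[0] == "add":
            parts = ["add"] + sorted(parts[1:])
        out.append(" ".join(parts))
    return tuple(out)

def canonicalize(ir, passes, max_rounds=8):
    """Rerun the passes until the IR stops changing (a fixed point)."""
    for _ in range(max_rounds):
        new_ir = ir
        for p in passes:
            new_ir = p(new_ir)
        if new_ir == ir:
            return ir
        ir = new_ir
    raise RuntimeError("passes did not converge")

PASSES = [drop_nops, sort_commutative_operands]
once = canonicalize(("add b a", "nop", "add c d"), PASSES)
twice = canonicalize(once, PASSES)
assert once == twice  # rerunning on already-canonical IR changes nothing
```

A pass that made a destructive or target-driven choice here would be one whose output depends on something other than the IR itself, which is exactly what would break the fixed point.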

Canonicalization passes can make use of data layout and are affected by ABI, but are otherwise target independent. Adding target specific hooks to these passes can defeat the purpose of canonical IR.

IR Canonicalization Pipeline:

Function Passes {
  SimplifyCFG
  SROA-1
  EarlyCSE
}
Call-Graph SCC Passes {
  Inline
  Function Passes {
    EarlyCSE
    SimplifyCFG
    InstCombine
    Early Loop Opts {
      LoopSimplify
      Rotate (when obvious)
      Full-Unroll (when obvious)
    }
    SROA-2
    InstCombine
    GVN
    Reassociate
    Generic Loop Opts {
      LICM (Rotate on-demand)
      Unswitch
    }
    SCCP
    InstCombine
    JumpThreading
    CorrelatedValuePropagation
    AggressiveDCE
  }
}

IR optimizations that require target information or destructively modify the IR can run in a separate pipeline. This helps make a cleaner distinction between passes that may and may not use TargetTransformInfo.

TargetTransformInfo encapsulates legal types and operation costs. IR instruction costs are approximate and relative. They do not represent def-use latencies, nor do they distinguish between latency and CPU resource requirements--that level of machine modeling needs to be done in MI passes.
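To make "approximate and relative" concrete, here is a toy sketch of how a lowering pass might consume such costs. It is written in Python and does not use the TargetTransformInfo interface; the cost tables, their numbers, and the function names are all assumptions invented for illustration.

```python
# Hypothetical relative cost tables. The numbers are made up for illustration
# and this is not the TargetTransformInfo interface.
SCALAR_COST = {"add": 1, "mul": 1, "load": 1, "store": 1}
VECTOR_COST = {"add": 1, "mul": 2, "load": 1, "store": 1}

def loop_body_cost(ops, table):
    """Sum the relative cost of each operation in a loop body."""
    return sum(table[op] for op in ops)

def vectorization_is_profitable(ops, vector_width):
    """Compare cost per original iteration; only the ratio is meaningful."""
    scalar = loop_body_cost(ops, SCALAR_COST)
    vector = loop_body_cost(ops, VECTOR_COST) / vector_width
    return vector < scalar

body = ["load", "mul", "add", "store"]
print(vectorization_is_profitable(body, 4))  # True: 5/4 < 4
print(vectorization_is_profitable(body, 1))  # False: 5 > 4
```

Note that the decision only compares one total against another. Nothing here models latency along a def-use chain or contention for CPU resources, which is the part left to MI passes.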

IR Lowering Pipeline:

Function Passes {
  Target SimplifyCFG (OptimizeCFG?)
  Target InstCombine (InstOptimize?)
  Target Loop Opts {
    SCEV
    IndvarSimplify (mainly sxt/zxt elimination)
    Vectorize/Unroll
    LSR (move LFTR here too)
  }
  SLP Vectorize
  LowerSwitch
  CodeGenPrepare
}
---

The above pass ordering is roughly something I think we can live with. Notice that I have:
  Full-Unroll -> SROA-2 -> GVN -> Loop-Opts
since that solves some issues we have today.

I don't currently have any reason to reorder the "late" IR optimization passes (those after generic loop opts). We do need either a GVN-util that loop opts and lowering passes may call on-demand after performing code motion, or a non-iterative GVN-lite that we rerun as a cleanup after lowering passes.

If anyone can think of important dependencies between IR passes, this would be a good time to point them out.

We could probably make an adjustment to the 'opt' driver so that the user can specify any mix of canonical and lowering passes. The first lowering pass and subsequent passes would run in the lowering function pass manager.
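The driver rule could look something like the following sketch. It is Python rather than the actual driver code, the pass names are just strings borrowed from the lowering pipeline listed above, and LOWERING_PASSES and split_pipeline are hypothetical names.

```python
# Hypothetical classification of passes, using names from the lowering
# pipeline listed above. This is not how the driver is implemented today.
LOWERING_PASSES = {
    "IndvarSimplify",
    "Vectorize/Unroll",
    "LSR",
    "SLP Vectorize",
    "LowerSwitch",
    "CodeGenPrepare",
}

def split_pipeline(requested):
    """Split a user-specified pass list into (canonical, lowering).

    The first lowering pass and everything after it go to the lowering
    function pass manager, even if a later pass is a canonicalization pass.
    """
    for i, name in enumerate(requested):
        if name in LOWERING_PASSES:
            return requested[:i], requested[i:]
    return requested, []

canonical, lowering = split_pipeline(["SimplifyCFG", "GVN", "LSR", "InstCombine"])
print(canonical)  # ['SimplifyCFG', 'GVN']
print(lowering)   # ['LSR', 'InstCombine']
```

The one-way split means a canonicalization pass requested after a lowering pass still runs, but it runs on IR that is no longer expected to be canonical.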

'llc' could also optionally run the lowering pass pipeline as a convenience for users who want to run 'opt' without specifying a triple/cpu.

-Andy