[PATCH] D91481: [LoopUnroll] Discount uniform instructions in cost models

Sat Nov 14 10:09:42 PST 2020

reames created this revision.
reames added reviewers: samparker, dfukalov, Whitney, anna, skatkov.
Herald added subscribers: dantrushin, javed.absar, zzheng, bollu, hiraditya, mcrosier.
Herald added a project: LLVM.
reames requested review of this revision.

This patch updates the cost models for loop unrolling to discount the cost of a loop-invariant expression on all but one iteration.  The reasoning here is that such an expression (as determined by SCEV) will be CSEd once the loop is unrolled.  Note that SCEV's reasoning will find expressions which it can prove invariant, not simply those defined outside the loop.
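
To make the discount concrete, here is a minimal sketch of the arithmetic.  It is not the code in the patch: the helper name estimateUnrolledSize and its parameters are hypothetical, and it assumes that every uniform instruction is fully removed by CSE after unrolling.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not an LLVM API.  Estimates the size of a loop body
// after unrolling by Count, where UniformSize of the BodySize units belong
// to loop-invariant (uniform) expressions.
inline std::uint64_t estimateUnrolledSize(std::uint64_t BodySize,
                                          std::uint64_t UniformSize,
                                          std::uint64_t Count) {
  assert(UniformSize <= BodySize && "uniform part cannot exceed the body");
  assert(Count >= 1 && "unroll count must be at least 1");
  // Non-uniform instructions are replicated once per unrolled iteration.
  std::uint64_t Varying = (BodySize - UniformSize) * Count;
  // Uniform instructions are charged only once.  Assumption: the other
  // Count - 1 copies are removed by CSE after unrolling.
  return Varying + UniformSize;
}
```

For a body of 10 units of which 4 are uniform, unrolling by 4 is estimated at 6 * 4 + 4 = 28 units under this sketch, rather than 10 * 4 = 40 without the discount.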

My actual motivation for this line of work is items 1 and 2 from the added TODO.  I'm looking to teach the optimizer how to better model expressions which are uniform across some number of iterations of a loop (but not all).  I'm working on the vectorizer in parallel, and mostly started on this code to see if it made an easier path for testing out the new logic I plan to add.

Note that unrolling currently has two cost models: a very simple one for partial and runtime unrolling, and a much more complicated one for full unrolling.  This patch updates both, but does not attempt to unify them.

Reviewers, I am not entirely sure this is the right approach.  The obvious alternative is to generalize the full unrolling cost model to handle partial unrolling.  As detailed below, the alternative looks non-trivial, and I decided against it.  I do see the argument for that being the right approach though, so if you want to just reject this patch outright, I won't argue.

Unification appears tricky.  The existing code for the full unroll costing makes two key assumptions we'd have to tweak.  First, it evaluates each iteration as opposed to reasoning about sets of iterations.  Second, it evaluates for a particular concrete unroll factor.  The first is invasive to change, and the second will likely require non-trivial caching to adapt to the partial unroll heuristic's needs, which involve costing multiple potential unroll factors.
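
As a rough illustration of the second point, consider a driver that costs several candidate unroll factors.  This is a sketch only: pickUnrollCount and CostForCount are hypothetical names, and the search order (largest factor first) is an assumption rather than a description of the existing heuristic.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical driver, not LLVM code.  Returns the largest unroll count in
// [1, MaxCount] whose estimated cost fits within Threshold, or 0 if no
// candidate fits.
inline unsigned
pickUnrollCount(unsigned MaxCount, std::uint64_t Threshold,
                const std::function<std::uint64_t(unsigned)> &CostForCount) {
  for (unsigned Count = MaxCount; Count >= 1; --Count) {
    // Each candidate is costed from scratch.  If CostForCount simulated
    // every iteration for one concrete factor, this work would be repeated
    // per candidate unless intermediate results were cached.
    if (CostForCount(Count) <= Threshold)
      return Count;
  }
  return 0;
}
```

With a closed-form cost such as 6 * Count + 4, each query is cheap.  With a cost function that walks every unrolled iteration for one fixed factor, the same loop would redo that walk for each candidate, which is where the caching concern comes from.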

Beyond the implementation complexity, it would be a major change to the cost modeling, and would likely trigger a need to retune all the thresholds.  I'm not currently able to do that retuning due to a lack of suitable benchmarking infrastructure.

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D91481

Files:
  llvm/lib/Analysis/LoopUnrollAnalyzer.cpp
  llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp
  llvm/test/Transforms/LoopUnroll/X86/partial-uniform.ll
  llvm/test/Transforms/LoopUnroll/nonlatchcondbr.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D91481.305320.patch
Type: text/x-patch
Size: 13210 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20201114/bd48b965/attachment.bin>