[PATCH] D84108: [SimplifyCFG][LoopRotate] SimplifyCFG: disable common instruction hoisting by default, enable late in pipeline

Sat Jul 25 11:04:51 PDT 2020

nikic added a comment.

Some thoughts:

- Ultimately memdep analysis spends most time in alias analysis, and we can make that part faster but using BatchAA. This has a non-trivial positive impact in general: http://llvm-compile-time-tracker.com/compare.php?from=bc79ed7e16003c8550da8710b321d6d5d4243faf&to=1ab8f1b475d2b9d464add10d1a7f87f18c073fb0&stat=instructions For this particular test case it drops instructions from ~4.2M to ~3.8M.
- I think fundamentally, what memdep does and the way it does it, cannot be easily improved. This is probably something that MemorySSA-based NewGVN improves on (at least in theory -- in practice MemorySSA has a not so stellar compile-time record).
- What can be improved though are the cutoffs. Non-local memdep analysis has two primary cutoffs: The BlockScanLimit=100, which limits the amount of instructions inspected in a single block, and the BlockNumberLimit=1000, which limits the amount of inspected blocks. Taken together, this means that a **single** memdep query can inspect up to 100000 instructions, which is frankly insane. A quick test with -memdep-block-number-limit=100 drops the instructions for the test case to ~1.8M. It may be reasonable to either reduce the default block limit here, or to introduce an additional "total instruction limit" for non-local queries.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D84108/new/

https://reviews.llvm.org/D84108