[llvm-dev] RFC Storing BB order in llvm::Instruction for faster local dominance
Chris Lattner via llvm-dev
llvm-dev at lists.llvm.org
Tue Oct 2 15:16:32 PDT 2018
> On Oct 1, 2018, at 6:30 PM, Reid Kleckner <rnk at google.com> wrote:
> memcpyopt isn’t/shouldn’t be querying dominance of random instructions within a block, it is finding memcpy’s by scanning scalar def-use chains and trying to back patch relationships between them. You’d use a fumdamentally different approach with MemorySSA, where just walk the code, find memcpy’s, and ask what they depend on.
>
> This is all sparse, and doesn’t require dominance calls. InstCombine doing conceptually similar things, and because it is based on use/def chains it doesn’t need to do any of this: the dominance relationship is implicit.
>
> Maybe. I didn't raise this issue to become an expert in DSE or memcpyopt, and unfortunately I don't have time to dig in and really understand it. I'm just trying to do some basic performance engineering.
Acknowledged, but your proposal is to address one N^2 behavior by introducing a different one, and slowing down the common case by a constant factor. You are arguing that this tradeoff is worth it, but not providing data. The problems you are observing have multiple possible solutions, and we’re all searching for the best fit.
To reiterate my position, I’m not saying you or your patch is necessarily *wrong*. I’m saying that it is invasive, and invasive patches require careful consideration. I’m just asking for that consideration and tradeoff analysis.
>> MemorySSA only helps if you're fine with skipping non-aliasing accesses. It's not clear to me that this is always the case. For example, imagine that we're trying to do something like SLP vectorization, and so we follow the use-def chain of an address calculation and find two adjacent, but non-aliasing, loads in the same basic block.
>
> To be clear, I was only referring to the DSE/memcpyopt cases which appear to be the primary motivators for this line of work. Specific transformations with specific needs can continued to use OrderedBB if beneficial.
>
> I think the point is that we already know we have lots of users of OrderedBasicBlock, and OrderedInstructions, and are likely to grow more as time goes on. With the current design, every user of OrderedBasicBlock has to take responsibility for invalidating the analysis as they add and remove instructions, introducing bugs, and fixing them as each transform develops independently. If we instead make the core data structures efficiently expose facts that they already know about themselves, everyone wins.
>
> I think one of the most compelling use cases for this information that we're likely to want to use more of as time goes on is ImplicitControlFlowTracking. This analysis uses OrderedInstructions to figure out if a given instruction is truly "must execute", i.e. if we reach this basic block, we can assume it executes. You can use it to turn LLVM's extended, not-quite-basic blocks into basic blocks, where you can assume that every instruction in the block executes, or every instruction in the block post-dominates instructions that come before it.
Of course, basic blocks in general are a pointless abstraction if you don’t care about efficiency. If efficiency didn’t matter, we’d treat every instruction as a “Basic block” and model dominance and everything on top of that. The only reason we accept the cost of basic blocks in the core IR is that the size of the dominator tree (as one simple example) matters.
-Chris
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20181002/42e85ea6/attachment.html>
More information about the llvm-dev
mailing list