[llvm-dev] RFC Storing BB order in llvm::Instruction for faster local dominance

Sun Oct 14 22:54:15 PDT 2018

I'd like to give some input on that as well.

From: llvm-dev [mailto:llvm-dev-bounces at lists.llvm.org] On Behalf Of Chris Lattner via llvm-dev
Sent: Wednesday, October 3, 2018 5:17 AM
To: Reid Kleckner <rnk at google.com>
Cc: llvm-dev <llvm-dev at lists.llvm.org>
Subject: Re: [llvm-dev] RFC Storing BB order in llvm::Instruction for faster local dominance

On Oct 1, 2018, at 6:30 PM, Reid Kleckner <rnk at google.com<mailto:rnk at google.com>> wrote:
memcpyopt isn’t/shouldn’t be querying dominance of random instructions within a block, it is finding memcpy’s by scanning scalar def-use chains and trying to back patch relationships between them.  You’d use a fumdamentally different approach with MemorySSA, where  just walk the code, find memcpy’s, and ask what they depend on.

This is all sparse, and doesn’t require dominance calls.  InstCombine doing conceptually similar things, and because it is based on use/def chains it doesn’t need to do any of this: the dominance relationship is implicit.

Maybe. I didn't raise this issue to become an expert in DSE or memcpyopt, and unfortunately I don't have time to dig in and really understand it. I'm just trying to do some basic performance engineering.

Acknowledged, but your proposal is to address one N^2 behavior by introducing a different one, and slowing down the common case by a constant factor.  You are arguing that this tradeoff is worth it, but not providing data.  The problems you are observing have multiple possible solutions, and we’re all searching for the best fit.

If I understood the scenario that produces N^2 with this patch correctly, provided that instruction's removal is free, the only potentially bad case is when we repeatingly make a local dominance checks and instruction insertions to a very same block. Though it is theoretically possible that someone will do it, I don't really think that some pass is actually doing that. And even if it is, it doesn't look like a big deal to rewrite it in a way that it does all dominance queries in advance and then does the insertion.

So I wouldn't be much worried about this situation. We can completely rule it out if it ever shows up. We can easily detect it by adding statistics about number of invalidations and queries to a block, and if their ratio becomes close to 1, throw assertion (under some option ofc).  BTW, having this statistic could be useful in general to see how efficient the caching is. If this patch helps eliminate an *actually existing* N^2 (it's really the case according to data in the commit message) and introduces N^2 that doesn't seem to be a real case and can be easily detected and fixed, I find it reasonable to use this patch's approach.

What I know for sure is that some passes are currently struggling with the situations when having been able to answer these dominance queries easily would save us from much pain. These are such fundamental passes as GVN (which uses ImplicitControlFlowTracking for PRE, and it currently needs to do a lot of manual invalidation to OI, and sometimes this may be redundant). Another pass that suffers from same problem as GVN is LICM: it is ultra-conservative when it comes to having a loop with potentially throwing instructions (basically, any loop with calls inside). Due to limitations of LoopSafetyInfo, LICM abstains from doing most transforms to such loops. These are missed optimization opportunities.

I tried adding OI along with some other stuff into the LoopSafetyInfo, and I found out that in this case it becomes passes' responsibility to invalidate OI correctly, and the number of places where it should be done is huge, and it is easy to skip one. Due to dangers of such approach, it just becomes non-workable. This patch is a real help for this situation: when the invalidation is automatic, there is much less room for making a mistake (and I think everyone here is agreed that automation of updates is cool :)).

So I see this patch as a powerful way to enable more optimizations we are currently missing without making an extra room for errors and sparing them of redundant work. Maybe collecting stats for ratios of queries/recalculation for blocks could give us more data.

-- Max

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20181015/61725f56/attachment.html>