[PATCH] D65060: [LICM] Make Loop ICM profile aware

Alina Sbirlea via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Thu Aug 15 12:20:29 PDT 2019


asbirlea added a comment.

Let me try to answer some of the questions, and my apologies for not catching this in the initial review.

> With the legacy pass manager, the partitioning of the loop pass pipeline manifested just like the test change in opt-O2-pipeline.ll/opt-O3-pipeline.ll. I thought it's definitely not desired, but not a correctness issue either - it shouldn't lead to a miscompile or non-determinism. The failures you observed are not from the legacy pass manager, right?

That's correct. For the legacy pass manager the consequence of introducing this dependency is splitting the loop pass pipeline. The failures observed were in the new pass manager.

> With the new pass manager, since getCachedResult is used to access BFI there, if BFI was invalidated before the beginning of the loop pass manager, it will be null, which makes the added heuristic a no-op. Otherwise BFI will always be 'available' even after loop transformations invalidate it. The problem as I see it is that invalidation happens at the loop level for the new pass manager, but BFI is a function-level analysis result, so the invalidation wouldn't actually work. Correct me if I'm wrong; from what I can tell this seems to be a purposeful design to 'force' preservation of higher-level analysis results during the loop pass pipeline, to avoid the partitioning situation of the legacy pass manager.

That's correct: the analysis is invalidated in the middle of the loop pass pipeline, but since invalidation only happens at the loop pipeline level, it's not actually invalidated after each loop pass. So there are loop passes that will use garbage data from BFI.

> However what I don't understand is how this leads to non-determinism or miscompile.

BFI keeps BasicBlock*-s in its internal data. We'll have two passes, one deleting BBs (LoopSimplify or SimplifyCFG) and one creating new BBs (SimpleLoopUnswitch, I think it was). The rough idea from what we've seen is that, once invalidated, the BasicBlock pointers can be anything. They can be invalid pointers, causing crashes, but they can also be valid pointers to BasicBlocks newly created by that second loop pass, blocks that are naturally different from the original ones BFI queried. Hence, depending on where the new BBs are allocated, you'll get non-deterministic answers from BFI queries.
This is what made the root cause of the problem very hard to pinpoint.

> If BFI is available at the start of the loop pass pipeline, since the invalidation at the loop level effectively doesn't work for the function-level BFI, it's as if we mark BFI as preserved without actually updating/preserving it, which is wrong. But looking at the APIs of BFI, for blocks that still exist after a transformation we would still get a count, though it could be stale, and for blocks that didn't exist before the transformation we would get 0. That's problematic in terms of count quality/accuracy, but it all seems deterministic, so not a correctness issue either.

You're right that it's as if BFI is preserved without actually being preserved. This is a big pain point in the new pass manager, which allows such mistakes.
I'm going to try to work on a refactoring of the getCachedResult API in order to be able to catch such cases. The rough idea is that this API should only be allowed to return analyses that cannot be invalidated, while analyses that *can* be invalidated but that we "promise" to preserve throughout the loop pass pipeline should go in the AnalysisResults.

> Updating/preserving BFI seems non-trivial, as we may also need to update metadata so that our updates persist even outside the loop pass manager, and the next invocation of BPI/BFI will still get the updated result. That said, it's the right thing to do, as you said. My point is there may always be some imperfection in the update, and not updating it at all is an extreme, but it should still be a matter of accuracy. I hope the degree of accuracy isn't the cause of the issues we're seeing. It'd be great if you can share a reproducer - I'd like to take a closer look. Thanks.

I think the most basic update would be to remove all invalid data from BFI's internal data structures. Then, yes, it becomes a matter of accuracy. I'm deferring to chandlerc@ on whether a reproducer can be made available, but I think the BFI cleanup update would tell you whether the improvements you were seeing were genuine.


Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65060/new/

https://reviews.llvm.org/D65060




