[PATCH] D65060: [LICM] Make Loop ICM profile aware

Thu Aug 15 00:39:53 PDT 2019

wenlei added a comment.

Thanks for the context, @chandlerc. Agreed that ideally BFI should be preserved by all loop passes. Not to defend this change, but I have a few questions just to make sure I understand the cause of miscompile/non-determinism and there's no other lurking issues around this.

> In the legacy pass manager, this almost works, but can result in wildly inconsistent behavior when half way through a loop pipeline BFI gets invalidated and/or where it can partition the loop pass pipeline in weird ways breaking intended iteration order of the loop pass pipeline.

With legacy pass manager, the partition of loop pass pipeline manifested just like the test change in opt-O2-pipeline.ll/opt-O3-pipeline.ll. I thought it's definitely not desired, but not a correctness issue either - it shouldn't leads to miscompile or non-determinism. The failures you observed are not from legacy pass manager, right?

With new pass manager, since getCachedResult is used to access BFI there, if BFI was invalidated before the beginning loop pass manager, it will be null, which makes the added heuristic a no-op. Otherwise BFI will always be 'available' even after loop transformations invalidate it, the problem as I see it is that invalidation happens on loop level for new pass manager, but BFI is function level analysis result, so the invalidation wouldn't actually work. Correct me if I'm wrong, from what I can tell this seems a purposeful design to 'force' preservation of higher level analysis result during loop pass pipeline, to avoid the partition situation of legacy pass manager.

However what I don't understand is how this leads to non-determinism or miscompile.

> I suspect this produces non-deterministic behavior based on what order we visit things causing BFI to be present or not when we happen to reach the loop pass manager, and happening to change the behavior.

IIUC, as long as the order we visit functions/loops is deterministic (think it is), whether BFI is available when we reach loop pass manager should also be deterministic. Or are you saying there're (benign) non-deterministic ordering today, and the fact that BFI availability happen to depend on that order made those benign internal non-determinism visible externally?

> Even worse, if BFI *happens* to be available at the start of the loop pass pipeline in the new pass manager, it will continue to be used even after passes restructure the basic blocks invalidating it. This can lead to arbitrarily bad behavior.

If BFI is available at the start of loop pass pipeline, since the invalidation on loop level effective doesn't work for function level BFI, it's as if we mark BFI as preserved without actually updating/preserving it, which is wrong. But looking at the APIs of BFI, for blocks that still exist after transformation, we would still get a count, could be stale though, and for blocks didn't exist before transformation, we would get 0. That's problematic in terms of count quality/accuracy, but all seems deterministic, not correctness issue either.

Updating/preserving BFI seems non-trivial as we may also need to update metadata so that our updates will be persisted even outside of the loop pass manager, and next invocation of BPI/BFI will still get the updated result. That said, it's the right thing to do as you said. My point is there may always some imperfection in the update, and not updating it at all is an extreme, but it's should be a matter of accuracy still. I hope the degree of accuracy isn't the cause of the issues we're seeing. It'd be great if you can share a reproducer - I'd like to take a closer look. Thanks.

Repository:
  rL LLVM

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D65060/new/

https://reviews.llvm.org/D65060