[PATCH] D61075: [CodeGenPrepare] delay instruction deletion for efficiency

Thu Apr 25 10:33:17 PDT 2019

spatel added a comment.

In D61075#1478122 <https://reviews.llvm.org/D61075#1478122>, @reames wrote:

> I think I'm missing something basic here.  CGP will iterate until done, including trivial DCE.  Given that, why not just leave the instructions in the IR and let the next iteration remove them?  It would trigger an extra iteration, which might be undesirable, but would it be incorrect?  That would seem better than invalidating the DomTree, which just forces a new iteration anyways...
>
> Actually, it doesn't look like it includes trivial DCE.  Maybe it should?
>
> Assuming the answer is that yet, that would be correct, then this patch becomes a smallish optimization over an extra iteration right?

Yes, that sounds right. If CGP included DCE, then it would be easier to just rely on that for cleanup. Unfortunately, it doesn't. I don't know the full history here, but I suspect that since CGP was intended to be a temporary hack (see opening comment in the source file), nobody has bothered to invest in doing better.

But since this patch didn't solve the compile-time problem anyway, I'm going to propose a bigger fix anyway...

> Thinking through the logic, we don't need to worry about the cmp instruction since the caller advanced the iterator beyond it.  So, the only concern is the binary operator.  But correct me if I'm wrong, don't you require the BO to dominate the Cmp?  If so, then haven't we already iterated past that instruction?  How do we have an iterator invalidation problem at all?

No, the binop does not necessarily dominate the compare if they are within a single block. If they are independent ops (no def-use relationship), then we don't know which order they are in within that block. We have regression tests with those patterns.

The history of these overflow transforms is that we wanted to do these as canonicalizations in instcombine, but that caused perf regressions. So then the add overflow transform got dumped here in CGP with a TLI hook to allow it. Then, I noticed that we weren't getting that transform consistently with some constant operands or the subtract overflow sibling. So I improved the pattern matching, and idealistically/naively tried to allow the transforms in all cross-block cases as long as the target said it was legal.

But that still caused perf regressions, so the transform was limited in 2 ways: (1) avoid hoisting the binop into a dominating cmp block and (2) avoid hoisting either op unless we had a direct predecessor/successor relationship.

At this point, I'm going to propose that we just give up on any cross-block attempts. I don't have any perf data to show that we got any wins from that, and I don't think anyone has the will to fix the backend problems and compile-time issues.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61075/new/

https://reviews.llvm.org/D61075