[PATCH] D102615: [LoopDeletion] Break backedge if we can prove that the loop is exited on 1st iteration (try 3)

Tue Jun 15 18:43:58 PDT 2021

reames added a comment.

I want to chime in on the philosophy bits raised in the last few comments.

@nikic I understand and generally agree with your point about not implementing adhoc optimizations for particular patterns which happen to break particular workloads.  However, I don't think it really applies here.

Leaving aside your point about whether this can be done without SCEV (a reasonable question I'll touch on later), I see the class of loops which provably don't execute their backedge based on symbolic reasoning of the first iteration as being an interesting class.  In fact, SCEV *already* does a variant of this in the exit computation logic when the exit condition is not analyzable.  If anything, from a philosophy perspective, I'd prefer that this patch didn't exist because SCEV did this internally.  However, that doesn't appear to be pragmatic from a compile time perspective.

I also want to call out a methodology issue.  If we use the test-suite as evidence of (lack of) profit, and the test suite contains only historical C and C++ code, we will never implement any transform which either a) shows up for other languages, or b) is idiomatic in newer code (e.g. new language versions for C/C++).  I'm not saying you're wrong to collect this data, but we need to be careful about how we interpret it.

>From a purely pragmatic perspective, I don't want to get in the habit of breaking forward progress for perfection.  @nikic, I see you as coming dangerously close to that here.  I do think that if Max can show a savings elsewhere, we should allow this patch.

@mkazantsev As a pragmatic means to address the raised review concern, have you investigated why using SCEV here is more expensive?  One thing I'm wondering about is whether we're invalidating SCEV too aggressively in one of the existing transforms.  I don't have strong evidence of this, but computing SCEVs for a loop isn't *that* expensive, and given a properly structured loop pipeline should happen approximately once.  I'd be tempted to instrument the various "forgetX" calls and run one of the worst offenders compile time wise.  Do all of the invalidation make sense?  Just based on a quick skim of the code, I see a couple call sites in LoopSimplifyCFG and IndVarSimplify which look likely to invalidate far more than is required.  In particular, if we were invalidating multiple times in a loop nest, that could cause degenerate compile time.

I also second @nikic request for a less reduced reproducer.  It's hard to tell whether there's another approach here without seeing the original motivating example.  The fact that all of the test cases are handled by SimplifyInst (which, btw, unrolling already has most of the code for) raises questions here.  It'd be nice to put those to bed once and for all.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D102615/new/

https://reviews.llvm.org/D102615