[PATCH] D104870: [SimplifyCFG] Tail-merging all blocks with `unreachable` terminator

Wed Jun 30 10:50:46 PDT 2021

lebedev.ri added inline comments.

================
Comment at: llvm/test/Transforms/SimplifyCFG/tail-merge-noreturn.ll:129-130
 ; CHECK:       a2:
 ; CHECK-NEXT:    call void @assert_fail_1_alt(i32 0)
 ; CHECK-NEXT:    unreachable
 ;
----------------
lebedev.ri wrote:
> lebedev.ri wrote:
> > nikic wrote:
> > > lebedev.ri wrote:
> > > > aeubanks wrote:
> > > > > lebedev.ri wrote:
> > > > > > rnk wrote:
> > > > > > > lebedev.ri wrote:
> > > > > > > > rnk wrote:
> > > > > > > > > lebedev.ri wrote:
> > > > > > > > > > rnk wrote:
> > > > > > > > > > > lebedev.ri wrote:
> > > > > > > > > > > > rnk wrote:
> > > > > > > > > > > > > I expected your code to fire on this test case. Can you explain why this example isn't getting tail merged?
> > > > > > > > > > > > > 
> > > > > > > > > > > > > Consider this example: https://gcc.godbolt.org/z/ox16a9P1z
> > > > > > > > > > > > > ```
> > > > > > > > > > > > > [[noreturn]] void abort1();
> > > > > > > > > > > > > [[noreturn]] void abort2();
> > > > > > > > > > > > > [[noreturn]] void abort3();
> > > > > > > > > > > > > bool cond();
> > > > > > > > > > > > > void doAsserts() {
> > > > > > > > > > > > >     if (cond()) abort1();
> > > > > > > > > > > > >     if (cond()) abort2();
> > > > > > > > > > > > >     if (cond()) abort3();
> > > > > > > > > > > > > }
> > > > > > > > > > > > > ```
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I think it is more canonical to leave these unreachable terminators in place after the calls to noreturn functions, rather than merging the unreachables together.
> > > > > > > > > > > > > 
> > > > > > > > > > > > > I just want to make sure your transform isn't firing, creating BBs, and then a later part of simplifycfg rolls the unreachables back up into place after the calls.
> > > > > > > > > > > > 
> > > > > > > > > > > > It fired, we didn't sink anything, and `SimplifyCFGOpt::simplifyUnreachable()` decided to undo it.
> > > > > > > > > > > Got it, and we want to avoid that because otherwise it will make the overall pass return true to indicate that it changed something, which will make the parent pass manager re-run more passes.
> > > > > > > > > > True.
> > > > > > > > > > 
> > > > > > > > > > As far as I'm aware, that only results in potentially invalidating analyses;
> > > > > > > > > > I'm not aware of it triggering additional optimization pass runs.
> > > > > > > > > > 
> > > > > > > > > > IIRC, that part of `SimplifyCFGOpt::simplifyUnreachable()` is a pretty important canonicalization,
> > > > > > > > > > because, e.g., InstCombine can't modify the CFG.
> > > > > > > > > Even if it only invalidates analyses, I think this is worth addressing before landing this. Ideally this code would directly call the heuristic that "sink from common predecessors" uses, but if that isn't available, I think you could approximate it by not merging unreachable terminators when the previous non-debug instruction is a noreturn call with distinct callees. We know that is unprofitable, and that accounts for most blocks ending in unreachable. It saves compile time from IR churn too.
> > > > > > > > > 
> > > > > > > > > @aeubanks, what are the consequences of passes indicating that they changed the IR when they actually didn't?
> > > > > > > > This is impossible to address.
> > > > > > > > 
> > > > > > > > > I think you could approximate it by not merging unreachable terminators when the previous non-debug instruction is a noreturn call with distinct callees.
> > > > > > > > 
> > > > > > > > I cannot, because fixing the lack of sinking in such cases is basically the very next step here.
> > > > > > > 
> > > > > > > This is the transform I'm talking about avoiding, and I don't think we plan to do this in the next step:
> > > > > > > ```
> > > > > > > bb1:
> > > > > > >   call void @abort1()
> > > > > > >   unreachable
> > > > > > > bb2:
> > > > > > >   call void @abort2()
> > > > > > >   unreachable
> > > > > > > ```
> > > > > > > ->
> > > > > > > ```
> > > > > > > bb1:
> > > > > > >   br label %common
> > > > > > > bb2:
> > > > > > >   br label %common
> > > > > > > common:
> > > > > > >   %callee = phi ... @abort1 ... @abort2
> > > > > > >   call void ... %callee()
> > > > > > >   unreachable
> > > > > > > ```
> > > > > > > Right? This would make direct calls indirect, which is less canonical. I doubt this is going to change soon.
> > > > > > > 
> > > > > > > > This is impossible to address.
> > > > > > > 
> > > > > > > I guess what you are saying is that this isn't possible to implement with the current code and data structures. We'd need to incorporate the structure of the instruction before the `unreachable` into the map. This makes me think maybe it would be better to extend the SinkFromCommonPredecessors logic to consider blocks ending in unreachable. The current code is essentially restructuring the CFG in a way that is convenient for that function. I think transforms should avoid changing the IR before they know whether the transformation is really profitable, and it seems like the profitability heuristic is over there.
> > > > > > > Right? This would make direct calls indirect, which is less canonical. I doubt this is going to change soon.
> > > > > > 
> > > > > > Right. This isn't going to change.
> > > > > > However, consider:
> > > > > > ```
> > > > > > bb1:
> > > > > >   call void @abort1()
> > > > > >   unreachable
> > > > > > bb2:
> > > > > >   call void @abort2()
> > > > > >   unreachable
> > > > > > bb3:
> > > > > >   call void @abort2()
> > > > > >   unreachable
> > > > > > ```
> > > > > > ->
> > > > > > ```
> > > > > > bb1:
> > > > > >   call void @abort1()
> > > > > >   unreachable
> > > > > > bb2:
> > > > > >   br label %bb2.bb3.common
> > > > > > bb2.bb3.common:
> > > > > >   call void @abort2()
> > > > > >   unreachable
> > > > > > ```
> > > > > > 
> > > > > > Also, consider:
> > > > > > ```
> > > > > > bb1:
> > > > > >   call void @abort1()
> > > > > >   unreachable
> > > > > > bb2:
> > > > > >   call void @abort1()
> > > > > >   br label %bb3
> > > > > > bb3:
> > > > > >   unreachable
> > > > > > ```
> > > > > > 
> > > > > > My point being, we can't realistically tell in advance whether we will or won't succeed in sinking anything.
> > > > > There's no correctness issue with saying that we modified the IR if we actually didn't; it'll just invalidate analyses, causing more work when they are recomputed in later passes.
> > > > > 
> > > > > Might be worth putting this through http://llvm-compile-time-tracker.com/
> > > > Since you asked, sure: https://llvm-compile-time-tracker.com/compare.php?from=1f169a774cb865659cefe085e70a56a884e3711e&to=fc54bb9a8ef85bd76dd9e934b2546f4beadc5b5e&stat=instructions
> > > > I'm not sure what this tells us here.
> > > > Since the instruction stat correlates with the size changes, I guess we could say that it led to more inlining,
> > > > and more IR to chew through, which is pretty much the expected outcome.
> > > This shows a 10% increase in code size on mafft with LTO, and a few others also increase by several percentage points. Did you rerun @rnk's test on clang Release+Assert code size with this patch?
> > > 
> > > It looks like large code size increases are still the blocker for this patch, as they were back then.
> > > This shows a 10% increase in code size on mafft with LTO, and a few others also increase by several percentage points.
> > 
> > Yep.
> > 
> > > Did you rerun @rnk's test on clang Release+Assert code size with this patch?
> > 
> > I have not, because there is no reason to expect the outcome to be different.
> > (It will be somewhat different, because the approach is somewhat different.)
> > 
> > > It looks like large code size increases are still the blocker for this patch, as they were back then.
> > 
> > I'm not quite sure how we arrive at this conclusion.
> > 
> > Let me make a comparison: when one paints something,
> > one expects not only that the something will get colored,
> > but also that some of the paint will get used up.
> > 
> > What I'm saying is that the effect this has is not unexpected; on the contrary, it is expected.
> > We successfully reduce the amount of IR bloat from assertion blocks,
> > decreasing the size of the functions they are in,
> > and naturally that makes some of them more eligible for inlining,
> > which then happens, and increases code size.
> > 
> > I would also like to call back to the discussion in D101468,
> > where we had very much the same discussion, and I actually argued that that change was bad,
> > but @nikic argued that it was good since we no longer overestimate the inlining cost,
> > and that if that leads to an overestimation, then the problem is in the inliner.
> > I'm not sure how, in this patch, the views have flipped to diametrically opposite ones :)
> Forgot to mention: `"There are three kinds of lies: lies, damned lies, and statistics."`
> The n% increase in code size is a pretty meaningless number,
> and I'm sad to see it being used in such a harsh, blocking manner.
> What we should at least do is look at how it compares with assert-less code.
> 
> I don't have clang numbers yet, but here are some for RawSpeed:
> ```
> $ stat --printf="%s %n\n"  build-release-*/src/utilities/rsbench/rsbench | sort
> 17188264 build-release-new/src/utilities/rsbench/rsbench
> 17234640 build-release-old/src/utilities/rsbench/rsbench
> 17464336 build-release-with-asserts-new/src/utilities/rsbench/rsbench
> 17508840 build-release-with-asserts-old/src/utilities/rsbench/rsbench
> ```
> I.e. `-DNDEBUG`->`-UNDEBUG` is a +1.6% increase,
> while `old`->`new` (i.e. this patch) causes a 0.25% decrease.
> 
> Let me get these numbers for clang...
And here are the clang numbers.
```
$ stat --printf="%s %n\n" build-release-*/bin/clang-13 | sort
103032240 build-release-old/bin/clang-13
103046136 build-release-new/bin/clang-13
123732704 build-release-with-asserts-old/bin/clang-13
123882984 build-release-with-asserts-new/bin/clang-13
```
I.e. `-DNDEBUG`->`-UNDEBUG` is a +20% increase,
while old->new (i.e. this patch) causes a +0.1% increase for the assert-ful build, and +0.01% for the assert-less one.

But what this really tells us is that the numbers will vary depending on the underlying libc implementation.
The thing is, glibc's `__assert_fail()` has 4 arguments (the stringified assertion, filename, line, function),
and in the worst-case scenario we'll need a PHI for each one of them, yet currently the profitability check
only allows a single PHI.

So to reproduce @rnk's numbers, he'd have to redo the test on whatever platform was used originally.

Another way to put it: the regression will appear later, when the profitability check is tuned :)

Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D104870