[cfe-dev] Reaching the end of a value-returning function in C++

Thu Oct 25 16:20:13 PDT 2012

On Oct 25, 2012, at 4:12 PM, John McCall <rjmccall at apple.com> wrote:

> On Oct 25, 2012, at 3:31 PM, Chandler Carruth wrote:
>> On Thu, Oct 25, 2012 at 3:02 PM, John McCall <rjmccall at apple.com> wrote:
>>> 
>>> Is it a real optimization in practice, though?  This situation can't be formed
>>> or exposed by optimization;  you actually have to have a single function
>>> with a reachable implicit return site.  Where is this supposed real code
>>> that does this intentionally but cannot use noreturn attributes and
>>> unreachable markers?
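For reference: in C++, flowing off the end of a value-returning function is undefined behavior, and that is what allows the compiler to treat such a site as unreachable at all. Below is a minimal sketch of the alternative John mentions: code whose every path either returns or calls a [[noreturn]] function, so no implicit return site is reachable. The names fatalBadOpcode and operandCount are hypothetical, and constexpr is used only so the checks can run at compile time.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical error reporter. [[noreturn]] tells the compiler that control
// never comes back from this call, so nothing after it is reachable.
[[noreturn]] inline void fatalBadOpcode(int op) {
  std::fprintf(stderr, "bad opcode %d\n", op);
  std::abort();
}

// Hypothetical decoder. Every path either returns a value or calls a
// [[noreturn]] function, so no implicit return site is reachable and the
// compiler has no reason to warn about falling off the end.
constexpr int operandCount(int op) {
  if (op == 0) return 0;  // e.g. a NOP takes no operands
  if (op == 1) return 1;  // e.g. a PUSH takes one
  if (op == 2) return 2;  // e.g. an ADD takes two
  fatalBadOpcode(op);     // never returns
}
```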
>> 
>> Oh, the code *can* use unreachable markers, but it's not clear that we
>> should just give up on all of these optimization opportunities if the
>> user fails to use them.
>> 
>> Think about everywhere in LLVM's codebase that has a "covered" switch
>> on an enum. Technically there is a branch that falls off the end
>> (unless the enum has a power-of-two number of elements), and without
>> the llvm_unreachable we add everywhere, the optimizer preserves this.
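A minimal sketch of the covered-switch pattern Chandler describes, assuming a hypothetical three-enumerator enum. Since llvm_unreachable needs LLVM's headers, the GCC/Clang builtin __builtin_unreachable() stands in for it here; in optimized builds llvm_unreachable typically reduces to a builtin like this one.

```cpp
// Hypothetical enum with three enumerators. Because its underlying type is
// not fixed, its valid values are those of a 2-bit field (0..3), so the
// value 3 is legal yet matches no case label below.
enum Kind { Add, Sub, Mul };

// A "covered" switch: every enumerator has a case, yet a value of 3 would
// fall out of the switch and reach the end of the function.
constexpr int precedence(Kind k) {
  switch (k) {
    case Add: return 1;
    case Sub: return 1;
    case Mul: return 2;
  }
  // Stand-in for llvm_unreachable: marks the fall-through edge as dead, so
  // the optimizer need not keep a path for values with no enumerator.
  __builtin_unreachable();
}
```

With four enumerators (values 0..3) every representable value would have a case, which is the power-of-two exception mentioned above.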
> 
> This is a fair point;  switch is a widespread pattern that does provoke this
> accidentally.
> 
>> We could require users to add such annotations, but "making debugging
>> crashes of optimized binaries easier" seems a weak argument. LLVM is
>> replete with optimizations which will make code with undefined
>> behavior crash in new and surprising ways. Avoiding this one is a drop
>> in the ocean.
> 
> The problem is that 'unreachable' is a much stronger form of undefined
> behavior than pretty much anything else in the system.  We do not generally
> optimize sites committing undefined behavior by treating them as unreachable;
> we used to do this much more aggressively, and it was absolutely lethal,
> because the only way to debug such code is to carefully read the assembly,
> discover that large chunks of your function have disappeared, recognize
> the transformation, and try to find out why it happened.
> 
> In other words, it had to be debugged by compiler writers.  Ahem.
> 
>>>> To answer why we need the semantic unreachable to get these
>>>> optimization opportunities: in a word, inlining. When inlining
>>>> collapses the CFG of a function, having the unreachable hint can be
>>>> essential to selecting the proper representation.
>>> 
>>> Can you expand on this?  How does inlining create this opportunity?
>> 
>> Using the example above of a switch over an enum, let's imagine that
>> after inlining the optimizer proves that the high bit is set in the
>> input, and the highest enumerator is exactly that value (the top bit set,
>> all other bits zero). Now, if we have the unreachable, we can prove that there is a
>> single path through the CFG. If we don't, we have to assume that the
>> value might be *larger* than the largest enumerator.
> 
> Okay, so that example is totally fanciful and you should feel bad. :)
> But there's a somewhat imaginable situation where our switching function
> is called, and the call is in the shadow of a branch on a comparison
> against one of the enumerators, and we could greatly simplify the switch
> (possibly to nothing) if we can prove that the default case is unreachable.
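A sketch of the situation John describes, under assumed names (Kind, precedence, highPrecedence are all hypothetical): the call sits behind a comparison against one enumerator. This shows only the source shape; whether a given optimizer actually performs the fold is not guaranteed.

```cpp
enum Kind { Add, Sub, Mul };

// Covered switch with the fall-through edge marked unreachable
// (__builtin_unreachable is a GCC/Clang builtin).
constexpr int precedence(Kind k) {
  switch (k) {
    case Add: return 1;
    case Sub: return 1;
    case Mul: return 2;
  }
  __builtin_unreachable();
}

// Hypothetical caller: the call is in the shadow of a comparison against
// one enumerator. Once precedence() is inlined here, Add and Sub are already
// excluded by the branch; only Mul, or a value with no enumerator, can reach
// the switch. With that no-enumerator path marked unreachable, an optimizer
// may reduce the inlined switch to a plain `return 2;`.
constexpr int highPrecedence(Kind k) {
  if (k >= Mul)
    return precedence(k);
  return 0;
}
```

Without the marker, the path for out-of-range values would have to survive, so the switch could not disappear.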
> 
> So, basically, I don't like the idea of using 'unreachable' in arbitrary
> positions here, but if it's practical to do it only after a covered switch, I
> could get behind that.

FWIW, I also think this is a good place to emit 'unreachable': it matches the frontend's warning semantics, since the frontend already assumes that whatever follows a covered switch is unreachable for warning purposes.
I still don't think it is reason enough to emit 'unreachable' everywhere, though.

Would you find it satisfactory if the covered-switch case were handled?
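To make the distinction in this suggestion concrete, here is a sketch of two hypothetical functions (precedence and sign are illustrative names only). The first ends right after a covered switch with no marker. The second can also fall off the end as far as the compiler can tell, but not after a covered switch. Both compile with warnings at most, and every call below returns normally.

```cpp
enum Kind { Add, Sub, Mul };

// Covered switch with no marker after it. As noted above, Clang's frontend
// already treats the point after a covered enum switch as unreachable for
// -Wreturn-type, so it does not warn here; under the suggestion in this
// reply, CodeGen would also emit 'unreachable' at exactly this point.
constexpr int precedence(Kind k) {
  switch (k) {
    case Add: return 1;
    case Sub: return 1;
    case Mul: return 2;
  }
}

// Every real input returns, but the compiler cannot see that: no covered
// switch precedes the end of the function, so Clang warns that control may
// reach the end of a non-void function, and under this suggestion the
// implicit return site would NOT be turned into 'unreachable'.
constexpr int sign(int x) {
  if (x > 0) return 1;
  if (x < 0) return -1;
  if (x == 0) return 0;
}
```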

> 
> John.