[cfe-commits] r140232 - /cfe/trunk/lib/Sema/SemaExpr.cpp

Wed Sep 21 17:07:07 PDT 2011

On Wed, Sep 21, 2011 at 4:13 PM, Chandler Carruth <chandlerc at google.com>wrote:

> On Wed, Sep 21, 2011 at 3:49 PM, David Blaikie <dblaikie at gmail.com> wrote:
>
>>
>>
>> On Wed, Sep 21, 2011 at 3:36 PM, Chandler Carruth <chandlerc at google.com>wrote:
>>
>>> On Wed, Sep 21, 2011 at 3:25 PM, David Blaikie <dblaikie at gmail.com>wrote:
>>>
>>>> One other point on this - the issues you've raised already exist in both
>>>> assert and llvm_unreachable code, making this change (assert(0) ->
>>>> llvm_unreachable) somewhat orthogonal to them.
>>>>
>>>
>>> Not at all. llvm_unreachable has a tremendously different impact in
>>> optimized builds than assert(0) does. The latter has zero impact, the
>>> control flow remains as expected.
>>>
>>> But llvm_unreachable tells the optimizer that this code path *cannot be
>>> taken*. That in turn constrains the value of the input. That in turn causes
>>> other optimizations to fire, and so on. This can cause essentially unbounded
>>> undefined behavior given an unexpected input, where as the assert will
>>> "merely" fall through.
>>>
>>> I've specifically seen cases in the disassembler and assembler code where
>>> truly unexpected cases are assert(0)'ed, but then handled benignly.
>>>
>>
>> That's pretty much pure luck & potentially just hides the bug for longer
>> in retail builds that might otherwise fail more catastrophically (or less,
>> certainly - but I don't think it's strictly better to have release builds be
>> agnostic to these paths)
>>
>
> I really think we're talking past each other.
>

Probably, somewhat. I'm not sure I'm able to accurately express my position
here, but this discussion/disagreement is one I've certainly encountered
before, so perhaps it's more me than you.

Imagine:
>
> switch (input) {
>   default: assert(0 && "Never handled input sequence!!");
>   case FOO_InvalidA:
>   case FOO_InvalidB:
>   case FOO_InvalidC:
>     DiagnoseInvalidInput(input);
>     break;
>   case FOO_...:
>     ...
>   case FOO_...:
> }
>
> Here, we have logic to handle invalid inputs, but we try to cover all of
> the invalid squences which we expect, and use an assert to catch them in
> debug builds. In opt builds, clearly we want to just gracefully fall back to
> the invalid input handling, regardless.
>

I'm not trying to be obtuse, but that isn't entirely clear to me that in
such cases we want to gracefully fall back - in clang code itself I've seen
cases much like that (well, not quite, there's very few places, if any,
where we have diagnose invalid input but only specify some of the possible
invalid inputs) where a set of values are handled but the default contains
an assert. This assert is generally a "if we have a FOO_Crazy here, I don't
know what's going on" - eg: an invalid argument, or the like.

One example of this I came across is, for example, in
tools/libclang/CIndexUSRs.cpp, line 320-ish.
USRGenerator::VisitObjCContainerDecl - it switches over the Kind, but
expects only to find a subset of Kinds because it has a particular subclass
of Decl. It asserts (really should be llvm_unreachable) on any other kind
because it would be totally bogus.

> This doesn't hide any bugs, and it doesn't cause any problems.
>

What I said about hidden bugs is that these code paths - not only the ones
from the assert onwards, but the ones prior to the assert that trigger it,
are untested. We haven't tested any code path that leads to this function
being passed FOO_Crazy - so if we reach that state it's not safe to say we
can just carry on as though it's just FOO_InvalidN.

While, yes, a developer could make a clear decision about this "Here's
something I think might be true, and for my own sanity I'd like to know
about it if it's not true, but if it isn't true I might be able to handle
it" I think that's a really questionable thing to try to enshrine in code.
This goes for all asserts (hence my comment about the idea of llvm being
able to use conditional assertions to further optimize code).

I realize this isn't what everyone means when they say assert, evidently, &
I'm not doing a terribly good describing why I think that point of view is
incorrect/unhelpful.

> Fundamentally, I think you should let the authors of the code who actually
> know what cases they are and aren't intending to handle decide on the best
> construct here. My primary objection is to what seems to be a blind
> application of a heuristic... Maybe I'm misunderstanding what you're
> proposing.
>

I think what I'm proposing is fairly clear - you can see the patch I sent to
the list & it's really as simple as it sounds "all assert(false) should be
llvm_unreachable".

> I don't think we want to tell the optimizer that this *cannot* happen, but
>>> rather catch when it does happen in our test suite. The assert seems well
>>> suited to that.
>>>
>>
>> Is this true even in debug builds? My understanding was that
>> llvm_unreachable was much like an assert in debug builds - otherwise the use
>> of a message is rather useless since the compiler might not even have such
>> code anymore.
>>
>
> No, in debug builds it aborts with a message. Only in opt builds does it
> actually change the optimizer's CFG.
>
>
>> If it is true, should we consider changing our llvm_unreachable usage to
>> assert(0s? That's what I was getting at above. We're already treating the
>> same case in two different ways (some times with assert, sometimes with
>> llvm_unreachable) & it'd be nice to figure out the right way.
>>
>
> I'm trying to explain that there are *different* cases here, and that
> assert and llvm_unreachable are different because of that. These are two
> tools which, while having similarities, are intended to solve two distinct
> problems.
>

Right & I don't think the "assert but expect to behave in any particular way
in release builds" is a useful construct. Talking about the behavior of the
code in situations where developer assumptions are false seems like a very
shaky foundation.

> [on that note about optimizations - it'd be nice if our asserts could be
>> used to implement the same kind of cascading optimizations that
>> llvm_unreachable provides, wouldn't it?]
>>
>
> I don't actually like this. I think it will make bugs found in the wild
> harder to diagnose and debug, for limited gains...
>

It'd be interesting to know what the gains would be, I imagine. But there's
no conditional equivalent of llvm_unreachable - so the nearest thing would
be to just write conditionals?

assert(x) => if (x) { llvm_unreachable() }

But at the very least if you believe there are two cases (assert(0) and
llvm_unreachable are both valid constructs in different situations) I expect
the same is true of conditional assertions. There must be some cases (I'd
say the majority of cases) where certain conditional assertions should have
unreachable-like (compiler hinting) semantics.

But this is going a long way off, sorry to randomize things & I don't mean
to take your/the lists time. I can simply focus on less contentious issues
in the code base. If you'd like, perhaps discussing it on IRC would be more
constructive.

- David
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-commits/attachments/20110921/edde6798/attachment.html>