[PATCH] D135551: [clang] replace `assert(0)` with `llvm_unreachable` NFC

Mon Oct 10 14:46:04 PDT 2022

dblaikie added a comment.

In D135551#3847607 <https://reviews.llvm.org/D135551#3847607>, @aaron.ballman wrote:

> In D135551#3847444 <https://reviews.llvm.org/D135551#3847444>, @dblaikie wrote:
>
>> I thought this was settled quite a while ago and enshrined in the style guide: https://llvm.org/docs/CodingStandards.html#assert-liberally
>>
>> `assert(0)` should not be used if something is reachable. We shouldn't have a "this violates an invariant, but if you don't have asserts enabled you do get some maybe-acceptable behavior".
>>
>> I feel fairly strongly that any cases of "reachable asserts" should be changed to valid error handling or `llvm_report_error` and remaining `assert(0)` should be transformed to `llvm_unreachable`. (also, ideally, don't use branch-to-`unreachable` where possible, instead assert the condition - in cases where the `if` has side effects, sometimes that's the nicest way to write it, but might be clearer, if more verbose to use a local variable for the condition, then assert that the variable is true (and have the requisite "variable might be unused" cast))
>
> I would be okay with that, but that's not what this patch was doing -- it was changing `assert(0)` into an `llvm_unreachable` more mechanically, and I don't think that's a valid transformation. The key, to me, is not losing the distinction between "reaching here is a programming mistake that you'd make during active development" vs "we never expect to reach this patch and want to optimize accordingly."

I don't really think those are different things, though. Violating invariants is UB and there's no discussion to be had about how the program (in a non-asserts build) behaves when those invariants are violated - all bets are off, whether it's assert or unreachable.

> `__builtin_unreachable` changes the debugging landscape far too much for me to want to see folks using it for "reaching here is a programming mistake" situations, *especially* in RelWithDebInfo builds where optimizations are enabled and may result in surprising call stacks and time travel debugging.

Generally LLVM's pretty hard to fathom in a non-asserts build anyway, right? (that's the first thing any of us do is reproduce with an assertions build that may fail miles away from where a crash occurred because an invariant was violated much earlier) - that `cast` won't crash/will continue on happily in a non-asserts build seems like a much larger hole to debuggability of a non-asserts build than any unreachable?

>>> Historically, we've usually used llvm_unreachable for situations where we're saying "this code cannot be reached; if it can, something else has gone seriously wrong." For example, in code like: int foo(SomeEnum E) { switch (E) { case One: return 1; default: return 2; } llvm_unreachable("getting here would be mysterious"); } and we've used assert(0) for situations where we're saying "this code is possible to reach only when there were mistakes elsewhere which broke invariants we were relying on."
>>
>> I don't think those are different things though - violating invariants is ~= something going seriously wrong.
>>
>> (sorry, I guess I should debate this on the thread instead of here - but I think most of the folks on that thread did agree with the LLVM style guide/the direction here)
>>
>> This, I think was also discussed about a decade ago in the LLVM community and resulted in r166821/2962d9599e463265edae599285bbc6351f1cc0ef which specifically "Suggests llvm_unreachable over assert(0)" and is the policy of LLVM - this change is consistent with that policy.
>
> I don't have the context for those changes in my email either, but regardless of what we thought ten years ago, we have code in the code base today that assumes a difference in severity between kinds of unreachable statements so we need to be careful when correcting mistakes. I think we're still in agreement that `llvm_unreachable` should be preferred over `assert(0)` in situations where the code is expected to be impossible to reach. I think we're also still in agreement that "correct error reporting" is preferred over `assert(0)`. Where we might still have daylight are the occasional situations where `assert(0)` gives a better experience -- when the code is possible to reach but reaching it signifies a developer (not user) mistake that is plausible to make when doing new development, such as when misusing the C interface (where there isn't always an error that should be reported via the API). I think these uses should generally be rare, but I don't think it's a "never do this" kind of situation.

That still seems like something that should be caught by an asserts build and is UB otherwise. If you're developing against LLVM's APIs in a non-assertions build, there's a lot of other invariants that won't be checked (`cast` being a pretty core example) and will make debugging really difficult.

> `llvm_unreachable` asserts in debug mode, so it has the nice property we need when doing new development. But it doesn't convey the distinction in semantics. Maybe another approach is: `#define developer_bugcheck(msg) llvm_unreachable(msg)` (or something along those lines). We still get the assertion in debug mode, we then get the better optimization in release mode, but we don't lose the semantic information in the code base. It doesn't really help the RelWithDebInfo case though (where call stacks may be unrecognizable as a result of time travel optimizations) but maybe that's a tradeoff worth making?

I don't really see the distinction, though - it's a violated invariant, which is a developer bug, whether it's from an assert on unreachable. They're both expressing invariants - not of different strength. "Valid code should never reach this branch" and "valid code should never produce a false result here".

I'd really like to avoid what I see as cleanup - replacing one construct (assert(false)) with another (llvm_unreachable) of identical (to me) contractual strength - being slowed because of this distinction. Replacing reachable-unreachables, or reachable-false-assertions with llvm_report_error, to me, is orthogonal to this cleanup.

(& for the C API - I think assertions/unreachable are the right thing, not llvm_report_error - I'm sure there's many more ways to violate the C API contract that would result in inexplicable/confusing behavior and that will only ever be protected by assertions, than we might cover by a few llvm_report_errors on the interface & that should be the direction we encourage people to go down - develop with assertions enabled)

Basically llvm_report_error ends up/should be a "we couldn't find a better way to do error handling here, but it is really a reachable error we have to do something with" - each one is a bug in the library-ness of LLVM. (or it's used up in a tool implementation (above the library layer) and then it's fine/just a convenient way to error/exit, though probably still isn't so good for diagnostic quality from such tools). If it's not reachable by a correct usage (be it from a command line tool or API) then it should be an assertion, not an llvm_report_error.

But I know this issue comes up from time to time and I've yet to figure out a way to come to shared understandings on it :/ so I don't expect my rather narrow and firm definition to necessarily carry the day on this.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135551/new/

https://reviews.llvm.org/D135551