[PATCH] D135551: [clang] replace `assert(0)` with `llvm_unreachable` NFC

Mon Oct 10 12:15:44 PDT 2022

aaron.ballman added a comment.

In D135551#3847444 <https://reviews.llvm.org/D135551#3847444>, @dblaikie wrote:

> I thought this was settled quite a while ago and enshrined in the style guide: https://llvm.org/docs/CodingStandards.html#assert-liberally
>
> `assert(0)` should not be used if something is reachable. We shouldn't have a "this violates an invariant, but if you don't have asserts enabled you do get some maybe-acceptable behavior".
>
> I feel fairly strongly that any cases of "reachable asserts" should be changed to valid error handling or `llvm_report_error` and remaining `assert(0)` should be transformed to `llvm_unreachable`. (also, ideally, don't use branch-to-`unreachable` where possible, instead assert the condition - in cases where the `if` has side effects, sometimes that's the nicest way to write it, but might be clearer, if more verbose to use a local variable for the condition, then assert that the variable is true (and have the requisite "variable might be unused" cast))

I would be okay with that, but that's not what this patch was doing -- it was changing `assert(0)` into an `llvm_unreachable` more mechanically, and I don't think that's a valid transformation. The key, to me, is not losing the distinction between "reaching here is a programming mistake that you'd make during active development" vs "we never expect to reach this patch and want to optimize accordingly." `__builtin_unreachable` changes the debugging landscape far too much for me to want to see folks using it for "reaching here is a programming mistake" situations, *especially* in RelWithDebInfo builds where optimizations are enabled and may result in surprising call stacks and time travel debugging.

>> Historically, we've usually used llvm_unreachable for situations where we're saying "this code cannot be reached; if it can, something else has gone seriously wrong." For example, in code like: int foo(SomeEnum E) { switch (E) { case One: return 1; default: return 2; } llvm_unreachable("getting here would be mysterious"); } and we've used assert(0) for situations where we're saying "this code is possible to reach only when there were mistakes elsewhere which broke invariants we were relying on."
>
> I don't think those are different things though - violating invariants is ~= something going seriously wrong.
>
> (sorry, I guess I should debate this on the thread instead of here - but I think most of the folks on that thread did agree with the LLVM style guide/the direction here)
>
> This, I think was also discussed about a decade ago in the LLVM community and resulted in r166821/2962d9599e463265edae599285bbc6351f1cc0ef which specifically "Suggests llvm_unreachable over assert(0)" and is the policy of LLVM - this change is consistent with that policy.

I don't have the context for those changes in my email either, but regardless of what we thought ten years ago, we have code in the code base today that assumes a difference in severity between kinds of unreachable statements so we need to be careful when correcting mistakes. I think we're still in agreement that `llvm_unreachable` should be preferred over `assert(0)` in situations where the code is expected to be impossible to reach. I think we're also still in agreement that "correct error reporting" is preferred over `assert(0)`. Where we might still have daylight are the occasional situations where `assert(0)` gives a better experience -- when the code is possible to reach but reaching it signifies a developer (not user) mistake that is plausible to make when doing new development, such as when misusing the C interface (where there isn't always an error that should be reported via the API). I think these uses should generally be rare, but I don't think it's a "never do this" kind of situation.

`llvm_unreachable` asserts in debug mode, so it has the nice property we need when doing new development. But it doesn't convey the distinction in semantics. Maybe another approach is: `#define developer_bugcheck(msg) llvm_unreachable(msg)` (or something along those lines). We still get the assertion in debug mode, we then get the better optimization in release mode, but we don't lose the semantic information in the code base. It doesn't really help the RelWithDebInfo case though (where call stacks may be unrecognizable as a result of time travel optimizations) but maybe that's a tradeoff worth making?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135551/new/

https://reviews.llvm.org/D135551