[PATCH] D135551: [clang] replace `assert(0)` with `llvm_unreachable` NFC

Wed Oct 12 05:19:31 PDT 2022

aaron.ballman added a comment.

In D135551#3850511 <https://reviews.llvm.org/D135551#3850511>, @dblaikie wrote:

> In D135551#3850391 <https://reviews.llvm.org/D135551#3850391>, @aaron.ballman wrote:
>
>> In D135551#3850308 <https://reviews.llvm.org/D135551#3850308>, @dblaikie wrote:
>>
>>> In D135551#3850266 <https://reviews.llvm.org/D135551#3850266>, @inclyc wrote:
>>>
>>>> This makes sense! However I think `assert(0)` should not be used in this case, we could expose another `llvm_unreachable`-like api and probably `llvm_report_error` shall be fine. Are there some changed assertions actually "Aspirationally unreachable" in this patch?
>>>
>>> No, I really don't think  we should go down that path.
>>>
>>> I believe these are not actually distinct cases - in either case, the program has UB if they violated the invariants/preconditions - whether or not they called through the C API.
>>
>> The C Index test cases I commented on earlier in the review are a good example of when there's no UB but we still want to alert people to the problem of code they should not be reaching. The assumption that "reached here unexpectedly" == "UB" is invalid. Some things are logic bugs that exhibit no UB.
>>
>>> unreachable is no more a guarantee/proven thing than an assertion - both are written by humans and a claim "if this is reached-or-false, there is a bug in some code, somewhere". The statement is not stronger in the unreachable case and the style guide supports that perspective and the way we triage/treat bugs is pretty consistent with that - we get bugs all the time when an unreachable is reached and that doesn't seem to surprise most/anyone - we treat it the same as a bug when an assertion fires.
>>>
>>> The discourse discussion, I think, supports this ^ perspective.
>>>
>>> As there's still disagreement, should this escalate to the RFC process to change the style guide, Aaron?
>>
>> Yes, I would appreciate that. I don't think we're interpreting our policy the same way. Specifically "Use llvm_unreachable to mark a specific point in code that should never be reached."
>
> In the same way that an assert says "This condition should never be false" - I use "should" in the same sense in both unreachable and assert, and I believe that's the prevailing opinion of LLVM developers/the LLVM style guide.

I believe our code base says something different than our "prevailing opinion" and we should not be discounting the reality of how our code is written today.

> Perhaps we are also at a deadlock as to who should write the proposal... :/

Agreed, you and I are probably both too close to the issue to write the proposal right now. If nobody else does it first, maybe the two of us could circle back in a few months after we've had time to research and think more deeply and we could co-write something (even if it is a multiple choice RFC between conflicting directions).

>> - "should" is turning out to be interpreted in two ways:
>>
>> - "used to indicate obligation, duty, or correctness, typically when criticizing someone's actions. e.g., he should have been careful": I am asserting it is impossible to reach this.
>
> This is the "should" ^ I mean, and what every assert should mean too. This code assumes this property to be true - this is a precondition of the code.
>
> We should not be using asserts where we don't mean this. I'm OK assuming every assert does mean "this is a precondition" and treating them that way in terms of transforming them to unreachable or anything else we might do with them - and if some of them don't mean it, then they're buggy and we can fix them, but assert->unreachable doesn't make the situation any worse.
>
> Any code behind/after an assert is untested and unvalidated - we can't say "if you violate this constraint it'll actually be OK" because we've never tested that/don't know that.

Just to double-check... are you opposed to the idea of differentiating between ways of saying "the reachability of this code is in question" or are you opposed to use of `assert` (or something that smells too similar) specifically? Because I don't care about *how* we spell the differentiation, just that there's not a policy limiting my ability to express my intent in code. I'd be perfectly fine with `#define some_name_we_agree_upon(msg) llvm_unreachable(msg)` being the facility we use instead of `assert(0);`.

>> - "used to indicate what is probable. e.g., $348 million should be enough to buy him out": I am asserting you probably won't get here, but you won't be happy if you do.

I'm arguing that we have a not-insignificant amount of uses that have this^ interpretation and I want a way to express that distinction in code. I have an allergic reaction to using `llvm_unreachable` to express known-to-be-potentially reachable code because my complaint is that the annotation causes the code to be *less readable* as a result. I have to know about our community's novel definition of what "unreachable" means in order to read that code correctly. I think we'd be better served to leave "unreachable" annotations for places that we know we can't reach and use literally any other named API to express situations we know can potentially be reached (the assumption that we're going to have test coverage for all of these situations is a non-starter to me; until the community starts showing a stronger interest in tracking test coverage metrics and holding ourselves accountable for that coverage, it's not a compelling argument that "tests should catch this" because we know they won't).

> @rnk's comment here ( https://discourse.llvm.org/t/llvm-unreachable-is-widely-misused/60587/3 ) pretty well sums up my understanding/values here and it looks like on that thread, mostly the long term LLVM developers agree with this perspective and are trying to explain it to the (so far as I can tell) relative new/outside developer.

I don't want to belabor this topic any longer as I think we've both said our pieces enough by now. I don't think we should make any code changes at this point unless there's agreement that the code is actually improved by the change. I think we can identify some of those noncontroversial cases in this review so that we get some benefit from all this effort. But I think we should leave anything that's contentious alone for the time being.

================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/MemRegion.h:601
     assert(isa<ObjCMethodDecl>(FD));
-    assert(false && "Getting the type of ObjCMethod is not supported yet");
+    llvm_unreachable("Getting the type of ObjCMethod is not supported yet");

----------------
dblaikie wrote:
> aaron.ballman wrote:
> > This looks like it possibly should have been an error reported to the user? If not, this is a wonderful example of what I mean by aspirationally unreachable code. We aspire to not getting here -- but we can get here just the same and there is not UB as a result (we return a valid object, UB might happen elsewhere based on invalid assumptions of what is returned).
> "we" can get here in what sense? Is there source code that can be passed to the static analyzer that can reach this? Then it shouldn't be an assert, it should be exercised and tested - even if that test says "hey, this doesn't do anything good yet". I'd assume, whether I see the assert or the unreachable that it isn't reachable from clang on the command line - and that any code that makes it reachable/calls it under this condition is incorrect (for now, until support is added) and would be considered a bug to be fixed. Whether it's the assert or the unreachable, I consider it to be the same statement of intent.
> "we" can get here in what sense? Is there source code that can be passed to the static analyzer that can reach this? 

You are asking the question I claim the original source already answered and the new source fails to answer. Use of `assert(false)` like that tells me as a code reviewer "this code could potentially be reached and if it does, that's a problem" and so I know to go look for those cases to make sure we handle them properly, or if I've found a bug after the code was released I then have a better idea of what possible problems exist. Asserting "this is unreachable" tells me "the developer already did that audit and knows this is unreachable" and sends me down other, less productive paths before I finally go "oh, that annotation was lying to me, this isn't actually unreachable."

> Then it shouldn't be an assert, it should be exercised and tested - even if that test says "hey, this doesn't do anything good yet". 

I like that idea. In reality, we're nowhere near having test coverage like that (at least in Clang).

================
Comment at: clang/lib/AST/ExprConstant.cpp:7561-7564
     if (Source == E) {
-      assert(0 && "OpaqueValueExpr recursively refers to itself");
+      llvm_unreachable("OpaqueValueExpr recursively refers to itself");
       return Error(E);
     }
----------------
dblaikie wrote:
> aaron.ballman wrote:
> > Probably should be `assert(Source != E && "OpaqueValueExpr recursively refers to itself");`
> Yep, though, again, I think it's a valid transformation even if it doesn't go as far as it could.
One thing that's worth noting -- the proposed code and my "probably" observation from above have the result of making the code *less* robust because they remove the perfectly reasonable recovery mechanism of saying there's an error. So they both go from "may give odd results but is unlikely to cause a vulnerability" to "vulnerability more possible" by losing that property.

================
Comment at: clang/lib/AST/Interp/ByteCodeExprGen.cpp:136-137

   default:
-    assert(false && "Cast not implemented");
+    llvm_unreachable("Cast not implemented");
   }
----------------
dblaikie wrote:
> aaron.ballman wrote:
> > I think this can just be removed, right? There's another unreachable for falling out of the `switch` that seems to be covering the same situation.
> Presumably the switch isn't fully covered (otherwise we'd get a `-Wcovered-switch-default` warning).
> 
> Might be that the unreachable after the switch can be removed in this case.
Fair -- so long as we only end up with one unreachable, I think that's an improvement.

================
Comment at: clang/lib/AST/Interp/ByteCodeExprGen.cpp:598
       } else {
-        assert(false && "Unhandled type in array initializer initlist");
+        llvm_unreachable("Unhandled type in array initializer initlist");
       }
----------------
dblaikie wrote:
> aaron.ballman wrote:
> > The rest of the ones here are somewhat interesting in that the interpreter is an experiment under active development and is known to be incomplete. In all of these cases, I think the switch to unreachable is flat-out wrong -- these asserts serve explicitly to find unimplemented cases when we hit them.
> & I don't see why unreachable is any different a statement than assert(false) in these cases... - it's the same statement of intent. "if this is reached you've found a bug" (in this case, a missing feature)
> 
> But I'd be sort of OK changing all these to report_fatal_error. But, again, I think the assert(false) -> unreachable is a valid transformation and doesn't make anything worse than it already is, but improves the code by being more consistent and removing this confusion that there might be something different about assert(false) when, I believe, there isn't.
> & I don't see why unreachable is any different a statement than assert(false) in these cases... - it's the same statement of intent. "if this is reached you've found a bug" (in this case, a missing feature)

You are asserting it's the same statement of intent and I keep pointing out that people use the different constructs in practice because they're different statements of intent. I don't know how to resolve this difference of opinion, but I can say as someone doing code review in this area recently that your interpretation is wrong according to what we were after with this code.

I'd be fine changing it to `report_fatal_error` instead of `assert(false)`; I'd be strongly opposed to switching to `llvm_unreachable`.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D135551/new/

https://reviews.llvm.org/D135551