[PATCH] D86559: [Sema, CodeGen] Allow [[likely]] and [[unlikely]] on labels

Thu Aug 27 06:16:41 PDT 2020

aaron.ballman added a comment.

In D86559#2239859 <https://reviews.llvm.org/D86559#2239859>, @Mordante wrote:

> In D86559#2239013 <https://reviews.llvm.org/D86559#2239013>, @aaron.ballman wrote:
>
>> I'd like to understand your reasoning for ignore + diagnose as opposed to count attrs (+ optionally diagnose) or other strategies. I can see pros and cons to basically anything we do here.
>>
>> If we decide to ignore conflicting likelihoods, then I agree we should issue a diagnostic. However, I suspect that's going to come up frequently in practice because people are going to write macros that include these attributes. For instance, it's very sensible for a developer to put [[unlikely]] in front of an `assert` under the assumption this is telling the compiler "this assertion isn't likely to happen" when in fact it's saying "this branch containing the assertion is unlikely to be taken".
>
> This macro is example why I dislike the counting. If `MyAssert` has an `[[unlikely]]` attribute, then changing the number of `MyAssert` statements will influence the likelihood of the branches taken.

<nod>, but this is an example of why I don't like the behavior of applying this to arbitrary statements; this is spooky behavior that would be very hard to spot in code reviews unless you're an expert on C++.

>> This is one of the reasons why the GCC behavior of allowing these attributes on arbitrary statements feels like an awful trap for users to get into. If the attribute only applied to compound statements following a flow control construct, I think the average programmer will have a far better chance of not using this wrong. (I know that I said in the other thread we should match the GCC behavior, but I'm still uncomfortable with it because that behavior seems to be fraught with dangers for users, and this patch introduces new sharp edges, as @Quuxplusone pointed out.)
>
> I'm also not fond of GCC's implementation, but I think their implementation conforms with the standard.

Conforming to the standard is not hard in this case. We can parse the attributes and ignore their semantics entirely (but not the constraints) and that also conforms to the standard. Our job is to figure out what the best behavior is for our implementation (which may or may not mean following the GCC behavior).

> In hindsight I think it indeed makes more sense to only allow it on compound statements. That will require thinking about how to mark multiple cases as `[[likely]]` when falling through:
>
>   switch(a) {
>     case '0':
>     case '1':
>     case '2': [[likely]] { return 'a' - '0'; }      // All 3 cases likely?
>   
>     case '3':                                                      // neutral?
>     case '4': [[likely]]()                                    // likely?
>     case '5': [[unlikely]] {return 'a' - '0'; }  // unlikely?
>   }
>
> Of course this can be solved by only allowing it on the case labels:
>
>   switch(a) {
>    [[likely]] case '0':
>    [[likely]] case '1':
>    [[likely]] case '2':  { return 'a' - '0'; }    // All 3 cases likely
>   
>     case '3':                                                     // neutral
>     [[likely]] case '4':                                    // likely
>     [[unlikely case '5':  {return 'a' - '0'; }  // unlikely
>   }

That is exactly the behavior I am coming to believe we should follow. You can either write it on a compound statement that is guarded by a flow control decision (`if`/`else`/`while`) or you can write it on a label, otherwise the attribute is parsed and ignored (with a diagnostic). This feels like the right mixture of useful with understandable semantics so far, but perhaps we'll find other examples that change our minds.

The fallthrough behavior question has one more case we may want to think about:

  switch (x) {
  case 0:
  case 1:
  [[likely]] case 2: break;
  [[unlikely]] default:
  }

Does this mark cases `0` and `1` as being likely or not? I could see users wanting to use shorthand rather than repeat themselves on all the cases. However, I'm also not certain whether there would be any performance impact if we marked only `case 2` as likely and left cases `0` and `1` with default likelihood. My gut feeling is that this should only mark `case 2`, but others may have different views.

> I fully agree the behaviour mandated by the standard is way too complex and user unfriendly. It would have been nice if there were simpler rules, making it easier to use and to teach. Still I think it would be best to use the complex approach now, since that's what the standard specifies. During that process we can see whether there are more pitfalls. Then we can discuss it with other vendors and see whether we can change the wording of the standard. Do you agree?

The only requirement from the standard is that we parse `[[likely]]` or `[[unlikely]]` on a statement or label, that we don't allow conflicting attributes in the same attribute sequence, and that the attribute has no arguments to it. All the rest is recommended practice that's left up to the implementation as a matter of QoI. So we conform to the standard by following the approach on compound statements and labels + the constraint checking. We may not be following the recommended practice precisely, but we may not *want* to follow the recommended practice because it's bad for usability. So I'd like us to design the feature that we think gives the most value and is the easiest to use that matches the recommended practice as best we can, then start talking to other vendors. If other vendors don't want to follow our design, that's totally fine, but we should ensure our design *allows* users to write code that will behave similarly for other implementations without diagnostics. e.g., we don't want to design something where the user has to use macros to avoid diagnostics in cross-compiler code. After that, we may want to propose some changes to the recommended practice to WG21.

>From my current thinking, it seems that there may be agreement that allowing these on arbitrary statements may be really difficult for users to understand the behavior of and that compound statements and labels are what we think is an understandable and useful feature and is also a strict subset of what we COULD support. By that, I think we should limit the feature to just compound statements and labels; this leaves the door open to allow the attributes on arbitrary statements in the future without breaking code. If we allow on arbitrary statements from the start, we can't later restrict the feature because we'd break code. This lets us start getting field experience with that behavior to see how we like it, which may also help when talking to other vendors because we'll have something concrete to talk about (hopefully). WDYT?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86559/new/

https://reviews.llvm.org/D86559