[PATCH] D126960: [clang][sema] Unary not boolean to int conversion

Richard Smith - zygoloid via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Wed Jun 15 13:25:14 PDT 2022


rsmith added a comment.

Thank you for the patch! This is certainly an improvement but I think there are still some cases where we compute the wrong range for `~` with this patch applied.



================
Comment at: clang/lib/Sema/SemaChecking.cpp:12328
       return IntRange::forValueOfType(C, GetExprType(E));
 
+    case UO_Not:
----------------
aaron.ballman wrote:
> erichkeane wrote:
> > Richard mentions UO_PreInc, UO_PreDec, UO_Minus, and UO_Not.  What about the other 3?
> `UO_Plus` also.
Taking those in turn:
- `UO_PreInc` and `UO_PreDec` can only produce values in the range of the lvalue operand; we'll never have a more precise range for an lvalue than the range from its type unless it's a bit-field, in which case we *do* want to take the bit-field's width into account. So I think the current code is actually correct for those two -- looking at the range of the operand gives the right result even for bit-fields, whereas looking at the type of the operand would give a less precise result in that case.
- `UO_Minus` is [not giving the right answer](https://godbolt.org/z/8veKnrv7v). I think we should model `-expr` in exactly the same way we model `0 - expr` -- see the code for `BO_Sub` above.
- `UO_Plus` seems OK with the current code: the range of values of `+expr` is the same as the range of values of `expr`, even if a promotion happens.

In any case, I don't think we should have a `default` here that returns the width of the operand. Instead, I think the `default` should return the range for the type of `E`.


================
Comment at: clang/lib/Sema/SemaChecking.cpp:12334
+
+      LLVM_FALLTHROUGH;
+
----------------
I don't think falling through and picking up the expression range of the operand is ever correct for `~`. [For example](https://godbolt.org/z/oeE3G6r6d):

```
bool f(char c) {
  return ~c > 0x10000;
}
```

... produces a bogus tautological comparison warning. (This is not tautological: the `~` operator will map negative `char`s to `int`s in the range [2^31 - 2^7, 2^31) and non-negative `char`s to `int`s in the range [-2^31, -2^31+2^7), so this is equivalent to `c < 0`.)

Also, using `MaxWidth` here is conservatively correct but isn't precise; for example, we should still warn on:

```
bool f(bool b) {
  return ~b > 0x1'0000'0000LL;
}
```

... because `~b` always fits in an `int`, but I think `MaxWidth` here will be 64 so we won't warn. (It'd be nice to warn even on `~b > 0` but I don't think we can do that without some major changes to how `IntRange` is represented.)

The true result range of `~n` will be something like [-2^N, -2^N + 2^M) u [2^N - 2^M, 2^N) if the input is signed. `IntRange` can't represent a range with a hole in the middle like that, but it can represent [-2^N, 2^N), which seems like the least wrong answer that we can give -- that is notably also the entire range of the type of the `~` expression. We get a contiguous range [-2^N, -2^N + 2^M) if the input is known non-negative, but `IntRange` still can't represent that, and so that doesn't help us make our result any more precise than using the entire range of the type of `E`, unfortunately. So I think `UO_Not` should never look at its subexpression -- the best result we can give is `IntRange::forValueOfType(C, GetExprType(E))`, and that's what we should use.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D126960/new/

https://reviews.llvm.org/D126960



More information about the cfe-commits mailing list