[PATCH] D124658: [analyzer] Canonicalize SymIntExpr so the RHS is positive when possible

Fri Apr 29 07:14:00 PDT 2022

tomasz-kaminski-sonarsource marked 2 inline comments as done.
tomasz-kaminski-sonarsource added a comment.

> Could you please elaborate on this? I understand that you'd like to simplify certain binary operations by merging the RHS and the operand, however, I don't see what are the empirical problems that this patch supposed to fix. E.g. did you encounter a crash without the fix, or was it the mentioned infeasible state (`(l - 1000) > 0`) that caused a false positive?

Yes, the patch is aiming to fix false positives raised from unfeasible paths, that were guarded by conditions that encountered this problem.

> If that is the case, I believe the huge cast patch <https://reviews.llvm.org/D103096> is going to solve it. Could you please check if those infeasible cases are solved by the mentioned D103096 <https://reviews.llvm.org/D103096> (you have to set `support-symbolic-integer-casts=true`) ?

Indeed all false positives that I am fixing with this patch would be also fixed by proper modeling of the cast of the integers. 
However this patch:

- does not in any way affect the correctness of the symbol values, or the upcoming patch
- fixes them even without that `support-symbolic-integer-casts` enabled, and with nearly zero performance impact (two additional comparisons)

================
Comment at: clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp:204
+    // subtraction/addition of the negated value.
+    if (!RHS.isNegative()) {
+      ConvertedRHS = &BasicVals.Convert(resultTy, RHS);
----------------
steakhal wrote:
> I would rather swap these branches though, to leave the default case (aka. this) to the end.
I folded the `RHS.isNegative()` into the if for the `BinaryOperator::isAssociative(op)`, as same conversion is performed in final else branch.

================
Comment at: clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp:212-219
+      llvm::APSInt ConvertedRHSValue = resultIntTy.convert(RHS);
+      // Check if the negation of the RHS is representable,
+      // i.e., the resultTy is signed, and it is not the lowest
+      // representable negative value.
+      if (ConvertedRHSValue > resultIntTy.getMinValue()) {
+        ConvertedRHS = &BasicVals.getValue(-ConvertedRHSValue);
+        op = (op == BO_Add) ? BO_Sub : BO_Add;
----------------
steakhal wrote:
> Somehow I miss a check for signedness here.
> Why do you think it would be only triggered for signed types?
> 
> I have a guess, that since we already handled `x +-0`, SymIntExprs like `x - (-0)` cannot exist here, thus cannot trigger this condition spuriously. I cannot think of any ther example that could cause this misbehaving. So in that sense `ConvertedRHSValue > resultIntTy.getMinValue()` implies *at this place* that `ConvertedRHSValue.isSigned()`.
> I would rather see this redundant check here to make the correctness reasoning local though.
The integer representation does not have negative zeros (the standard and clang assume two's complement). However, this condition does need to check for the signedness of the types. What I mean is that if the `RHS` is negative, but `ConvertedRHSValue` the branch will trigger and we will change `x - INT_MIN` to `x + (INT_MAX + 1)U` which is ok, as a negation of `INT_MIN` is representable as an unsigned type of same or lager bit with.

================
Comment at: clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp:212-219
+      llvm::APSInt ConvertedRHSValue = resultIntTy.convert(RHS);
+      // Check if the negation of the RHS is representable,
+      // i.e., the resultTy is signed, and it is not the lowest
+      // representable negative value.
+      if (ConvertedRHSValue > resultIntTy.getMinValue()) {
+        ConvertedRHS = &BasicVals.getValue(-ConvertedRHSValue);
+        op = (op == BO_Add) ? BO_Sub : BO_Add;
----------------
tomasz-kaminski-sonarsource wrote:
> steakhal wrote:
> > Somehow I miss a check for signedness here.
> > Why do you think it would be only triggered for signed types?
> > 
> > I have a guess, that since we already handled `x +-0`, SymIntExprs like `x - (-0)` cannot exist here, thus cannot trigger this condition spuriously. I cannot think of any ther example that could cause this misbehaving. So in that sense `ConvertedRHSValue > resultIntTy.getMinValue()` implies *at this place* that `ConvertedRHSValue.isSigned()`.
> > I would rather see this redundant check here to make the correctness reasoning local though.
> The integer representation does not have negative zeros (the standard and clang assume two's complement). However, this condition does need to check for the signedness of the types. What I mean is that if the `RHS` is negative, but `ConvertedRHSValue` the branch will trigger and we will change `x - INT_MIN` to `x + (INT_MAX + 1)U` which is ok, as a negation of `INT_MIN` is representable as an unsigned type of same or lager bit with.
> 
However, I was not able to reach this point with `RHS` being signed, and `resultTy` being unsigned. Any hints how this could be done?

================
Comment at: clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp:661-667
+              // If the op and lop agrees, then we just need to
+              // sum the constants. Otherwise, we change to operation
+              // type if substraction would produce negative value
+              // (and cause overflow for unsigned integers),
+              // as consequence x+1U-10 produces x-9U, instead
+              // of x+4294967287U, that would be produced without this
+              // additional check.
----------------
steakhal wrote:
> It feels like this is a completely different problem compared to the //simplification of X - (-N) to X + N//.
> If this is the case, I would rather split that part into a separate patch instead.
Without the change, this code would produce X -(-N) cases, that would be detected for signed numbers. This would be handled for signed integers later by the simplification but would lead to double conversion. However, if `IntType` is is unsigned and `first` < `last` is true, then we will produce `x +` extremely large number, which would not be later simplified to `X - (last - first)`.

In other words, without applying this change, the normalization of x -(-N) would provide no benefit. And only this change without the initialial normalization would fail in case of:
```
  i = x + (-1);
  l = i - 10U;
```

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D124658/new/

https://reviews.llvm.org/D124658