[PATCH] D103094: [analyzer] Implemented RangeSet::Factory::castTo function to perform promotions, truncations and conversions

Tue Apr 19 06:27:36 PDT 2022

martong accepted this revision.
martong added a comment.
This revision is now accepted and ready to land.

Thanks, LGTM! With minor revisions.

================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:749
+// E.g. char{1,3,5,127} -> uint{1,3,5,127}
+// Interrupt the first phase and go to second one when casted values start
+// go in descending order. That means that we crossed over the middle of
----------------

================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:687
+
+  if (IsConversion && (!IsPromotion || !What.isUnsigned()))
+    return makePersistent(convertTo(What, Ty));
----------------
ASDenysPetrov wrote:
> martong wrote:
> > Could you please explain why do we need the `|| !What.isUnsigned()` part of the condition?
> > 
> > This would make much more sense to me:
> > ```
> > if (IsConversion && !IsPromotion)
> >     return makePersistent(convertTo(What, Ty));
> > ```
> Look. Here we handle 2 cases:
> 
>   - `IsConversion && !IsPromotion`. In this case we handle changing a sign with same bitwidth: char -> uchar, uint -> int. Here we convert //negatives //to //positives //and //positives which is out of range// to //negatives//. We use `convertTo` function for that.
>  
>   - `IsConversion && IsPromotion && !What.isUnsigned()`. In this case we handle changing a sign from signeds to unsigneds with higher bitwidth: char -> uint, int-> uint64. The point is that we also need convert //negatives //to //positives //and use `convertTo` function as well. For example, we don't need such a convertion when converting //unsigned //to //signed with higher bitwidth//, because all the values of //unsigned //is valid for the such //signed//.
>      
> 
> 
> 
Ok, makes sense. Could you please attach your explanation as a comment then into the code?

================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:690
+
+  // Promotion from unsigneds to signeds/unsigneds left.
+
----------------
ASDenysPetrov wrote:
> ASDenysPetrov wrote:
> > martong wrote:
> > > martong wrote:
> > > > I think it would be more consistent with the other operations (truncate and convert) if we had this logic in its own separate function: `promoteTo`. And this function should be called only if `isPromotion` is set (?)
> > > This comment is confusing, since we can get here also if
> > > `isConversion` is false and
> > > `isPromotion` is false 
> > Nothing confusing is here.
> > We have 7 main cases:
> > NOOP: **u -> u, s -> s**
> > Conversion: **u -> s, s -> u**
> > Truncation: **U-> u, S -> s**
> > Promotion: `u->U`, `s -> S`
> > Truncation + Conversion: **U -> s, S -> u**
> > Promotion + Conversion: **s -> U**, `u -> S`
> > 
> > As you can see, we've handled all the **bolds** above this line .
> > So only promotions from //unsigneds// left. That means, `isPromotion` never should be `false` here. We could add an assertion here if you want.
> That makes sense. I'll do.
> So only promotions from unsigneds left. That means, isPromotion never should be false here. We could add an assertion here if you want.

Yeah, I think that would be useful, to enforce that the precondition for `promoteTo` holds all the time (even after future changes).

================
Comment at: clang/unittests/StaticAnalyzer/RangeSetTest.cpp:827
+  if (is_signed_v<F>) {
+    this->template checkCastTo<F, T>({{MIN, MIN}, {MAX, MAX}}, {{MAX, MIN}});
+    this->template checkCastTo<F, T>({{MID, MID}, {MAX, MAX}}, {{MAX, MID}});
----------------
ASDenysPetrov wrote:
> martong wrote:
> > I am getting lost. Why don't you check against `ToMIN` and `ToMAX` here? Could you explain e.g. with int16->int8?
> > 
> > It is confusing that at many places you test against `MIN`, `MAX`, `A`, ... and the conversion is happening automatically by the `checkCastTo` template. Would it be more explicit to use everywhere `ToMIN`, `ToA`, `ToB`, ... and check against them?
> We can't use ToMIN, ToMAX, ... everywhere. That would be incorrect:
> **int16(-32768, 32767)** -> **int8(-128, 127)**, aka `(MIN, MAX) -> (ToMIN, ToMAX) // OK`.
> **int16(-32768, -32768)** -> **int8(-128, -128)**, aka `(MIN, MIN) -> (ToMIN, ToMIN) // NOK`.
> **int16(-32768,-32768)** -> **int8(0, 0)**, aka `(MIN, MIN) -> ((int8)MIN, (int8)MIN) // OK`.
> 
Okay makes sense, thanks for the explanation. Nevertheless, I think this explanatory comment would be useful to have in the code.

================
Comment at: clang/unittests/StaticAnalyzer/RangeSetTest.cpp:874-879
+  constexpr auto FromA = TV::FromA;
+  constexpr auto ToA = TV::ToA;
+  constexpr auto FromB = TV::FromB;
+  constexpr auto ToB = TV::ToB;
+  this->template checkCastTo<F, T>({{FromA, ToA}, {FromB, ToB}},
+                                   {{FromA, ToA}});
----------------
ASDenysPetrov wrote:
> martong wrote:
> > Could you please elaborate what do we test here?
> E.g. 
> ```
>                  RangeA                   U                   RangeB
> (0000'1111'0000'0100, 0000'1111'0000'0111)U(1111'0000'0000'0100, 1111'0000'0000'0111)
> truncates to
>                  RangeC                   U                   RangeC
> (          0000'0100,           0000'0111)U(          0000'0100,           0000'0111)
> As you can see, we got two equal truncated ranges.
>                            RangeC
> Then unite them to (0000'0100, 0000'0111).
> ```
Ok.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D103094/new/

https://reviews.llvm.org/D103094