[PATCH] D86465: [analyzer][solver] Redesign constraint ranges data structure

Balázs Benics via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Aug 25 07:36:58 PDT 2020


steakhal added a comment.

This is a huge change, I've got pretty tired at the end of the review.
I haven't checked the test-cases too deeply.

I've found only minor typos, clarity enhancement opportunities etc. nothing serious.
It was awesome to see how this code can be improved, even becoming comprehensible.

Sorry for my nit spam if it needs further discussion - but it looks good to me.



================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:35-36
 
-  Range(const llvm::APSInt &point)
-      : std::pair<const llvm::APSInt *, const llvm::APSInt *>(&point, &point) {}
+  Range(const llvm::APSInt &Point)
+      : std::pair<const llvm::APSInt *, const llvm::APSInt *>(&Point, &Point) {}
 
----------------
I don't think we should inherit from standard type. It should be a private member instead.
https://quuxplusone.github.io/blog/2018/12/11/dont-inherit-from-std-types/#never-ever-inherit-from-any-std


================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:51
   }
-};
+  void print(raw_ostream &OS) const;
 
----------------
We have `dump` methods all over the place. Except this.
I propose to rename this to `dump`.


================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:53-55
+  // In order to keep non-overlapping ranges sorted, we can compare only From
+  // points.
+  inline bool operator<(const Range &RHS) const { return From() < RHS.From(); }
----------------
It's a good practice to define comparison operators as //friend// functions inline.
Even if we don't rely on implicit conversions.


================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:110
+  iterator end() const { return Impl->end(); }
+  size_t size() const { return Impl->size(); }
+
----------------
I think it should be `std::size_t`.


================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:118-123
+    /// Create a new set with all ranges from both LHS and RHS.
+    /// Possible intersections are not checked here.
+    ///
+    /// Complexity: O(N + M)
+    ///             where N = size(LHS), M = size(RHS)
+    RangeSet add(RangeSet LHS, RangeSet RHS);
----------------
What about `merge` or `combine`? I know that `union` is a keyword thus we can not use.


================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:135
+    /// @{
+    RangeSet getSet(Range Origin);
+    RangeSet getSet(const llvm::APSInt &From, const llvm::APSInt &To) {
----------------
NoQ wrote:
> "But what about `setGet()`???" - some user, probably :)
Why don't we call this `createSetOf`?
And `createEmptySet`.
I know that we don't create the empty set, but then what does a //factory// if not create stuff?


================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:149-160
+    /// Intersect the given set with the closed range [Lower, Upper].
+    ///
+    /// Unlike the Range type, this range uses modular arithmetic, corresponding
+    /// to the common treatment of C integer overflow. Thus, if the Lower bound
+    /// is greater than the Upper bound, the range is taken to wrap around. This
+    /// is equivalent to taking the intersection with the two ranges [Min,
+    /// Upper] and [Lower, Max], or, alternatively, /removing/ all integers
----------------
I think an example would be awesome here.


================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:170-174
+    /// Delete the given point from the range set.
+    ///
+    /// Complexity: O(N)
+    ///             where N = size(From)
+    RangeSet deletePoint(const llvm::APSInt &Point, RangeSet From);
----------------
Why does this operation take `O(N)` time? Shouldn't be `O(logN)` instead?


================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:190-193
+    /// And it makes us to separate the range
+    /// like [MIN, N] to [MIN, MIN] U [-N, MAX].
+    /// For instance, whole range is {-128..127} and subrange is [-128,-126],
+    /// thus [-128,-127,-126,...] negates to [-128,...,126,127].
----------------



================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:195-196
+    ///
+    /// Negate restores disrupted ranges on bounds,
+    /// e.g. [MIN, B] => [MIN, MIN] U [-B, MAX] => [MIN, B].
+    ///
----------------



================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:203
+  private:
+    RangeSet makePersistent(ContainerType &&From);
+    ContainerType *construct(ContainerType &&From);
----------------
As commented later, this function worth to be documented.


================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:207-208
+
+    RangeSet intersect(const RangeSet::ContainerType &LHS,
+                       const RangeSet::ContainerType &RHS);
+
----------------
Why do we need to explicitly spell the  `RangeSet` here?


================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:210-211
+
+    // Many operations inlcude producing new APSInt values and that's why
+    // we need this factory.
+    BasicValueFactory &ValueFactory;
----------------



================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:213-215
+    // Allocator for all the created containers.
+    // Containers might own their own memory and that's why it is specific
+    // for the type, so it calls containter destructors upon deletion.
----------------



================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:222
+
+    friend class RangeSet;
+  };
----------------
`class Factory` is a nested class of `RangeSet`, so it can already access all private parts.


================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:225-228
+  RangeSet(const RangeSet &) = default;
+  RangeSet &operator=(const RangeSet &) = default;
+  RangeSet(RangeSet &&) = default;
+  RangeSet &operator=(RangeSet &&) = default;
----------------
If this is the Rule of 5, where is the defaulted destructor?


================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:252-259
   /// Get a minimal value covered by the ranges in the set
+  ///
+  /// Complexity: O(1)
   const llvm::APSInt &getMinValue() const;
   /// Get a maximal value covered by the ranges in the set
+  ///
+  /// Complexity: O(1)
----------------
Finally, we got this, awesome!


================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:261
 
-private:
-  void IntersectInRange(BasicValueFactory &BV, Factory &F,
-                        const llvm::APSInt &Lower, const llvm::APSInt &Upper,
-                        PrimRangeSet &newRanges, PrimRangeSet::iterator &i,
-                        PrimRangeSet::iterator &e) const;
+  /// Test whether the given point is contained in any of the ranges.
+  ///
----------------



================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:269
 
-public:
-  RangeSet Intersect(BasicValueFactory &BV, Factory &F, llvm::APSInt Lower,
-                     llvm::APSInt Upper) const;
-  RangeSet Intersect(BasicValueFactory &BV, Factory &F,
-                     const RangeSet &Other) const;
-  RangeSet Negate(BasicValueFactory &BV, Factory &F) const;
-  RangeSet Delete(BasicValueFactory &BV, Factory &F,
-                  const llvm::APSInt &Point) const;
+  bool operator==(const RangeSet &Other) const { return *Impl == *Other.Impl; }
 
----------------
If we implement equality, we should also support inequality.


================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:272-273
+private:
+  RangeSet(ContainerType *RawContainer) : Impl(RawContainer) {}
+  RangeSet(UnderlyingType Ptr) : Impl(Ptr) {}
 
----------------
Missing `explicit`.


================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:275-276
 
-  bool operator==(const RangeSet &other) const {
-    return ranges == other.ranges;
-  }
+  bool pin(llvm::APSInt &Lower, llvm::APSInt &Upper) const;
+  bool pin(llvm::APSInt &Point) const;
+  bool containsImpl(llvm::APSInt &Point) const;
----------------
TBH, I don't even know what the `pin` function does.
Should we consider a more descriptive name instead?
Maybe some docs would come handy too.


================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:277
+  bool pin(llvm::APSInt &Point) const;
+  bool containsImpl(llvm::APSInt &Point) const;
+
----------------
If it's a //read-only// operation why don't we take by value? Just like `contains` does.
If it has side-effects which modify the Point, why is that not documented here?


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:146-149
+RangeSet::ContainerType *RangeSet::Factory::construct(ContainerType &&From) {
+  void *Buffer = Arena.Allocate();
+  return new (Buffer) ContainerType(std::move(From));
+}
----------------
Off-topic: Why do the Allocate and the similar //allocator// functions return `T*` if there is NO living object there (yet).
There will be, only if the placement constructor called...
I highly agree with you using `void *` there.


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:177-178
+
+  --It;
+  return It->Includes(Point);
+}
----------------



================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:181-188
+bool RangeSet::pin(llvm::APSInt &Point) const {
+  APSIntType Type(getMinValue());
+  if (Type.testInRange(Point, true) != APSIntType::RTR_Within)
+    return false;
+
+  Type.apply(Point);
+  return true;
----------------
Previously we had an early `empty` check here.
Why don't we have that now?
Same question for the other overload.


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:281
+    //
+    // Shortcut: check that there is even a possibility of the intersection
+    //           by checking the two following situations:
----------------
vsavchenko wrote:
> NoQ wrote:
> > Thanks a lot for adding those comments. These methods were confusing as heck.
> 😊 
I agree. These are black magic. xD


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:322-325
+  auto swap = [&First, &FirstEnd, &Second, &SecondEnd]() {
+    std::swap(First, Second);
+    std::swap(FirstEnd, SecondEnd);
+  };
----------------
It could definitely bear a longer name.


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:367-370
+      if (Second->To() > First->To())
+        // Here we make a decision to keep First as the "longer"
+        // range.
+        swap();
----------------



================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:375
+      //    ---- First      ]-------------------->
+      //    ---- Second ]--[  ++Second ---------->
+      //
----------------
It's unfortunate to refer to the next subrange by the `++Second`.
Since `operator++` has a //side-effect// I would recommend the `Second+1` instead.


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:384
+      Result.push_back(Range(IntersectionStart, Second->To()));
+      ++Second;
+      // ...and that the invariant will hold for a valid ++Second
----------------
It's probably a personal taste though.
I think better to use the more verbose way incrementing an iterator if you are in the body of the loop.
It catches more attention.


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:412-417
+RangeSet RangeSet::Factory::negate(RangeSet What) {
+  if (What.isEmpty())
+    return getEmptySet();
+
+  ContainerType Result;
+  Result.reserve(What.size());
----------------
`What.size()` seems not justified here.
As you described previously, there is a case when the resulting range will have one more range then the previous one had.
`negate([MIN, X]` => `[MIN,MIN] U [-X,MAX]`


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:431
+  if (From == MIN) {
+    // If [From, To] are [MIN, MAX], then just return the same [MIN, MAX].
+    if (To == MAX) {
----------------



================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:498
   os << "{ ";
   for (iterator i = begin(), e = end(); i != e; ++i) {
     if (isFirst)
----------------
Can we use range-based for loops?


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86465/new/

https://reviews.llvm.org/D86465



More information about the cfe-commits mailing list