[PATCH] D99797: [analyzer] Implemented RangeSet::Factory::unite function to handle intersections and adjacency

Valeriy Savchenko via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Apr 8 04:06:49 PDT 2021


vsavchenko added a comment.

Well, that is a nice exercise for "two pointer" problems, but can we please talk about the actual use case for it?



================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:136
+
+RangeSet RangeSet::Factory::unite(RangeSet LHS, RangeSet RHS) {
+  if (LHS.isEmpty())
----------------
I'd prefer `merge`


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:217
+
+  while (BIt1 || BIt2) {
+
----------------
This algorithm is indeed `O(N +M)`, good job!
But let me get picky again 😅

It looks like we do too many comparisons in this loop.  As I said earlier, constant factor is the key here and while this iterator idea is good it is a tradeoff. There is a piece of knowledge that would've been obvious with regular iterators, now is a run-time unknown that we constantly need to check (`isTo` and `isFrom`).

What we are trying to achieve here is not harder than what we do in `intersect` and there it is way less comparisons.  While iterating through sets we can keep a set of invariants, so we don't need to re-check something that we already know.

Here is a sketch for the algorithm:

```
{
  assert(First != FirstEnd && Second != SecondEnd);
  // We have && here because one of them reaches its end,
  // we should not check for it again and simply add the rest
  // of the other set.
  while (true) {
    // We need to find the beginning of the next range to add
    // First should be the iterator which To is a candidate to be
    // an and for the merged range.
    if (First.From() > Second.From()) {
      swap();
    }

    auto From = First.From();

    // That's it, we found our start, and we need to go through
    // other ranges and look for the end.
    // After this point, First.From() shouldn'y be accessed.

    while (true) {
      // We can compress all the checks into just one.
      // This essentially means that Second should not get merged with First.
      if (Second.From() > First.To() + 1) {
        break;
      }

      if (Second.From() > First.From()) {
        // First is maintained as a candidate for the end.
        swap();
      }

      // At this point we know that Second lives fully inside
      // of the new range and we can skip it.
      ++Second;

      // If we have nothing else in the second set...
      if (Second == SecondEnd) {
        // ...let's finish the current range first...
        Result.emplace_back(From, First.To());
        while (++First != FirstEnd) {
          // ...and copy the rest of the ranges
          Result.push_back(*First);
        }
        // The range is ready.
        return makePersistent(Result);
      }
    };

    // Second is outside of the range, and we can
    // safely add a new range.
    Result.emplace_back(From, First.To());
    ++First;

    // First set can be over at this point and we should...
    if (First == FirstEnd) {
      // ...copy the rest if the second set's ranges.
      while (Second != SecondEnd) {
        Result.push_back(*(Second++));
      }
      // Nothing left to do.
      return makePersistent(Result);
    }
  };

  // No way for us to get here!
  llvm_unreachable("...");
}
```

It's trickier in the way we end things than the intersection because we still need to add the rest of the other set.


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:279-293
+  auto It = std::lower_bound(
+      B, E, Range(Point), [&Max, &One](const Range &OrigR, const Range &R) {
+        return OrigR.To() != Max && (OrigR.To() + One) < R.From();
+      });
+
+  if (It != E && It->Includes(Point))
+    return Original;
----------------
I don't see any practical reasons to keep this version over the more generic `RangeSet` + `RangeSet`


================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:321-335
+
+  const APSInt &F = (It == E) ? R.From() : std::min(R.From(), It->From());
+
+  // Find a right part of disjoint ranges from the new range.
+  It = std::lower_bound(
+      It, E, R, [&One, &Max](const Range &OrigR, const Range &R) {
+        return OrigR.From() == Max || (OrigR.From() - One) <= R.To();
----------------
Same here, it is just way more code to maintain.


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D99797/new/

https://reviews.llvm.org/D99797



More information about the cfe-commits mailing list