[PATCH] D86465: [analyzer][solver] Redesign constraint ranges data structure

Wed Aug 26 05:49:40 PDT 2020

steakhal added inline comments.

================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:118-123
+    /// Create a new set with all ranges from both LHS and RHS.
+    /// Possible intersections are not checked here.
+    ///
+    /// Complexity: O(N + M)
+    ///             where N = size(LHS), M = size(RHS)
+    RangeSet add(RangeSet LHS, RangeSet RHS);
----------------
vsavchenko wrote:
> steakhal wrote:
> > What about `merge` or `combine`? I know that `union` is a keyword thus we can not use.
> Maybe, but the real `union` function is not present.  And I think that in the original code it is `add` to hint that it's not really a union and nothing complex is done here.  Maybe something like `mergeUnchecked` would work.
Now I get it. Let's keep it as-is.
Might worth highlighting the `**with all ranges from both**` part though.

================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:170-174
+    /// Delete the given point from the range set.
+    ///
+    /// Complexity: O(N)
+    ///             where N = size(From)
+    RangeSet deletePoint(const llvm::APSInt &Point, RangeSet From);
----------------
vsavchenko wrote:
> steakhal wrote:
> > Why does this operation take `O(N)` time? Shouldn't be `O(logN)` instead?
> We still create a copy of an old vector to maintain persistence, that's why it's `O(N)`.
Thanks. BTW I just forgot about this comment :D

================
Comment at: clang/include/clang/StaticAnalyzer/Core/PathSensitive/RangedConstraintManager.h:272-273
+private:
+  RangeSet(ContainerType *RawContainer) : Impl(RawContainer) {}
+  RangeSet(UnderlyingType Ptr) : Impl(Ptr) {}

----------------
vsavchenko wrote:
> steakhal wrote:
> > Missing `explicit`.
> More like missing `/* implicit */` because it is intentional
It doesn't have a too long identifier.
The users can always refer to it `auto R = RangeSet(...)`, so we still don't repeat ourselves.
Do you have any particular counterexample?
Probably the tests will become slightly more bloated but eh. whatever.

================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:177-178
+
+  --It;
+  return It->Includes(Point);
+}
----------------
vsavchenko wrote:
> steakhal wrote:
> > 
> I don't really see the benefits of one way over another and as I said in another comment, I don't really like to pay for an extra copy with iterators as just a rule of thumb.
Iterators supposed to be cheap to copy. That is why we take them by value all over the place.
IMO it should boil down to the very same codegen in both cases. https://godbolt.org/z/8x1zMh

To prove it, I disassembled the following functions in release clang build:
```lang=C++
// _Z21raw_rangeset_iteratorN5clang4ento8RangeSetE
const llvm::APSInt &raw_rangeset_iterator(RangeSet What) {
  RangeSet::iterator It = What.begin();
  --It;
  return It->To();
}
// _Z26std_prev_rangeset_iteratorN5clang4ento8RangeSetE
const llvm::APSInt &std_prev_rangeset_iterator(RangeSet What) {
  RangeSet::iterator It = What.begin();
  return std::prev(It)->To();
}
```

```lang=asm
(gdb) disassemble _Z21raw_rangeset_iteratorN5clang4ento8RangeSetE
Dump of assembler code for function raw_rangeset_iterator(clang::ento::RangeSet):
   0x0000000006102b10 <+0>:	mov    (%rdi),%rax
   0x0000000006102b13 <+3>:	mov    -0x8(%rax),%rax
   0x0000000006102b17 <+7>:	retq   
End of assembler dump.
```

```lang=asm
(gdb) disassemble _Z26std_prev_rangeset_iteratorN5clang4ento8RangeSetE
Dump of assembler code for function _Z26std_prev_rangeset_iteratorN5clang4ento8RangeSetE:
   0x00000000061040d0 <+0>:	mov    (%rdi),%rax
   0x00000000061040d3 <+3>:	mov    -0x8(%rax),%rax
   0x00000000061040d7 <+7>:	retq   
End of assembler dump.
```

================
Comment at: clang/lib/StaticAnalyzer/Core/RangeConstraintManager.cpp:322-325
+  auto swap = [&First, &FirstEnd, &Second, &SecondEnd]() {
+    std::swap(First, Second);
+    std::swap(FirstEnd, SecondEnd);
+  };
----------------
vsavchenko wrote:
> steakhal wrote:
> > It could definitely bear a longer name.
> I think that there is nothing wrong in spelling out `std::swap` twice.
> And do we have a naming convention for lambdas, is it a variable or a function from a naming perspective?
That's a good question :)
I would say that its a variable, since you can mark it `const`, otherwise you could overwrite it. xD But that's a different story I think.
About the `swap` thingie, its a good practice to respect ADL for functions which know to be used via ADL.
Even if we don't depend on ADL in this particular case.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86465/new/

https://reviews.llvm.org/D86465