[PATCH] D152435: [analyzer][CStringChecker] Adjust the invalidation operation on the super region of the destination buffer during string copy

Mon Jun 19 09:25:03 PDT 2023

OikawaKirie added a comment.

BTW, what does the `Done` checkbox mean in the code comments?

================
Comment at: clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp:286-287
+                      SVal V,
+                      std::function<bool(RegionAndSymbolInvalidationTraits &,
+                                         const MemRegion *)>);

----------------
steakhal wrote:
> I'd highly suggest making this a template taking the functor that way.
> Given that we modify these functions, we should make them comply with the lower camel case naming convention.
> And their variables with upper camel case.
> I'd highly suggest making this a template taking the functor that way.

Any examples?
Do you mean
```
template <typename T>
static ProgramStateRef
InvalidateBufferAux(CheckerContext &C, ProgramStateRef state, const Expr *Ex,
                    SVal V, std::function<T> CallBack);
```

================
Comment at: clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp:1054
+    SVal SizeV, QualType SizeTy) {
+  return InvalidateBufferAux(
+      C, S, BufE, BufV,
----------------
steakhal wrote:
> Shouldn't we assert that `SizeV` is some known value or at least smaller than (or equal to) the extent of the buffer?
The calls in `CStringChecker::evalCopyCommon` and `CStringChecker::evalStrcpyCommon` seem to have chances able to pass an unknown value. But I am not very sure about this.

If the `SizeV` is sure to be smaller or equal to the extent of the buffer, the `IsFirstBufInBound` check seems useless.
Maybe I need to dig deeper into the callers.

================
Comment at: clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp:1075-1077
+      [](RegionAndSymbolInvalidationTraits &, const MemRegion *R) {
+        return MemRegion::FieldRegionKind == R->getKind();
+      });
----------------
steakhal wrote:
> The lambdas inline in the call look a bit messy. I would prefer declaring it and passing it as an argument separately.
> This applies to the rest of the lambdas as well.
> And their variables with upper camel case.

What about the variable names for the lambdas? Lower camel or upper camel?
I cannot find the rules for named lambdas from the guidance <https://llvm.org/docs/CodingStandards.html>.

================
Comment at: clang/test/Analysis/issue-55019.cpp:80-87
+  std::copy(p, p + 1, x.arr);
+
+  // FIXME: As we currently cannot know whether the copy overflows, the checker
+  // invalidates the entire `x` object. When the copy size through iterators
+  // can be correctly modeled, we can then update the verify direction from
+  // SymRegion to HeapSymRegion as this std::copy call never overflows and
+  // hence the pointer `x.ptr` shall not be invalidated.
----------------
> So, you say that It should be `HeapSymRegion` instead because `x.arr` shouldn't be invalidated because the `std::copy` won't modify it. Right?

Yes, but just in this test case.

The size of `x.arr` is 4 and the copy length is 1, which means the call to `std::copy` here never overflows.
Therefore, we should not invalidate the super region `x` and keep `x.ptr` still pointing to the original `HeapSymRegion` unchanged.
However, the copy length computed through iterators is not modeled currently in `CStringChecker::evalStdCopyCommon`, where `InvalidateDestinationBufferAlwaysEscapeSuperRegion` is called.
When it is modeled in the future, the function should call `InvalidateDestinationBufferBySize` instead to make the invalidation of the super region determined by the copy length to report potential leaks.

================
Comment at: clang/test/Analysis/pr22954.c:560
   clang_analyzer_eval(x263.s1[0] == 0); // expected-warning{{UNKNOWN}}
+  // expected-warning at -1{{Potential leak of memory pointed to by 'x263.s2'}}
   clang_analyzer_eval(x263.s1[1] == 0); // expected-warning{{UNKNOWN}}
----------------
steakhal wrote:
> Ah, the leak should have been reported at the end of the full-expression of `memcpy`.
> If we have this expect line here it could mislead the reader that it has anything to do with the `eval` call before.
> 
> In fact, any statement after `memcpy` would have this diagnostic.
> I have seen cases where we have a `next_line()` opaque call after such places to anchor the leak diagnostic.
> I think we can do the same here.
> Its not advised to have different, but similar looking hunks out of which some are the consequence of the semantic change we made and also have others where they are just refactors.

Do you mean I only need to update the verification direction here as you suggested above and leave others unchanged?

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D152435/new/

https://reviews.llvm.org/D152435