[PATCH] D144977: [analyzer] Fix of the initialization list parsing.

Domján Dániel via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Thu Mar 2 16:27:37 PST 2023


isuckatcs added inline comments.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp:432
+      const LocationContext *LCtx = C.getLocationContext();
+      SVal SrcVal = State->getSVal(Buffer.Expression, LCtx);
+      QualType SourceValType = SrcVal.getType(C.getASTContext());
----------------
I did some digging into the other parts of the checker and I'm not sure that we guarantee that every time
 `AccessKind::read` is performed, `Buffer` will be the source and not the destination. We just check
an arbitrary element in an arbitrary buffer.

For example if you look at `CStringChecker::evalMemcmp()`, it has the following content:
```lang=c++
  // int memcmp(const void *s1, const void *s2, size_t n);
  CurrentFunctionDescription = "memory comparison function";

  AnyArgExpr Left = {CE->getArg(0), 0};
  AnyArgExpr Right = {CE->getArg(1), 1};
  SizeArgExpr Size = {CE->getArg(2), 2};
  
  ...

   State = CheckBufferAccess(C, State, Right, Size, AccessKind::read, CK);
   State = CheckBufferAccess(C, State, Left, Size, AccessKind::read, CK);
```

`CStringChecker::CheckBufferAccess()` will eventually call your code and in the first case, you will deduce the type
based on the destination and not the source. This logic might needs to be moved to a higher level function unless
you can guarantee that always the correct type will be deduced. Or if the intention is not to deduce some information
about the source array in a `memcpy` like function, it might be clearer to rename this variable to something that 
doesn't have either src or dst in it.


================
Comment at: clang/lib/StaticAnalyzer/Core/RegionStore.cpp:564-565
 
-  SVal getBindingForElement(RegionBindingsConstRef B, const ElementRegion *R);
+  SVal getBindingForElement(RegionBindingsConstRef B, const ElementRegion *R,
+                            QualType T = QualType());
 
----------------
earnol wrote:
> NoQ wrote:
> > There's a bit of a confusing tradition here, mostly documented in my unsorted long rants in earlier patches, let me try to explain.
> > 
> > `RegionStore` was built to recognize that type punning is a thing. You can write a value while treating the target location/region R as if it's supposed to hold a value of type T₁ and then load the value by treating the same memory location as if it holds a value of type T₂≠T₁. You can achieve it by having unions or by reinterpret-casting pointers. RegionStore needs to handle this reasonably well, no matter what T₁ and T₂ are. Even though strict aliasing rules put some restrictions on what T₁ and T₂ could be, the analyzer is still required to work correctly when the programmer actively relies on `-fno-strict-aliasing`.
> > 
> > On top of that, every `MemRegion` R can have a type T(R) associated with it (accessed as `R->getValueType()`). This was probably a bad idea from the start, for at least three reasons:
> > - Again, type punning exists. Just because the region has a type, doesn't mean it holds a value of that type.
> > - In many cases we deal with regions with no associated type. For example, an unknown (symbolic) pointer of type `void *` definitely points to a certain memory region (we call it `SymbolicRegion`), but the type of the object in that region is most certainly not `void`.
> > - Finally, polymorphism: C++ pointers to base class type may actually point to objects of any derived class type. Not only the specific derived type is often unknown, but also the true derived type may not be necessarily available in the current translation unit, so we often don't have the AST to even describe it.
> > 
> > This makes the type of `MemRegion` largely immaterial and hard to rely upon. Which is why `State->getSVal(R)` accepts an optional parameter T₂ which may be different from both T₁ and T(R).
> > 
> > However, after some back-and-forth bugfixing, we've settled on agreeing that when T(R) exists, it doesn't make sense to pass any T₂ into `State->getSVal(R, T₂)` other than `T₂=T(R)`. Basically, if R already carries a type, then you don't have a good reason to pass a different type as T₂, given that you can always pass a different region R' instead, that represents the same memory location as R but has the desired value type T₂. So T₂ is only required when it's fundamentally impossible to discover the type from the region. So in order to eliminate the confusion, `getSVal()` tries to get rid of the extra parameter `T₂` as early as possible, and have a single source of truth going further down the call stack.
> > 
> > I honestly no longer think this is a healthy strategy long-term. Ideally I think we should get rid of region types entirely; they cause more problems than they solve. But that's a much bigger change, and I'm worried that a partial step in this direction, not supported by any strategic vision, will only multiply confusion.
> > 
> > So, long story short, given that `ElementRegion`s are always typed, I'm wondering if it's easier to simply recast the region in your case as well, instead of reintroducing such duplication of types.
> You are right.
> Let me explain the situation. The initialization list does not always correct and full initialize variable in it's entirety. This is the issue i want to address. 
> You may say having `T(R)` is good enough and in case of functions like `memcpy` it exists always, but the problem i want to address is whether the original memory region was fully and properly initialized or not. 
> The proposed solution is neither full nor complete, it just slightly improve situation when we have structs and array intertwined.
> So in some cases existence `T(R)` is not something what we want  as `T₂` is indeed different.
> Please also allow me amend the statement: `ElementRegions` are always typed. While they are do typed they are not always correctly typed.
> For example what type will have following initialization list: `{ {1, 2}, {3,4} }`?
> It can be array of 2 structures, or just 1 structure with 2 structures inside or it can be 2-dimentsional array. For this purpose we need to know true `T₂` while constructing and checking relevant SVal.
> 
>the problem i want to address is whether the original memory region was fully and properly initialized or not

I think you can only do this outside `RegionStore`. The problem is that `RegionStore` only knows about the beginning of
a region, but doesn't know about the size of it. If I understand correctly you want to bind the type to the region so that 
you can deduce whether a specific region is fully initialized or not based on the types, however you can do this outside
`RegionStore` in any checker for the mentioned arrays and structs.


Consider the following example:
```
int arr[5] = {1, 2, 3, 4, 5};
```
You can see that the array has 5 elements and the initializer list also has 5 elements, so the array is initialized properly.

What you might try doing here is that you create like a `set`  that tracks every region that has been initialized and use it
as a source of information to silence a false positive.

E.g.:
```lang=c++
void false_positive() {
  int src[] = {1, 2, 3, 4};                               <- add `src` to the set, so you mark it initialized
  int dst[5] = {0};                                       <- add `dst` to the set, so you mark it initialized
  memcpy(dst, src, 4 * sizeof(int)); // false-positive:
  // The 'src' buffer was correctly initialized, yet we cannot conclude
  // that since the analyzer could not see a direct initialization of the
  // very last byte of the source buffer.
  ^ Analyzer fails to recognize that the buffer is initialized, so it attempts to report a warning. Before you let that happen you look up `src` in the set that contains initialized buffers and decide if the warning is valid or not.
}
```

You might need to store more information, like the length of the array, but this approach might still improve the false positive rate. `ASTMatcher`s will be helpful if you think it worths pursuing. 

The same approach can be used for structs but then you compare init list elements with the number of fields


================
Comment at: clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1891
+        ElemExpr = IL->getInit(Idx);
+        std::optional<SVal> ConstVal = svalBuilder.getConstantVal(ElemExpr);
+        // if there is no value create a zero one
----------------
earnol wrote:
> isuckatcs wrote:
> > This crashes if `ElemExpr` is a `nullptr`.
> Accepted. `nullptr` should never pop up with correctly constructed initialization list in the line 1890, but nobody promised me to have a correct initialization list here.
You checked it on line 1848 too, that's why I thought I point this out.


================
Comment at: clang/lib/StaticAnalyzer/Core/RegionStore.cpp:1914
+          }
+          RecIter++;
+          continue;
----------------
earnol wrote:
> isuckatcs wrote:
> > Consider moving this into the for loop to avoid confusion.
> If i am to do it i'll need to move line 1883 into the for loop as well for consistency making for loop bulky and cumbersome. Do you really think it will make code easier to understand?
> I propose to move iterator advance somewhere line 1890 so all `for` associated operation will be close to the for operator. How about it?
We do something similar to this loop in `ExprEngine::VisitLambdaExpr()`, I think it is perfectly readable but you can leave it like this if you want.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D144977/new/

https://reviews.llvm.org/D144977



More information about the cfe-commits mailing list