[PATCH] D86445: [analyzer][RFC] Simplify MetadataSymbol representation and resolve a CStringChecker FIXME

Gabor Marton via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Tue Sep 8 05:37:25 PDT 2020


martong added a comment.



> But if the string is invalidated (or the new length is completely unknown), **do not remove** the length from the state; instead, set it to SymbolConjured.

Where exactly do you implement the above?

> When strlen(R) is used for the first time on a region R, produce `$meta<size_t R>` as the answer, but *do not store* it in the string length map.

And this?



================
Comment at: clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp:2379
+  // Overwrite the associated cstring length values of the invalidated regions.
+  for (const auto &Entry : Entries) {
+    const MemRegion *MR = Entry.first;
----------------
It's just a tiny nit. But, perhaps we could come up with a more meaningful name instead of `Entry` and `Entries`. Maybe `Length` ?


================
Comment at: clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp:2384-2385
     if (SuperRegions.count(MR)) {
       Entries = F.remove(Entries, MR);
+      Entries = F.add(Entries, MR, UnknownVal());
       continue;
----------------
Umm, these two lines are quite disturbing after each other. Seems like first we remove `MR` then we add `MR` again, so, this must not be a noop, right? But then what is happening here exactly? Some comments around here in the code would help.


================
Comment at: clang/lib/StaticAnalyzer/Checkers/CStringChecker.cpp:2429
     if (SymbolRef Sym = Len.getAsSymbol()) {
       if (SR.isDead(Sym))
+        Entries = F.remove(Entries, MR);
----------------
steakhal wrote:
> NoQ wrote:
> > Ok, this doesn't look correct (looks like it never was). It's liveness of the region that's important to us, not liveness of the symbol. Because it's the liveness of the region that causes the program to be able to access the map entry.
> Let's say we have this:
> ```lang=C++
> void foo() {
>   char *p = malloc(12);
>   // strlen(p); // no-leak if strlen called, leak warning otherwise...
> } // expected-warning {{leak}}
> ```
> If we were marking the region live here, we would miss the `leak` warning. That warning is only triggered if the region of the `p` becomes dead. Which will never become dead if we have a cstring length symbol associated to that region.
> I came to this conclusion after implementing your suggested edit above (checking regions instead of symbols).
Is it possible to have two string length symbols that are associated to the same region? Could that cause any problem?
E.g. what will happen below? Will we remove `MR` twice? 
```
char *p = "asdf"
char *q = p + 1;
strlen(p); strlen(q);
```


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D86445/new/

https://reviews.llvm.org/D86445



More information about the cfe-commits mailing list