[PATCH] D68725: [analyzer] MemoryBlockRegion: Generalize AllocaRegion

Csaba Dabis via Phabricator via cfe-commits cfe-commits at lists.llvm.org
Mon Oct 28 14:09:44 PDT 2019


Charusso added a comment.

Thanks for the notes!

In D68725#1722136 <https://reviews.llvm.org/D68725#1722136>, @NoQ wrote:

> Generally, there's no need to add more information to `MemRegion` and `SymExpr` objects [...]


Well, that was my first idea, then I saw we allow helper-methods inside regions. `getExtent()` has been deprecated on my side.

> Even if `AllocaRegion` didn't store the expression, any path-sensitive checker for which such region is "interesting" would have to implement a bug visitor to display the allocation site. Such visitor automatically knows the location of the `alloca()` expression and can emit the necessary fixits.

I would create two type of `MemoryBlockRegions`, one for expressions: `malloc()`, one for declarations: `std::string str`. The rational behind the latter to make `CStringChecker` as simply `StringChecker`, with modelling the concept: `str.size()` is the dynamic size behind `VarRegion(str)`. My idea here to make it similar to `getDynamicTypeInfo` so I can obtain the allocated memory block's capacity, element count, used size easily in every checker. This information could simplify `MallocChecker` a lot, because the allocated size of memory could serve the allocation state. The "used size" is heuristically modeled in `CStringChecker`, so it could be simplified, and my new checker would be interested in the element-count only. Also I am thinking of the `InnerBufferChecker` could connect to a `MemoryBlockRegion` as all the views abstracts a block of memory.

This new model is basically a new middle-layer in the hierarchy, for example: `std::string str` -> `VarRegion(str)` has a memory block behind the scenes, which belongs to the generic heap. This means, I cannot introduce a new type of `MemSpaceRegion`, as a memory block is a subset of memory space. Both the `Decl` and `Expr` based regions produce dynamically changing stuff, that is why I would use the concept of `AllocaRegion` with an `Expr` or a `Decl` of the allocation as a known identifier.

I cannot write that down: `str.size() == getDynamicSize(str)` or `char *cstr; strlen(cstr) == getDynamicSize(cstr)`, where the VarRegion being a pointer to the data has a size of a pointer, but I am interested in the block of memory it points to. So, even I could remove the entire `AllocaRegion`, I think it is great to distinguish between a pointer and a block of memory. Also this entire dynamic-size business would be based on `MemoryBlockRegions`, which I believe a good way to isolate a new concept with creating new types of regions, and not to stress the current model.

I have mixed feelings how would we model a `string`, and connect it with the current model. I think that would be neat to express like that:
`malloc(str.size() + 1)` == `MemoryBlockExprRegion(getDynamicSize(MemoryBlockDeclRegion(VarRegion(str))) + 1)`, I believe.

Here is an example why I am interested in strings. Here is a function from LevelDB which does not inject the null-terminator:
(https://github.com/google/leveldb/blob/master/db/c.cc)

  static char* CopyString(const std::string& str) {
    char* result = reinterpret_cast<char*>(malloc(sizeof(char) * str.size()));
    memcpy(result, str.data(), sizeof(char) * str.size());
    return result;
  }

The proper way to copy a string is:

  static char* CopyString(const std::string& str) {
    char* result = reinterpret_cast<char*>(malloc(str.size() + 1));
    strcpy(result, str.data());
    return result;
  }

Also, this would be the first step to handle everything in the `string.c` test file, which is not the goal. The goal is the size of a memory block.



================
Comment at: clang/test/Analysis/pr22954.c:628
   clang_analyzer_eval(m29[j].s3[k] == 1); // expected-warning{{TRUE}}
+  // expected-warning at -1 {{Potential leak of memory pointed to by field 's4'}}
   clang_analyzer_eval(l29->s1[m] == 2); // expected-warning{{UNKNOWN}}
----------------
NoQ wrote:
> See the other part of the FIXME: `But not on the previous line`! The symbol is not yet dead on line 628; i specifically left this comment around because this test often gets "fixed" incorrectly.
Well, this is not on the previous line, but I will mark this as unresolved then. In my mind with that new concept we see that we have an eternal block of memory, which we needs to explicitly `free()`.


Repository:
  rC Clang

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D68725/new/

https://reviews.llvm.org/D68725





More information about the cfe-commits mailing list