[PATCH] [analyzer][Review request] Better modelling of memcpy by the CStringChecker (PR16731)

Wed Nov 6 13:44:39 PST 2013

  If I understand correctly all the regressions are known limitations.
  The first type of regressions (reflected in the 'escapeSourceContents()' test) is caused by the fact that the real copy is not performed. The metadata (e.g. allocation state) is, hence, also not copied from source to destination and from this point we can not track the allocation state of contents of 'scr'. So we conservatively assume that all the subregions of 'src' escape.
  Consider an example:

   char *p = malloc(12);
   memcpy(s, &p, 12); // no warning

  after this copy we can also manipulate the regions (pointed to by 'p') allocation state using 's' ( free(*s); for example) but currently we can not track this manipulations because information about allocation state is not copied to 's's region.
  The main goal of the patch is just to suppress warning for the example above, because many false-positives were of the same kind.


  The second type of regressions is because we do not precisely analyze the subregions of 'dst'. Ideally we should not change an allocation state of either source or destination or their subregions because 'memcpy' neither allocates nor frees memory.
  But currently we just invalidate the destination (and automatically invalidate all regions accessible through destination) removing all bindings in it. So we conservatively treat all this regions as escaped (currently specially suppressing an escape of the destination region itself). This type of limitations is reflected in the 'invalidateDestinationContents()' test.
  I have an idea how to eliminate several issues but I should think it over and planned to handle this after the patch gets in.


  Your test is a leak and we want to report it. Here are several clarifying examples from the 'new/delete checker LLVM false positives' correspondence written by Jordan:

  > Hm. This is still not right...in particular, this test case should still be a leak:
  >
  > void test(char *s) {
  >   char *p = malloc(4);
  >   memcpy(s, p, 4);
  > }
  >
  > Why? Because here, the address of 'p' doesn't escape—only its contents. That's the point of using const-invalidation: the top-level region does not escape, but the indirect ones do.
  >
  > void test(char **s) {
  >   char *p = malloc(4);
  >   memcpy(s, &p, sizeof(p));
  > }  // no-warning
  >
  > And this shouldn't escape either, because (again) only the contents of 'p' are invalidated, not the address.
  >
  > void test(char *s) {
  >   char *p = malloc(4);
  >   memcpy(p, s, 4);
  > } // leak!

  Jordan, if you have a chance to read this, please, correct me if I am wrong.


================
Comment at: test/Analysis/malloc.c:643
@@ -632,2 +642,3 @@
+
 // Rely on the CString checker evaluation of the strcpy API to convey that the result of strcpy is equal to p.
 void symbolLostWithStrcpy(char *s) {
----------------
Anna Zaks wrote:
> Should we add the original example(from PR16731) as well?
> char *f() {
>   void *x = malloc(47);
>   char *a;
>   memcpy(&a, &x, sizeof a);
>   return a;
> }
Done!


http://llvm-reviews.chandlerc.com/D1887