[PATCH] D76996: [analyzer] Improve PlacementNewChecker

Mon May 11 12:23:13 PDT 2020

f00kat added a comment.

In D76996#2017572 <https://reviews.llvm.org/D76996#2017572>, @martong wrote:

> > ... This draws a pattern that we should recursively descend down to the top most base region. I.e. the different check*RegionAlign methods should call into each other until we reach the top level base region.
>
>
>
> > The observation here is that the alignment of a region can be correct only if we can prove that its base region is aligned properly (and other requirements, e.g. the offset is divisible). But the base region may have another base region and we have to prove the alignment correctness to that as well.
>
> This could be an issue not just with alignment but maybe with the size as well, I am not sure if we handle the offset properly in compound cases like this: `Xi.b[0].a[1][6]`.
>
> Even though the above issue is still not investigated/handled, I think this patch is now acceptable because seems like most of the practical cases are handled. We could further investigate the concern and improve in a follow-up patch.
>  I'd like to see this landed and thanks for your work!

Thanks for feedback!

I still have no rights to push in the repo so if you think that it is acceptable could you commit it please?

================
Comment at: clang/lib/StaticAnalyzer/Checkers/CheckPlacementNew.cpp:189
+  if (const MemRegion *MRegion = PlaceVal.getAsRegion()) {
+    if (const ElementRegion *TheElementRegion =
+            MRegion->getAs<ElementRegion>()) {
----------------
NoQ wrote:
> martong wrote:
> > f00kat wrote:
> > > f00kat wrote:
> > > > NoQ wrote:
> > > > > NoQ wrote:
> > > > > > The sequence of `FieldRegion`s and `ElementRegion`s on top of a base region may be arbitrary: `var.a[0].b[1][2].c.d[3]` etc.
> > > > > > 
> > > > > > I'd rather unwrap those regions one-by-one in a loop and look at the alignment of each layer.
> > > > > Alternatively, just decompose the whole region into base region and offset and see if base region has the necessary alignment and the offset is divisible by the necessary alignment.
> > > > > The sequence of FieldRegions and ElementRegions on top of a base region may be arbitrary: var.a[0].b[1][2].c.d[3] etc.
> > > > 
> > > > But i think(hope) I already do this and even have tests for this cases. For example
> > > > ```void test22() {
> > > >   struct alignas(alignof(short)) Z {
> > > >     char p;
> > > >     char c[10];
> > > >   };
> > > > 
> > > >   struct Y {
> > > >     char p;
> > > >     Z b[10];
> > > >   };
> > > > 
> > > >   struct X {
> > > >     Y a[10];
> > > >   } Xi; // expected-note {{'Xi' initialized here}}
> > > > 
> > > >   // ok. 2(X align) + 1 (offset Y.p) + 1(align Z.p to 'short') + 1(offset Z.p) + 3(index)
> > > >   ::new (&Xi.a[0].b[0].c[3]) long;
> > > > }```
> > > > 
> > > > Cases with multidimensional arrays will also be handled correctly because method 'TheElementRegion->getAsArrayOffset()' calculates the offset for multidimensional arrays
> > > > ```void testXX() {
> > > > 	struct Y {
> > > > 		char p;
> > > > 		char b[10][10];
> > > > 	};
> > > > 
> > > > 	struct X {
> > > > 		Y a[10];
> > > > 	} Xi;
> > > > 
> > > > 	::new (&Xi.a[0].b[0][0]) long;
> > > > }```
> > > > 
> > > > I can explain the code below for ElementRegion if needed.
> > > ?
> > Yeah, the tests are convincing and I think that you are handling the regions well.
> > 
> > On the other hand, this code is getting really complex, we should make it easier to read and understand.
> > E.g. `FieldOffsetValue` should be explained more, is it the offset started from the the start address of the multidimensional array, or it is just the offset from one element's start address?
> > Also, you have two variables named as `Offset`. They are offsets from which starting address? Perhaps we should have in the comments a running example, maybe for `&Xi.a[0].b[0][0]`? I mean is `FieldOffseValue` is standing for `b` or for `a`?
> > Alternatively, just decompose the whole region into base region and offset and see if base region has the necessary alignment and the offset is divisible by the necessary alignment.
> 
> I expect this to be, like, 5 lines of code. I don't understand why the current code is so complicated, it looks like you're considering multiple cases but ultimately doing the same thing.
Thank you for feedback!
Rewroted the code for ElementRegion cases. Now it is much easier to understand, and there was also a bug in logic.

================
Comment at: clang/lib/StaticAnalyzer/Checkers/CheckPlacementNew.cpp:261
+
+  if (const VarRegion *TheVarRegion = BaseRegion->getAs<VarRegion>()) {
+    const VarDecl *TheVarDecl = TheVarRegion->getDecl();
----------------
martong wrote:
> martong wrote:
> > Perhaps you could call instead `checkVarRegionAlign()`?
> Also, I think `BaseRegion` can be an `ElementRegion` here as well. So, in that case we should call into `checkElementRegionAlign()`, shouldn't we? This draws a pattern that we should recursively descend down to the top most base region. I.e. the different `check*RegionAlign` methods should call into each other until we reach the top level base region.
> 
> The observation here is that the alignment of a region can be correct only if we can prove that its base region is aligned properly (and other requirements, e.g. the offset is divisible). But the base region may have another base region and we have to prove the alignment correctness to that as well.
> 
> I hope this makes sense, please correct me if I am wrong.
> Perhaps you could call instead checkVarRegionAlign()?
Yeah, you are right. Thank you!

> Also, I think BaseRegion can be an ElementRegion here as well. So, in that case we should call into checkElementRegionAlign(), shouldn't we? This draws a pattern that we should recursively descend down to the top most base region. I.e. the different check*RegionAlign methods should call into each other until we reach the top level base region.

I`m not sure that BaseRegion can be ElementRegion. Anyway there always must be some Variable in the end? Or maybe I am wrong?

> The observation here is that the alignment of a region can be correct only if we can prove that its base region is aligned properly (and other requirements, e.g. the offset is divisible). But the base region may have another base region and we have to prove the alignment correctness to that as well.
> 
> I hope this makes sense, please correct me if I am wrong.

I split check into to two stages.
1. Check BaseRegion align. 
But if Var has its own align specifier we ignore BaseRegion align.
2. Check that total offset is divisible by the necessary alignment. 
When I say total offset I mean that it is calculated from the BaseRegion(through all Fields and Elements Regions. e.g. Xi.a[10].b[20].c[30]. Total offset of 'c[30]' is offset from &Xi).

So in this solution we no need to recursively check all regions align.

================
Comment at: clang/lib/StaticAnalyzer/Checkers/CheckPlacementNew.cpp:125
+        Msg = std::string(llvm::formatv(
+            "Possibly not enough {0} bytes for array allocation which "
+            "requires "
----------------
NoQ wrote:
> f00kat wrote:
> > martong wrote:
> > > Maybe the below could be a wording that's more easy to follow?
> > > `{0} bytes is possibly not enough for array allocation which ...`
> > Yeah. Sure. I have some troubles with english :)
> Why do we say "possibly"? Where does the uncertainty come from?
I don’t know what specific size Clang uses for its internal needs in array cases. If you can tell this size, I will use it here.

================
Comment at: clang/test/Analysis/placement-new.cpp:245
+
+  // ok. 2(short align) + 3*2(index '3' offset)
+  ::new (&b[3]) long;
----------------
martong wrote:
> So these tests failed after you rewrote ElementRegion processing, right?
> Actually, I wonder why we thought that if x is divisible by 2 then (x+6) will be divisible by 8 unconditionally.It's good you have this fixed.
Yes, It is my fault. For example variable can be allocated at address 0x149E730 and it is well aligned to 2,4,8. And the assert that '0+6' is well 
 aligned to '8' will be wrong.

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D76996/new/

https://reviews.llvm.org/D76996