[libcxx-commits] [PATCH] D132522: [1b/3][compiler-rt][ASan] API for annotating objects memory

Wed Sep 21 19:02:46 PDT 2022

AdvenamTacet marked 5 inline comments as done.
AdvenamTacet added inline comments.

================
Comment at: compiler-rt/lib/asan/asan_poisoning.cpp:367
+
+  return IsAligned(size, granularity) && IsAligned(address, granularity);
+}
----------------
AdvenamTacet wrote:
> AdvenamTacet wrote:
> > vitalybuka wrote:
> > > vitalybuka wrote:
> > > > AdvenamTacet wrote:
> > > > > vitalybuka wrote:
> > > > > > AdvenamTacet wrote:
> > > > > > > vitalybuka wrote:
> > > > > > > > AdvenamTacet wrote:
> > > > > > > > > vitalybuka wrote:
> > > > > > > > > > Can we instead update __sanitizer_annotate_contiguous_container and friends to support this case
> > > > > > > > > > 
> > > > > > > > > > I guess we just need to load AddressIsPoisoned(beg-1) && AddressIsPoisoned(end) and make sure we do not poisoning them
> > > > > > > > > No, it's not possible. Inside `__sanitizer_annotate_contiguous_container` there is not enough information. `__sanitizer_is_annotable` is necessary if object wants to poison self memory (not memory allocated by that object, but own memory), both on stack and heap.
> > > > > > > > > 
> > > > > > > > > Example when it's necessary (when alignment matters) is for example object in an array.
> > > > > > > > > ```
> > > > > > > > > // SHADOW_GRANULARITY == 8
> > > > > > > > > struct X { char buff[7];} // sizeof(X) == 7
> > > > > > > > > X tab[2];
> > > > > > > > > ```
> > > > > > > > > We cannot poison the first object in tab, because we will have to poison the first byte of the second object.
> > > > > > > > > 
> > > > > > > > > Therefore I believe that it's not possible to include it in `__sanitizer_annotate_contiguous_container`.
> > > > > > > > address_p and size_v are for container?
> > > > > > > > 
> > > > > > > > this info is provided into __sanitizer_annotate_contiguous_container
> > > > > > > > Simples way is just to have 
> > > > > > > > 
> > > > > > > > ```
> > > > > > > > static bool IsAnotateble();
> > > > > > > > 
> > > > > > > > void __sanitizer_annotate_contiguous_container() {
> > > > > > > >   // instead of CHECK
> > > > > > > >    if (!IsAnotateble) return;
> > > > > > > > }
> > > > > > > > ```
> > > > > > > > 
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > No, it's not possible. Inside `__sanitizer_annotate_contiguous_container` there is not enough information. `__sanitizer_is_annotable` is necessary if object wants to poison self memory (not memory allocated by that object, but own memory), both on stack and heap.
> > > > > > > > > 
> > > > > > > > > Example when it's necessary (when alignment matters) is for example object in an array.
> > > > > > > > > ```
> > > > > > > > > // SHADOW_GRANULARITY == 8
> > > > > > > > > struct X { char buff[7];} // sizeof(X) == 7
> > > > > > > > > X tab[2];
> > > > > > > > > ```
> > > > > > > > > We cannot poison the first object in tab, because we will have to poison the first byte of the second object.
> > > > > > > > > 
> > > > > > > > > Therefore I believe that it's not possible to include it in `__sanitizer_annotate_contiguous_container`.
> > > > > > > > 
> > > > > > > > 
> > > > > > > I believe, data provided into `__sanitizer_annotate_contiguous_container` is not sufficient, as there we do not know if memory is a separated buffer for containers content or part of a bigger object.
> > > > > > > 
> > > > > > > Let's look at it closer. For that example, `SHADOW_GRANULARITY == 8`.
> > > > > > > Here are `__sanitizer_annotate_contiguous_container` arguments and the function do not have any other source of information:
> > > > > > > ```
> > > > > > > void __sanitizer_annotate_contiguous_container(const void *beg_p,
> > > > > > >                                                const void *end_p,
> > > > > > >                                                const void *old_mid_p,
> > > > > > >                                                const void *new_mid_p)
> > > > > > > ```
> > > > > > > If we have a vector with allocated buffer of size 126 bytes (42 elements of size 3), distance between enp_p and beg_p is 126 (and 126 % 8 != 0). That's ok and we can poison that memory, but inside `__sanitizer_annotate_contiguous_container` we do not have information if it's allocated buffer or part of a bigger structure.
> > > > > > > 
> > > > > > > If object wants to poison self memory, size of the object has to be aligned. So, we cannot poison short string if `sizeof(str)==126` bytes, we can annotate long string with external buffer of size 126.
> > > > > > > 
> > > > > > > Also, moving that check into `__sanitizer_annotate_contiguous_container` would have a bad performance effect. In std::vector, my patch for std::deque and long string case, there is only compile-time check if a standard allocator is used, and therefore it's not necessary to confirm that `beg_p` is aligned, in some scenarios (like short string) there is no way to check it compile time.
> > > > > > I am not concerned about alignment check cost, we have function call and memory writes already. This function is quite expensive.
> > > > > > In the case of aligned end_p we continue to work as is.
> > > > > > if not aligned, you can read shadow(end_p) and determine what is just after the buffer
> > > > > > this is slightly inefficient
> > > > > > 
> > > > > > If we open to extend API, alternative is an another version of __sanitizer_annotate_contiguous_container
> > > > > > e.g. __sanitizer_annotate_contiguous_container_unaligned() which never poisons unaligned tail/head granules
> > > > > > 
> > > > > > with __sanitizer_is_annotable() and sizeof(str)==126 you loose opportunity to check internal full granules.
> > > > > > address_p and size_v are for container?
> > > > > 
> > > > > It's for the memory controlled by the programmer. In our case it's address of the object (`this`) and the size of the object (`sizeof(*this)`). It may be also used for variables inside the class.
> > > > > 
> > > > > Inside that memory, programmer is responsible to make sure that every variable is accessible.
> > > > > 
> > > > > A good example is a short string with alternate string layout. Metadata byte is at the very end of the object and therefore is always poisoned, we turn off annotations in functions accessing that byte. (Similarly for optimized copy constructors.)
> > > > > We use `__sanitizer_is_annotable` to make sure that we won't poison memory outside of the string object, where access may not be controlled by us.
> > > > > 
> > > > > We can use a function like `sanitizer_annotate_contiguous_container_unaligned(...)` suggested by you, we would just pass the beginning and the end of the object as the beginning and the end of the buffer - it has the same functionality wrapped in one function, I believe. If you prefer that solution, let me know.
> > > > > However, I still believe that function just for that check is a better choice as it provides a bigger flexibility.
> > > > > 
> > > > > Also, if we go with `sanitizer_annotate_contiguous_container_unaligned`, should we create similar functions for double_ended containers, for consistency? That will double number of functions.
> > > > > 
> > > > > > with __sanitizer_is_annotable() and sizeof(str)==126 you loose opportunity to check internal full granules.
> > > > > 
> > > > > Sorry, I'm not sure if I understand. Do you mean that buffer which we want to poison is aligned (size and address) and it's in a bigger object?
> > > > > Isn't it the same with `__sanitizer_is_annotable` and `sanitizer_annotate_contiguous_container_unaligned` (once we just carefully choose address and size, once we carefully choose two addresses)?
> > > > > 
> > > > > > if not aligned, you can read shadow(end_p) and determine what is just after the buffer
> > > > > > this is slightly inefficient
> > > > > 
> > > > > It looks a little bit risky and requires a lot of thinking about edge cases. I'm honestly not sure if I know how to implement it.
> > > > > I am not concerned about alignment check cost, we have function call and memory writes already. This function is quite expensive.
> > > > > In the case of aligned end_p we continue to work as is.
> > > > > if not aligned, you can read shadow(end_p) and determine what is just after the buffer
> > > > > this is slightly inefficient
> > > > > 
> > > > > If we open to extend API, alternative is an another version of __sanitizer_annotate_contiguous_container
> > > > > e.g. __sanitizer_annotate_contiguous_container_unaligned() which never poisons unaligned tail/head granules
> > > > > 
> > > > > with __sanitizer_is_annotable() and sizeof(str)==126 you loose opportunity to check internal full granules.
> > > > 
> > > > 
> > > Would prefer to do improve existing sanitizer_annotate_contiguous_container
> > > note that you need to load unaligned shadow tail/head only when we about to poison tail/head granule.
> > > so I am not concerned with perf.
> > > Would prefer to do improve existing sanitizer_annotate_contiguous_container
> > 
> > I don't know how to do it without a new function in API.
> > I see the option I suggested or function like:
> > ```
> > sanitizer_annotate_contiguous_container_unaligned(...) {
> >   if(!isAnnotable(...)) return;
> > 
> >   sanitizer_annotate_contiguous_container(...);
> > }
> > ```
> > 
> > Possibly checking shadow memory around the buffer may be another solution, but I'm not sure how to implement it.
> > I'm quite sure that `beg_p` has to be aligned. If `end_p` is not aligned, possibly we can solve it with some ifs, but I'm not sure how exactly and what edge cases may happen.
> > If you have a clear idea, I will appreciate sharing details or implementation.
> > 
> > > so I am not concerned with perf.
> > 
> > Ok, I'm not referring to performance in my previous answer.
> I was thinking about it and I think I have a solution:
> - `beg_p` has to be aligned (otherwise container is not annotated at all),
> - if `end_p` is aligned, container is normally annotated,
> - if `end_p` is not aligned, last granule is annotated if and only if all accessible bytes inside that granule are inside the container.
> 
> I believe it should work.
> 
> But I'm a little bit scared that there is some case I didn't think of. (Is it possible that some implementation break?)
> Let me know what you think about it. If you believe that solution is correct and you would accept it, I can implement it instead of a new function in API.
> 
> Future improvement (I suggest another patch for it):
> - First granule is not annotated when `beg_p` is not aligned, but all middle granules are. Therefore container may be partially annotated, even when `beg_p` is not aligned. 
It seems that it works, I implemented it and updated the revision.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132522/new/

https://reviews.llvm.org/D132522