[PATCH] D125195: [asan][ARMCXXABI] Added missing asan poison array cookie hooks.
Roy Sundahl via Phabricator via cfe-commits
cfe-commits at lists.llvm.org
Mon May 9 13:25:52 PDT 2022
rsundahl added a comment.
Adding dialog to comments made by reviewers.
================
Comment at: clang/lib/CodeGen/ItaniumCXXABI.cpp:2443
+ // Handle poisoning the array cookie in asan
+ if (CGM.getLangOpts().Sanitize.has(SanitizerKind::Address) && AS == 0 &&
+ (expr->getOperatorNew()->isReplaceableGlobalAllocationFunction() ||
----------------
delcypher wrote:
> Why is there a restriction on the address space?
It's only there because it's also there in ItaniumCXXABI::InitializeArrayCookie() method and I have no reason to think that the ARMCXXABI version would be different. That said, this is the revision that introduced the notion originally: https://reviews.llvm.org/D5111
================
Comment at: clang/lib/CodeGen/ItaniumCXXABI.cpp:2478
+ // run-time deal with it: if the shadow is properly poisoned return the
+ // cookie, otherwise return 0 to avoid an infinite loop calling DTORs.
+ // We can't simply ignore this load using nosanitize metadata because
----------------
delcypher wrote:
> This comment is confusing. What's returning `0`? `__asan_load_cxx_array_cookie` doesn't do that and AFAICT neither does this code.
(Same answer) It's only there because it's also there in ItaniumCXXABI::InitializeArrayCookie() method and I have no reason to think that the ARMCXXABI version would be different. That said, this is the revision that introduced the notion originally: https://reviews.llvm.org/D5111
================
Comment at: clang/lib/CodeGen/ItaniumCXXABI.cpp:2479
+ // cookie, otherwise return 0 to avoid an infinite loop calling DTORs.
+ // We can't simply ignore this load using nosanitize metadata because
+ // the metadata may be lost.
----------------
delcypher wrote:
> I also don't understand what you mean by the comment. Could you elaborate?
(Same answer) It's only there because it's also there in ItaniumCXXABI::InitializeArrayCookie() method and I have no reason to think that the ARMCXXABI version would be different. That said, this is the revision that introduced the notion originally: https://reviews.llvm.org/D5111
================
Comment at: clang/lib/CodeGen/ItaniumCXXABI.cpp:2485
+ CGM.CreateRuntimeFunction(FTy, "__asan_load_cxx_array_cookie");
+ return CGF.Builder.CreateCall(F, numElementsPtr.getRawPointer(CGF));
}
----------------
yln wrote:
> Same here: extract common helper for the ASan stuff to be shared by ARM and ItaniumCXXABI.
Was limiting my impact but yes, I agree with you and thanks for expecting it. FYI and FWIW: the same concern that @delcypher has in the preceding comments will still exist because the factored code will come from the existing Itanium methods and those concerns come from that existing code that was introduced by: https://reviews.llvm.org/D5111.
================
Comment at: compiler-rt/lib/asan/asan_poisoning.cpp:262
*reinterpret_cast<u8*>(s) = kAsanArrayCookieMagic;
+ // The ARM64 cookie has a second "elementSize" entry so poison it as well
+ #if SANITIZER_ARM64
----------------
yln wrote:
> yln wrote:
> > yln wrote:
> > > Nitpicking extreme:
> > > * I think the code inside `#if` shouldn't have extra indent (because preprocessor sections don't establish code). If in doubt, let `clang-format` do it for you.
> > > * Let's move the comment inside the #if, just above before the line of code. If you ever read the pre-processed source-code, then the comment "lives and dies" with the line of code it relates too, i.e, on x86, currently there would be a comment without the code.
> >
> I find this a bit confusing
> * x86_64: cookie is 1 word and passed `p` points to it
> * arm64: cookie is 2 words and passed `p` points to second half of it
>
> Would it be worth to take the extra care in CodeGen to always pass the "beginning of the cookie" to `__asan_poison_cxx_array_cookie()` and then have something like that:
> ```
> size_t shadow_cookie_size = SANITIZER_ARM64 ? 2 : 1:
> internal_memset(s, kAsanArrayCookieMagic, shadow_cookie_size);
> ```
I don't think so for a collection of reasons. When the ItaniumCXXABI defines the "array cookie" it states: "The cookie will have size sizeof(size_t)" and it then describes how there may be padding preceding the cookie but makes no mention of a second element anywhere. (https://itanium-cxx-abi.github.io/cxx-abi/abi.html#array-cookies). Now, while the ARMCXXABI(32) specification defines the cookie as two 4-byte (int) values (sizeof(element),numElements), the ARMCXXABI(64) (https://github.com/ARM-software/abi-aa/blob/320a56971fdcba282b7001cf4b84abb4fd993131/cppabi64/cppabi64.rst#the-generic-c-abi) does no such similar thing and defers to the Itanium specification of a single, type size_t (8 bytes) cookie which contains only (numElements). The initial commit creating the ARMCXXABI(64) mimicking the ARMCXXABI(32) "pair" of values occurred here: https://marc.info/?l=cfe-commits&m=135915714232578&w=2 and is then (unfortunately) this interpretation was reiterated here: https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms
which (imo incorrectly) states:
"The ABI provides a fixed layout of two size_t words for array cookies, with no extra alignment requirements. This behavior matches the ARM 32-bit C++ ABI." However... there is no supporting documentation for this two-element cookie in the ARMCXXABI(64). Furthermore, throughout the llvm source code, including sanitizer, the term "array cookie" ALWAYS means the (size_t) bytes immediately preceding the array pointer that is returned by the allocator. For these reasons, I have chosen to treat the "array cookie" uniformly as a size_t singleton and the additional sizeof(element) as the padding that it is, but with the additional information. (There is no use in llvm of the second element either in the Itanium nor the ARM ABIs including the 32 bit ARM ABI. I'm confident that we should support only the defined array cookie as a single value but also fill the padding with sizeof(element) and deprecate the whole idea eventually.)
================
Comment at: compiler-rt/lib/asan/asan_poisoning.cpp:262-265
+ // The ARM64 cookie has a second "elementSize" entry so poison it as well
+ #if SANITIZER_ARM64
+ *(reinterpret_cast<u8*>(s)-1) = kAsanArrayCookieMagic;
+ #endif
----------------
rsundahl wrote:
> yln wrote:
> > yln wrote:
> > > yln wrote:
> > > > Nitpicking extreme:
> > > > * I think the code inside `#if` shouldn't have extra indent (because preprocessor sections don't establish code). If in doubt, let `clang-format` do it for you.
> > > > * Let's move the comment inside the #if, just above before the line of code. If you ever read the pre-processed source-code, then the comment "lives and dies" with the line of code it relates too, i.e, on x86, currently there would be a comment without the code.
> > >
> > I find this a bit confusing
> > * x86_64: cookie is 1 word and passed `p` points to it
> > * arm64: cookie is 2 words and passed `p` points to second half of it
> >
> > Would it be worth to take the extra care in CodeGen to always pass the "beginning of the cookie" to `__asan_poison_cxx_array_cookie()` and then have something like that:
> > ```
> > size_t shadow_cookie_size = SANITIZER_ARM64 ? 2 : 1:
> > internal_memset(s, kAsanArrayCookieMagic, shadow_cookie_size);
> > ```
> I don't think so for a collection of reasons. When the ItaniumCXXABI defines the "array cookie" it states: "The cookie will have size sizeof(size_t)" and it then describes how there may be padding preceding the cookie but makes no mention of a second element anywhere. (https://itanium-cxx-abi.github.io/cxx-abi/abi.html#array-cookies). Now, while the ARMCXXABI(32) specification defines the cookie as two 4-byte (int) values (sizeof(element),numElements), the ARMCXXABI(64) (https://github.com/ARM-software/abi-aa/blob/320a56971fdcba282b7001cf4b84abb4fd993131/cppabi64/cppabi64.rst#the-generic-c-abi) does no such similar thing and defers to the Itanium specification of a single, type size_t (8 bytes) cookie which contains only (numElements). The initial commit creating the ARMCXXABI(64) mimicking the ARMCXXABI(32) "pair" of values occurred here: https://marc.info/?l=cfe-commits&m=135915714232578&w=2 and is then (unfortunately) this interpretation was reiterated here: https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms
> which (imo incorrectly) states:
> "The ABI provides a fixed layout of two size_t words for array cookies, with no extra alignment requirements. This behavior matches the ARM 32-bit C++ ABI." However... there is no supporting documentation for this two-element cookie in the ARMCXXABI(64). Furthermore, throughout the llvm source code, including sanitizer, the term "array cookie" ALWAYS means the (size_t) bytes immediately preceding the array pointer that is returned by the allocator. For these reasons, I have chosen to treat the "array cookie" uniformly as a size_t singleton and the additional sizeof(element) as the padding that it is, but with the additional information. (There is no use in llvm of the second element either in the Itanium nor the ARM ABIs including the 32 bit ARM ABI. I'm confident that we should support only the defined array cookie as a single value but also fill the padding with sizeof(element) and deprecate the whole idea eventually.)
Good stuff. Please nit-pick all you like. Will update.
================
Comment at: compiler-rt/test/asan/TestCases/Posix/new_array_cookie_test.cpp:3
// RUN: %clangxx_asan -O3 %s -o %t
-// RUN: not %run %t 2>&1 | FileCheck %s
+// RUN: not %run %t 2>&1 | FileCheck %s
// RUN: %env_asan_opts=poison_array_cookie=1 not %run %t 2>&1 | FileCheck %s
----------------
yln wrote:
> ❤ The boy scouts rule:
> > Always leave the campground cleaner than you found it.
I pick up litter on my walks too. ;-)
================
Comment at: compiler-rt/test/asan/TestCases/Posix/new_array_cookie_test.cpp:19
C *buffer = new C[argc];
- buffer[-2].x = 10;
+ buffer[-1].x = 10;
// CHECK: AddressSanitizer: heap-buffer-overflow
----------------
yln wrote:
> Can we try to simulate a 1-byte buffer-underflow (and leave the definition of `struct C` as-is)? This would show that ASan still reports the smallest possible infraction.
That's a good idea, but it's not about leaving the struct the same because it's definitely wrong in that type "int" has nothing to do with the cookie and it only happens to be true that (buffer[-2].x) gets to the beginning of the cookie since two "int"s happen to fit inside of a "size_t". (This may not always be true as in most 64-on-32 type arrangements) It's better to treat the cookie as [-1] of an array of size_t elements so that when indexing [-1] you get all of the cookie at once (see: https://itanium-cxx-abi.github.io/cxx-abi/abi.html#array-cookies). The reason I write to the *whole* cookie at once is because of endian-ness. The existing code which only writes to the first half of the cookie would reach a different half of the cookie on a big-endian machine so this covers both. But you have a good point, we should see if the minimum underflow is caught so I'll think of a way to do that in an endian-agnostic way...
================
Comment at: compiler-rt/test/asan/TestCases/Posix/new_array_cookie_uaf_test.cpp:20
__attribute__((no_sanitize_address)) void Write42ToCookie(C *c) {
- long *p = reinterpret_cast<long*>(c);
+ size_t *p = reinterpret_cast<size_t*>(c);
p[-1] = 42;
----------------
yln wrote:
> Can we try to simulate a 1-byte buffer-underflow (and leave the definition of struct C as-is)? This would show that ASan still reports the smallest possible infraction.
(Same answer as above)
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D125195/new/
https://reviews.llvm.org/D125195
More information about the cfe-commits
mailing list