[libcxx-commits] [PATCH] D132769: [2b/3][ASan][libcxx] std::basic_string annotations

Nikolas Klauser via Phabricator via libcxx-commits libcxx-commits at lists.llvm.org
Fri Sep 9 09:31:43 PDT 2022


philnik requested changes to this revision.
philnik added a comment.
This revision now requires changes to proceed.

Thanks for working on this! I haven't looked at any of the tests yet.



================
Comment at: libcxx/include/string:603
+#ifndef _LIBCPP_HAS_NO_ASAN
+#  define _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS __attribute__((no_sanitize("address")))
+#else
----------------



================
Comment at: libcxx/include/string:1596
 
-    _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20
-    size_type __get_short_size() const _NOEXCEPT {
-        _LIBCPP_ASSERT(!__r_.first().__s.__is_long_, "String has to be short when trying to get the short size");
-        return __r_.first().__s.__size_;
+    _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS size_type
+    __get_short_size() const _NOEXCEPT {
----------------
I don't think this should be marked `_LIBCPP_INTERNAL_MEMORY_ACCESS`. Or is the size/is_long byte also poisoned for some reason?


================
Comment at: libcxx/include/string:1644-1655
+    _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS void
+    __raw_copy_r(basic_string& __dst, const basic_string& __src) _NOEXCEPT {
+      __dst.__r_.first().__r = __src.__r_.first().__r;
+    }
+    _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS void
+    __raw_copy_l(basic_string& __dst, const basic_string& __src) _NOEXCEPT {
+      __dst.__r_.first().__l = __src.__r_.first().__l;
----------------
Two of these are redundant. I've uploaded D132951 to remove the differences. Also, would it be possible to annotate the special member functions of `__rep` instead? I think adding `_LIBCPP_HIDE_FROM_ABI _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS __rep(const __rep&) = default;` should do the trick.


================
Comment at: libcxx/include/string:1657-1666
+    _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 _LIBCPP_STRING_INTERNAL_MEMORY_ACCESS void __zero() _NOEXCEPT {
+      // ASan annotations:
+      // __zero() may be called when object fields contain random data,
+      // therefore we force caller to handle whole object memory annotations,
+      // as only caller knows if object is already initialized and
+      // __annotate_new() can be safely called.
+      //
----------------
This is now unnecessary.


================
Comment at: libcxx/include/string:1670-1672
+    // The following functions are no-ops outside of AddressSanitizer mode.
+    // We call annotatations only for the default Allocator because other allocators
+    // may not meet the AddressSanitizer alignment constraints.
----------------
Couldn't we check whether the allocation satisfies the asan requirements? It looks like the overhead of checking pointer alignment seems quite manageable: https://godbolt.org/z/WeE7Wxx73.


================
Comment at: libcxx/include/string:1673
+    // may not meet the AddressSanitizer alignment constraints.
+    // See the documentation for __sanitizer_annotate_contiguous_container for more details.
+    //
----------------
Could you mention `common_interface_defs.h` here instead? I couldn't find any documentation generated by the comments in that header.


================
Comment at: libcxx/include/string:1678
+    // as object itself could be allocated with different allocator than used inside the object.
+#ifdef TEST_HAS_FEATURE
+#  if TEST_HAS_FEATURE(address_sanitizer) && _LIBCPP_CLANG_VER >= 16000
----------------
You can't check for non-_Uglified defines inside the headers. You should instead use `_LIBCPP_HAS_NO_ASAN`. You also want this to work outside our test environment, right?


================
Comment at: libcxx/include/string:1681
+    _LIBCPP_HIDE_FROM_ABI _LIBCPP_CONSTEXPR_SINCE_CXX20 void __annotate_contiguous_container(
+        const void* __beg, const void* __end, const void* __old_mid, const void* __new_mid) const {
+      if (!__libcpp_is_constant_evaluated())
----------------
The two characters really don't hurt anybody and makes it a lot nicer to read.


================
Comment at: libcxx/include/string:1682-1683
+        const void* __beg, const void* __end, const void* __old_mid, const void* __new_mid) const {
+      if (!__libcpp_is_constant_evaluated())
+        if (__beg && is_same<allocator_type, __default_allocator_type>::value)
+          __sanitizer_annotate_contiguous_container(__beg, __end, __old_mid, __new_mid);
----------------
Could you make this a single `if`?


================
Comment at: libcxx/include/string:1692-1695
+      if (!__libcpp_is_constant_evaluated())
+        return __sanitizer_is_annotable(this, sizeof(*this));
+      else
+        return false;
----------------



================
Comment at: libcxx/include/string:1714-1718
+          __annotate_contiguous_container(
+              std::__to_address(__get_long_pointer()),
+              std::__to_address(__get_long_pointer() + __get_long_cap() + 1),
+              std::__to_address(__get_long_pointer() + __get_long_cap() + 1),
+              std::__to_address(__get_long_pointer() + __current_size + 1));
----------------
Could we use a common interface for this and the vector-equivalent? It would be nice if we could add a `__debug_utils/address_sanitizer.h` where we define `__annotate_contiguous_container()` and some CRTP empty base that defines the functions used here. AFAICT it should be possible to define all of these functions for the long string with the public API of both `vector` and `string`. Then we could simply add a function here with the same name to special-case the short string. I hope this would simplify the implementation here quite a bit. I'm not sure though, so tell me if there is some problem.


================
Comment at: libcxx/include/string:1777-1780
+      if (__is_long())
+        __annotate_delete<false>();
+      else
+        __annotate_delete<true>();
----------------
Why do you do this special-casing here?


================
Comment at: libcxx/include/string:2120-2125
+    // __init may be called after setting shadow memory, then we have to
+    // unpoison object memory.
+    // Chosen solution is to force caller to unpoison memory before call,
+    // as caller knows if memory is already poisoned or not.
+    // It is different than in __zero/__default_init, as there we turned off instrumentation,
+    // but there also caller has to take care of annotations.
----------------
When is `__init` called after poisoning the memory? `__init` should only ever be called by a constructor.


================
Comment at: libcxx/test/std/strings/basic.string/string.asan/replace.pass.cpp:318
+}
\ No newline at end of file

----------------
You're missing a few newlines.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D132769/new/

https://reviews.llvm.org/D132769



More information about the libcxx-commits mailing list