[libcxx-commits] [libcxx] [ASan][libc++] Turn on ASan annotations for short strings (PR #79536)

Thu May 9 06:16:06 PDT 2024

================
@@ -736,10 +735,44 @@ public:
   //
   // This string implementation doesn't contain any references into itself. It only contains a bit that says whether
   // it is in small or large string mode, so the entire structure is trivially relocatable if its members are.
+#if !defined(_LIBCPP_HAS_NO_ASAN) && defined(_LIBCPP_INSTRUMENTED_WITH_ASAN)
+  // When compiling with AddressSanitizer (ASan), basic_string cannot be trivially
+  // relocatable. Because the object's memory might be poisoned when its content
+  // is kept inside objects memory (short string optimization), instead of in allocated
+  // external memory. In such cases, the destructor is responsible for unpoisoning
+  // the memory to avoid triggering false positives.
+  // Therefore it's crucial to ensure the destructor is called
+  using __trivially_relocatable = false_type;
----------------
AdvenamTacet wrote:

I'm happy to incrementally improve string annotations.
Without change of `__trivially_relocatable` and no other changes ASan error happens exactly here:

https://github.com/llvm/llvm-project/blob/11a6799740f824282650aa9ec249b55dcf1a8aae/libcxx/include/__memory/uninitialized_algorithms.h#L644-L646

`__builtin_memcpy` accesses poisoned memory, when objects memory is poisoned.

We can unpoison memory there for every type. Then problematic code area would be:
```cpp
 } else {
 #ifndef  _LIBCPP_HAS_NO_ASAN
   __asan_unpoison_memory_region(__first, sizeof(_Tp) * (__last - __first));
#endif
   __builtin_memcpy(const_cast<__remove_const_t<_Tp>*>(__result), __first, sizeof(_Tp) * (__last - __first)); 
 } 
 ```
Disadvantage of that over changing the value of `__trivially_relocatable` is the need of unpoisoning memory like that in every function which depends on `__libcpp_is_trivially_relocatable`. But it shouldn't be a big problem.

What I don't like more is fact that objects after that move won't be poisoned correctly (may result in false negatives).
My idea to fix it is creating a compiler-rt function `__asan_move_annotations` which may be used here instead of `__asan_unpoison_memory_region`, making old buffer unpoisoned and the new buffer poisoned in the same way as the old one.

If we try to optimize it as much as possible, creating something similar to `__libcpp_is_trivially_relocatable` just for ASan is possible. For example `__libcpp_may_memory_be_poisoned`, and calling `__asan_unpoison_memory_region/__asan_move_annotations` if and only if that value is true. However, `__asan_unpoison_memory_region` should be cheaper than `__builtin_memcpy`, so I think we can accept additional cost here as new ASan trait adds a lot of complexity.

> We're destroying objects after all, which should probably poison memory.

Poisoning, if happens, happens in the allocator (during deallocation) and we cannot assume anything about it. For example, allocator cleaning memory before freeing it touches the whole memory before annotations are updated. Therefore `__annotate_delete()` in all classes is necessary.

But something like `__asan_move_annotations` is possible and I can create it.  Then we can go back to one value of `__trivially_relocatable`.
To put my actions behind my words, I will open a PR with that change Today or Tomorrow.

https://github.com/llvm/llvm-project/pull/79536