[libcxx-commits] [libcxx] [ASan][libc++] Turn on ASan annotations for short strings (PR #79536)

Nico Weber via libcxx-commits libcxx-commits at lists.llvm.org
Fri Jul 12 06:41:29 PDT 2024


nico wrote:

I'm now done debugging all issues encountered after merging this to Chromium. Here's some user feedback, do with it what you will.

I can't help but think that it might make sense to make this opt-in by default, in light of the below.

There were four distinct issues.

1. It found a bug! In https://crbug.com/346356630, we had some test code using a short test string, and the code under test did an out-of-bounds read, and with this change, the test + asan found it. That's great!

2. As mentioned above, the blocks runtime memcpy()s variables captured by blocks using memcpy() and then calls some function to rehydrate them. This is in a system library, we can't do anything about it, and it wasn't super easy to debug (#96099; upstream https://crbug.com/347379886).

3. This patch is incompatible with compiling some of your code with asan enabled and some of it with asan disabled. For our fuzzers, we've been building some code that isn't "under fuzz" with asan disabled for performance reasons (since 5 years ago, so for a while). With this change, that no longer works. In short, the `__annotate_new()` / `__annotate_delete` (functions) etc have a preprocessor check for if asan enabled, and only call the asan runtime if it's enabled. If some targets that depend on libc++ build without asan, they will have definitions of these methods that don't call asan. Other targets with asan enabled have definitions that do call the runtime. So that's an ODR violation, and it depends on luck which definition the linker picks up. So you can end up in situations where one TU creates a string object that has poisoned short string storage, and then it passes it to a different TU that appends data to that string, trying to convert it to long form, but this TU isn't unpoisoning the short string storage area, and trying to write the long size field then traps. (Arguably, it already doesn't work for the vector and long string asan allocations either, but we didn't hit it in practice there.) https://issues.chromium.org/issues/347026228#comment25 has a detailed writeup.

4. We had some user code that memcpy()d keys of a custom hash table type. While I don't understand this issue completely, I'm told that that code doesn't care about copying some uninitialized inline storage bytes. The fix here was to replace a std::string in a hash key with an std::optional<std::string> in asan builds (but not elsewhere, for perf reasons). It took a bunch of time to diagnose, the asan diagnostic was a bit hard to read, and in the end we had to come up with a somewhat boutique workaround (https://crbug.com/346174906). (This one is less interesting than the previous two.)

It was _extremely_ helpful that there's a toggle for turning off asan annotations just for std::string. This allowed us to roll libc++ and just turn off this feature and then finish diagnosing issues asynchronously.

https://github.com/llvm/llvm-project/pull/79536


More information about the libcxx-commits mailing list