[clang] [llvm] [ASAN] For Asan instrumented global, emit two symbols, one with actual size and other with instrumented size. (PR #70166)

Fri Feb 9 09:04:39 PST 2024

hctim wrote:

Hi,

My apologies for the delay.

I started to hack together the same idea using `SanitizerMetadata`, which is definitely the preferred way of adding sanitizer globalvariable instrumentation (given that everything else already lives there). You can see the WIP here: https://pastebin.com/vXzHmnMg

Part way through patching up all the tests though, I've come to the conclusion that this really isn't a great idea. For just ASan, sure we can just emit a secondary symbol and everything is hunky dory. But I think that we shouldn't do something like this for just ASan, it needs to also work for HWASan and MTE-globals as well. And it really doesn't compose well with the MTE-globals model. If you were to emit the unpadded symbol as the dominant name (from the c/cpp definition), then the linker gets super confused if you're linking multiple DSOs, as it needs to ensure that *all* of the DSOs use the tagged definition (or they all get demoted to being untagged). So, you'd need the padded one to be the dominant name, which I think is not what you want.

This feels like a behaviour that we don't want the additional complexity for AMDGPU's drivers, because I highly suspect you can just deal with the sanitizers in the AMDGPU driver itself.

How does your driver work? If it's basically doing a `dlsym` -> `memcpy` (and the memcpy interceptor in ASan is firing because you're going OOB on the unpadded symbol), can't you just have an `internal_memcpy` that's built with `__attribute__((no_sanitize("address")))`?

We'd also potentially like to put some data in the padding. This has been discussed between some colleagues and I. If you end up not copying the padding as well, you might drop some metadata and things could explode in future.

HTH,
Mitch.

https://github.com/llvm/llvm-project/pull/70166