[llvm] [HWASAN] Use sign extension in memToShadow() and untagPointer() (PR #103727)

Tue Aug 20 12:01:07 PDT 2024

SiFiveHolland wrote:

> > I'm curious if there's any benchmarks or intuition on the performance impact for AArch64?

I don't have any AArch64 benchmark numbers at the moment, but I can run some tests.

> > The commit message mentions that two shifts can be folded together in the backend, but the new LLVM IR is longer (e.g., llvm/test/Instrumentation/HWAddressSanitizer/RISCV/alloca-with-calls.ll replaces an 'and' with 'shl' and 'ashr'), so it sounds like the overall AArch64 instruction count is unchanged; it's not clear to me whether the old or new AArch64 instructions will run faster.

The instruction count is unchanged when inserting tags into pointers (e.g. for stack tagging), but `HWAddressSanitizer::insertShadowTagCheck` needs one fewer instruction because the two right shifts can be combined as well. See, for example, `llvm/test/Instrumentation/HWAddressSanitizer/basic.ll`, where the new sequence combines into a single `sbfx` instruction:

```diff
-; FASTPATH-NEXT:    [[TMP3:%.*]] = and i64 [[TMP0]], 72057594037927935
-; FASTPATH-NEXT:    [[TMP4:%.*]] = lshr i64 [[TMP3]], 4
+; FASTPATH-NEXT:    [[TMP3:%.*]] = shl i64 [[TMP0]], 8
+; FASTPATH-NEXT:    [[TMP8:%.*]] = ashr i64 [[TMP3]], 8
+; FASTPATH-NEXT:    [[TMP4:%.*]] = ashr i64 [[TMP8]], 4
```

In my [testing with Linux kernel HWASAN](https://lore.kernel.org/linux-arm-kernel/20240814085618.968833-2-samuel.holland@sifive.com/), this resulted in a 4.6% reduction in code size.
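To make the folding in the `basic.ll` diff above concrete, here is a small Python model of the old and new IR sequences. It is only a sketch: the helper names are made up for illustration, 64-bit registers are emulated with masking, and `sbfx` is modeled as "extract `width` bits starting at `lsb`, then sign-extend".

```python
MASK64 = (1 << 64) - 1  # emulate 64-bit registers with Python ints


def to_signed(x):
    """Interpret a 64-bit value as two's complement."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def shl(x, n):
    return (x << n) & MASK64


def lshr(x, n):
    return (x & MASK64) >> n


def ashr(x, n):
    return (to_signed(x) >> n) & MASK64


def old_sequence(ptr):
    # and i64 %p, 72057594037927935 ; lshr i64 %t, 4
    return lshr(ptr & 0x00FFFFFFFFFFFFFF, 4)


def new_sequence(ptr):
    # shl i64 %p, 8 ; ashr i64 %t, 8 ; ashr i64 %u, 4
    return ashr(ashr(shl(ptr, 8), 8), 4)


def folded_sequence(ptr):
    # The two arithmetic right shifts fold into one: shl 8 ; ashr 12
    return ashr(shl(ptr, 8), 12)


def sbfx(x, lsb, width):
    # Model of a signed bitfield extract (hypothetical helper, not an LLVM API)
    field = (x >> lsb) & ((1 << width) - 1)
    if field >> (width - 1):
        field -= 1 << width
    return field & MASK64


user_ptr = 0xAB007FFF12345670    # tag 0xAB, bit 55 clear
kernel_ptr = 0xABFF800012345670  # tag 0xAB, bit 55 set

for p in (user_ptr, kernel_ptr):
    assert new_sequence(p) == folded_sequence(p) == sbfx(p, 4, 52)

# With bit 55 clear the old and new sequences agree; with bit 55 set the
# new one sign-extends instead of zero-extending.
assert old_sequence(user_ptr) == new_sequence(user_ptr)
assert old_sequence(kernel_ptr) != new_sequence(kernel_ptr)
```

In this model, `shl 8` followed by `ashr 12` is the same as a signed extract of bits [55:4], which is why a single `sbfx` can cover the whole sequence.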

The same instruction-count reduction applies on RISC-V, which doesn't have an equivalent to `sbfx` but can still combine the two right shifts. In addition, loading the mask would require at least two instructions on RISC-V.
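As a rough illustration of the mask cost on RISC-V: the constant from the old IR does not fit the 12-bit sign-extended immediate of `andi`, so it has to be built in a register first. The sketch below models one possible two-instruction materialization (`li t0, -1` followed by `srli t0, t0, 8`). That particular sequence is my assumption for illustration, not necessarily what the backend emits.

```python
MASK64 = (1 << 64) - 1
TAG_MASK = 72057594037927935  # constant from the old 'and' in the IR


def fits_simm12(value):
    """True if value fits a 12-bit sign-extended immediate (as used by andi)."""
    return -2048 <= value <= 2047


def materialize_mask():
    # Assumed sequence, modeled on 64-bit registers:
    t0 = -1 & MASK64  # li   t0, -1
    t0 = t0 >> 8      # srli t0, t0, 8
    return t0


assert TAG_MASK == 0x00FFFFFFFFFFFFFF
assert not fits_simm12(TAG_MASK)
assert materialize_mask() == TAG_MASK
```

The shift-based sequence needs no constant at all, so these instructions (and the register holding the mask) go away.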

On x86 the code size impact is more of a wash: avoiding the mask saves a 9-byte `movabs` instruction per function, but the shl+ashr instruction sequence is slightly larger than the and+lshr sequence.

> Probably less one register?
> 
> I don't expect a measurable difference, but it looks nicer to me.

Yes, this is also a benefit, though I don't know how much impact it has. I'll try to get some performance numbers from AArch64 hardware.

https://github.com/llvm/llvm-project/pull/103727