[llvm] [HWASAN] Use sign extension in memToShadow() and untagPointer() (PR #103727)
Samuel Holland via llvm-commits
llvm-commits at lists.llvm.org
Wed Aug 28 18:33:45 PDT 2024
SiFiveHolland wrote:
Here are some benchmark numbers:
<details>
<summary>CoreMark results on Cortex-A76 (RK3588)</summary>
Baseline:
```
CoreMark 1.0 : 11116.051578 / Android (12085363, +pgo, +bolt, +lto, +mlgo, based on r530567) Clang 19.0.0 (https://android.googlesource.com/toolchain/llvm-project 97a699bf4812a18fb657c2779f5296a4ab2694d2) -O2 -DPERFORMANCE_RUN=1 -lrt / Heap
CoreMark 1.0 : 11116.051578 / Android (12085363, +pgo, +bolt, +lto, +mlgo, based on r530567) Clang 19.0.0 (https://android.googlesource.com/toolchain/llvm-project 97a699bf4812a18fb657c2779f5296a4ab2694d2) -O2 -DPERFORMANCE_RUN=1 -lrt / Heap
CoreMark 1.0 : 11111.111111 / Android (12085363, +pgo, +bolt, +lto, +mlgo, based on r530567) Clang 19.0.0 (https://android.googlesource.com/toolchain/llvm-project 97a699bf4812a18fb657c2779f5296a4ab2694d2) -O2 -DPERFORMANCE_RUN=1 -lrt / Heap
CoreMark 1.0 : 11125.945705 / Android (12085363, +pgo, +bolt, +lto, +mlgo, based on r530567) Clang 19.0.0 (https://android.googlesource.com/toolchain/llvm-project 97a699bf4812a18fb657c2779f5296a4ab2694d2) -O2 -DPERFORMANCE_RUN=1 -lrt / Heap
CoreMark 1.0 : 11123.470523 / Android (12085363, +pgo, +bolt, +lto, +mlgo, based on r530567) Clang 19.0.0 (https://android.googlesource.com/toolchain/llvm-project 97a699bf4812a18fb657c2779f5296a4ab2694d2) -O2 -DPERFORMANCE_RUN=1 -lrt / Heap
Performance counter stats for './baseline/libc.so ./baseline/coremark.exe' (5 runs):
27,979.84 msec task-clock # 1.000 CPUs utilized ( +- 0.02% )
846 context-switches # 30.231 /sec ( +- 2.61% )
0 cpu-migrations # 0.000 /sec
2,710 page-faults # 96.840 /sec
63,707,649,465 cycles # 2.277 GHz ( +- 0.02% )
211,110,761,288 instructions # 3.31 insn per cycle ( +- 0.00% )
45,538,007,784 branches # 1.627 G/sec ( +- 0.00% )
98,909,714 branch-misses # 0.22% of all branches ( +- 0.28% )
27.98656 +- 0.00512 seconds time elapsed ( +- 0.02% )
```
With this patch:
```
CoreMark 1.0 : 11060.723371 / Android (dev, +pgo, +bolt, +lto, +mlgo, based on r530567) Clang 19.0.0 (https://android.googlesource.com/toolchain/llvm-project 97a699bf4812a18fb657c2779f5296a4ab2694d2) -O2 -DPERFORMANCE_RUN=1 -lrt / Heap
CoreMark 1.0 : 11039.355302 / Android (dev, +pgo, +bolt, +lto, +mlgo, based on r530567) Clang 19.0.0 (https://android.googlesource.com/toolchain/llvm-project 97a699bf4812a18fb657c2779f5296a4ab2694d2) -O2 -DPERFORMANCE_RUN=1 -lrt / Heap
CoreMark 1.0 : 11074.197121 / Android (dev, +pgo, +bolt, +lto, +mlgo, based on r530567) Clang 19.0.0 (https://android.googlesource.com/toolchain/llvm-project 97a699bf4812a18fb657c2779f5296a4ab2694d2) -O2 -DPERFORMANCE_RUN=1 -lrt / Heap
CoreMark 1.0 : 11073.583965 / Android (dev, +pgo, +bolt, +lto, +mlgo, based on r530567) Clang 19.0.0 (https://android.googlesource.com/toolchain/llvm-project 97a699bf4812a18fb657c2779f5296a4ab2694d2) -O2 -DPERFORMANCE_RUN=1 -lrt / Heap
CoreMark 1.0 : 11074.197121 / Android (dev, +pgo, +bolt, +lto, +mlgo, based on r530567) Clang 19.0.0 (https://android.googlesource.com/toolchain/llvm-project 97a699bf4812a18fb657c2779f5296a4ab2694d2) -O2 -DPERFORMANCE_RUN=1 -lrt / Heap
Performance counter stats for './baseline/libc.so ./custom/coremark.exe' (5 runs):
28,102.82 msec task-clock # 0.999 CPUs utilized ( +- 0.06% )
849 context-switches # 30.188 /sec ( +- 0.05% )
0 cpu-migrations # 0.000 /sec
2,708 page-faults # 96.288 /sec
64,007,014,069 cycles # 2.276 GHz ( +- 0.06% )
213,868,614,685 instructions # 3.34 insn per cycle ( +- 0.00% )
45,538,085,229 branches # 1.619 G/sec ( +- 0.00% )
102,999,597 branch-misses # 0.23% of all branches ( +- 1.86% )
28.1261 +- 0.0161 seconds time elapsed ( +- 0.06% )
```
</details>
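As a quick sanity check on the counters above, the relative changes can be recomputed by hand. This is only a sketch: the helper name `relativeChangePercent` and the constant names are made up for illustration, and the values are copied from the two `perf stat` summaries.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical helper: relative change of `patched` versus `baseline`, in percent.
constexpr double relativeChangePercent(double baseline, double patched) {
  return (patched - baseline) / baseline * 100.0;
}

// Counter values copied from the perf stat summaries (baseline vs. this patch).
constexpr double kBaselineCycles = 63707649465.0;
constexpr double kPatchedCycles = 64007014069.0;
constexpr double kBaselineInstructions = 211110761288.0;
constexpr double kPatchedInstructions = 213868614685.0;
constexpr double kBaselineTaskClockMs = 27979.84;
constexpr double kPatchedTaskClockMs = 28102.82;

// Prints the three deltas and checks that the cycle increase stays under 0.5%.
inline void reportDeltas() {
  const double cycles = relativeChangePercent(kBaselineCycles, kPatchedCycles);
  const double insns =
      relativeChangePercent(kBaselineInstructions, kPatchedInstructions);
  const double clock =
      relativeChangePercent(kBaselineTaskClockMs, kPatchedTaskClockMs);
  std::printf("cycles:       %+.2f%%\n", cycles);
  std::printf("instructions: %+.2f%%\n", insns);
  std::printf("task-clock:   %+.2f%%\n", clock);
  assert(cycles > 0.0 && cycles < 0.5);
}
```

The instruction count grows noticeably more than the cycle count, which suggests the added instructions are cheap and mostly absorbed by the core's issue width.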
It looks like there's a small (<0.5%) performance decrease, alongside a roughly 1.3% increase in retired instructions. This appears to be caused by a loop optimization that keeps the intermediate shift result in a register and increments it along with the loop control variable, because it incorrectly treats the shifts as more expensive than maintaining another loop variable. For example, this affects `matrix_mul_vect` in CoreMark. In the snippet below, `x10` and `x11` are completely unnecessary.
```
190: d377db8b lsl x11, x28, #9
194: d37ffb89 lsl x9, x28, #1
198: cb1c030c sub x12, x24, x28
19c: aa0303ed mov x13, x3
1a0: aa0203ee mov x14, x2
1a4: 8b02216a add x10, x11, x2, lsl #8
1a8: 8b03216b add x11, x11, x3, lsl #8
1ac: 934cfd4f asr x15, x10, #12 <<<<<<< could be "sbfx x15, x2, #4, #52"
1b0: 8b0e0130 add x16, x9, x14
1b4: 8b090040 add x0, x2, x9
1b8: d378fe10 lsr x16, x16, #56
1bc: 386f6a8f ldrb w15, [x20, x15]
1c0: 6b0f021f cmp w16, w15
1c4: 54000281 b.ne 214 <matrix_mul_vect+0x214> // b.any
1c8: 934cfd6f asr x15, x11, #12 <<<<<<< could be "sbfx x15, x3, #4, #52"
1cc: 8b0d0130 add x16, x9, x13
1d0: d378fe10 lsr x16, x16, #56
1d4: 386f6a91 ldrb w17, [x20, x15]
1d8: 79c0000f ldrsh w15, [x0]
1dc: 8b090060 add x0, x3, x9
1e0: 6b11021f cmp w16, w17
1e4: 540001c1 b.ne 21c <matrix_mul_vect+0x21c> // b.any
1e8: 79c00010 ldrsh w16, [x0]
1ec: 91000842 add x2, x2, #0x2
1f0: 9108014a add x10, x10, #0x200
1f4: 910009ce add x14, x14, #0x2
1f8: f100058c subs x12, x12, #0x1
1fc: 91000863 add x3, x3, #0x2
200: 1b0f2208 madd w8, w16, w15, w8
204: 9108016b add x11, x11, #0x200
208: 910009ad add x13, x13, #0x2
20c: 54fffd01 b.ne 1ac <matrix_mul_vect+0x1ac> // b.any
210: 17ffffd4 b 160 <matrix_mul_vect+0x160>
```
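To make the listing easier to follow, here is a rough C++ model of the address arithmetic involved. It is a sketch, not the code from the patch: the function names are hypothetical, and it assumes an 8-bit tag in the top byte and a shadow scale of 4, as suggested by the `lsr #56` and by the `asr #12` applied to a value already shifted left by 8. It also assumes C++20 semantics for unsigned-to-signed conversion and signed right shift.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration; these are not LLVM's API.

// Two-shift form: shift the tag byte out, then arithmetic shift by 8 + 4.
inline int64_t shadowOffsetShifts(uint64_t ptr) {
  return static_cast<int64_t>(ptr << 8) >> 12;
}

// Bitfield form: take bits [55:4] and sign-extend from bit 55, which is
// what a single AArch64 `sbfx xd, xn, #4, #52` computes.
inline int64_t shadowOffsetBitfield(uint64_t ptr) {
  const uint64_t field = (ptr >> 4) & ((uint64_t{1} << 52) - 1);
  const uint64_t sign = uint64_t{1} << 51;
  return static_cast<int64_t>((field ^ sign) - sign);
}

// Pointer tag: the top byte, matching `lsr x16, x16, #56` in the listing.
inline uint8_t pointerTag(uint64_t ptr) {
  return static_cast<uint8_t>(ptr >> 56);
}

// Mimics the loop in the listing: `shifted` is a second induction variable
// holding (ptr << 8), bumped by 0x200 whenever ptr advances by 2. Returns
// true if it always gives the same offset as recomputing from ptr.
inline bool loopCarriedMatches(uint64_t ptr, int iterations) {
  uint64_t shifted = ptr << 8;
  for (int i = 0; i < iterations; ++i) {
    const int64_t carried = static_cast<int64_t>(shifted) >> 12;
    if (carried != shadowOffsetShifts(ptr)) return false;
    ptr += 2;
    shifted += 0x200;
  }
  return true;
}

// Small self-check on one made-up tagged address.
inline void selfCheck() {
  const uint64_t tagged = 0x2A007FFF12345670ULL;
  assert(shadowOffsetShifts(tagged) == shadowOffsetBitfield(tagged));
  assert(pointerTag(tagged) == 0x2A);
  assert(loopCarriedMatches(tagged, 64));
}
```

Because the carried value always equals what can be recomputed from the pointer itself, the extra induction variable only costs a register and an `add` per iteration, which is consistent with the higher instruction count reported above.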
https://github.com/llvm/llvm-project/pull/103727