[llvm] [RISCV][MC] Implement evaluateBranch for auipc+jalr pairs (PR #65480)

Mon Oct 2 11:03:37 PDT 2023

arichardson wrote:

> > In general this looks good to me, but I do worry a bit about the performance implications of regularly zeroing all the GPRs. Could you show the impact on disassembly of a large binary (e.g. a statically linked clang for RISC-V)?
> 
> Here's a quick benchmark (release build without asserts):
> 
> ```
> $ file clang
> clang: ELF 64-bit LSB pie executable, UCB RISC-V, RVC, double-float ABI, version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-riscv64-lp64d.so.1, BuildID[sha1]=a4645a5d30617084df5efea9662d984d0a9dc918, for GNU/Linux 4.15.0, not stripped
> 
> $ size clang
>      text	   data	    bss	      dec	    hex	filename
> 149308819	4260800	 622672	154192291	930c9a3	clang
> 
> $ hyperfine --parameter-list which main,pr './llvm-objdump.{which} -d clang > /dev/null' --warmup 3
> Benchmark 1: ./llvm-objdump.main -d clang > /dev/null
>   Time (mean ± σ):     56.003 s ±  0.206 s    [User: 32.285 s, System: 23.695 s]
>   Range (min … max):   55.849 s … 56.511 s    10 runs
> 
> Benchmark 2: ./llvm-objdump.pr -d clang > /dev/null
>   Time (mean ± σ):     56.651 s ±  0.071 s    [User: 32.911 s, System: 23.713 s]
>   Range (min … max):   56.550 s … 56.797 s    10 runs
> 
> Summary
>   ./llvm-objdump.main -d clang > /dev/null ran
>     1.01 ± 0.00 times faster than ./llvm-objdump.pr -d clang > /dev/null
> ```
> 
> So there seems to be about 1% overhead (56.65 s vs. 56.00 s mean wall time).
> 
> If this is too much, one solution would be to store a plain array of `uint64_t` plus a separate 32-bit bitmap instead of an array of `std::optional<uint64_t>`. I suppose that would remove most of the overhead of clearing state.

That seems better than expected and might be acceptable. However, I think using a bitset that can be zeroed with a single store should bring the overhead down to nearly zero without adding much complexity.
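A minimal sketch of the layout discussed above, assuming a 32-entry GPR file on RV64: register values live in a plain `uint64_t` array and validity is tracked in a 32-bit mask, so invalidating all state is a single store of `Known = 0` rather than resetting 32 `std::optional<uint64_t>` objects. `GPRState`, `applyAUIPC` and `jalrTarget` are hypothetical names for illustration, not the code in the PR, and treating `x0` as always reading zero is an assumption of this sketch.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical tracker for known GPR values during disassembly.
class GPRState {
  std::array<uint64_t, 32> Values{}; // only meaningful where the Known bit is set
  uint32_t Known = 0;                // bit i set => Values[i] holds a valid value

public:
  // Invalidate every register with one 32-bit store; stale Values are ignored.
  void reset() { Known = 0; }

  void setKnown(unsigned Reg, uint64_t Value) {
    assert(Reg < 32 && "RISC-V has 32 GPRs");
    if (Reg == 0)
      return; // assumption: x0 is hardwired to zero, so writes are dropped
    Values[Reg] = Value;
    Known |= 1u << Reg;
  }

  void setUnknown(unsigned Reg) {
    assert(Reg < 32 && "RISC-V has 32 GPRs");
    Known &= ~(1u << Reg);
  }

  std::optional<uint64_t> getKnown(unsigned Reg) const {
    assert(Reg < 32 && "RISC-V has 32 GPRs");
    if (Reg == 0)
      return uint64_t{0}; // x0 always reads as zero
    if (Known & (1u << Reg))
      return Values[Reg];
    return std::nullopt;
  }
};

// auipc rd, imm20: rd = PC + sign-extended (imm20 << 12).
// Multiplying instead of shifting avoids UB on negative values before C++20.
void applyAUIPC(GPRState &S, unsigned Rd, uint64_t PC, int32_t Imm20) {
  S.setKnown(Rd, PC + static_cast<uint64_t>(static_cast<int64_t>(Imm20) * 4096));
}

// jalr rd, imm12(rs1): target = (rs1 + sign-extended imm12) & ~1,
// resolvable only if rs1 currently holds a known value.
std::optional<uint64_t> jalrTarget(const GPRState &S, unsigned Rs1, int32_t Imm12) {
  if (std::optional<uint64_t> Base = S.getKnown(Rs1))
    return (*Base + static_cast<uint64_t>(static_cast<int64_t>(Imm12))) & ~uint64_t{1};
  return std::nullopt;
}
```

For example, `auipc ra, 0x1` at PC `0x1000` makes `ra` known as `0x2000`, and a following `jalr ra, -16(ra)` then resolves to `0x1FF0`. Whenever the analysis has to forget everything, `reset()` costs one word store no matter how many registers were tracked, which is the property the reply above is after.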

https://github.com/llvm/llvm-project/pull/65480