[llvm] [RISCV][MC] Implement evaluateBranch for auipc+jalr pairs (PR #65480)

Mon Oct 2 10:57:25 PDT 2023

mtvec wrote:

> In general this looks good to me, but I do worry a bit about the performance implications of regularly zeroing all the GPRs. Could you show the impact on disassembly of a large binary (e.g. a statically linked clang for RISC-V)?

Here's a quick benchmark (release build without asserts):

```
$ file clang
clang: ELF 64-bit LSB pie executable, UCB RISC-V, RVC, double-float ABI, version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-riscv64-lp64d.so.1, BuildID[sha1]=a4645a5d30617084df5efea9662d984d0a9dc918, for GNU/Linux 4.15.0, not stripped

$ size clang
     text	   data	    bss	      dec	    hex	filename
149308819	4260800	 622672	154192291	930c9a3	clang

$ hyperfine --parameter-list which main,pr './llvm-objdump.{which} -d clang > /dev/null' --warmup 3
Benchmark 1: ./llvm-objdump.main -d clang > /dev/null
  Time (mean ± σ):     56.003 s ±  0.206 s    [User: 32.285 s, System: 23.695 s]
  Range (min … max):   55.849 s … 56.511 s    10 runs

Benchmark 2: ./llvm-objdump.pr -d clang > /dev/null
  Time (mean ± σ):     56.651 s ±  0.071 s    [User: 32.911 s, System: 23.713 s]
  Range (min … max):   56.550 s … 56.797 s    10 runs

Summary
  ./llvm-objdump.main -d clang > /dev/null ran
    1.01 ± 0.00 times faster than ./llvm-objdump.pr -d clang > /dev/null
```

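So there seems to be about 1% overhead: the mean wall-clock time goes from 56.003 s to 56.651 s (roughly 1.2%), and almost all of the difference is in user time.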

If this is too much, one solution would be to store a plain array of `uint64_t` plus a separate 32-bit validity bitmap instead of an array of `std::optional<uint64_t>`. Clearing the state would then only require zeroing the bitmap, which I suppose would remove most of the overhead.
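
A minimal sketch of what I have in mind. The class and method names here are hypothetical, made up for illustration rather than taken from the PR, and the sketch ignores any special handling of `x0`:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical tracker for known GPR values. Names are illustrative only
// and do not come from the PR.
class GPRState {
  uint64_t Values[32]; // Only meaningful where the matching Valid bit is set.
  uint32_t Valid = 0;  // Bit N set means Values[N] holds a known value.

public:
  // Record a known value for a register.
  void set(unsigned Reg, uint64_t Value) {
    assert(Reg < 32 && "invalid GPR number");
    Values[Reg] = Value;
    Valid |= uint32_t(1) << Reg;
  }

  // Forget the value of a single register.
  void invalidate(unsigned Reg) {
    assert(Reg < 32 && "invalid GPR number");
    Valid &= ~(uint32_t(1) << Reg);
  }

  // Return the tracked value, or std::nullopt if the register is unknown.
  std::optional<uint64_t> get(unsigned Reg) const {
    assert(Reg < 32 && "invalid GPR number");
    if (Valid & (uint32_t(1) << Reg))
      return Values[Reg];
    return std::nullopt;
  }

  // Forget everything: a single 32-bit store instead of resetting
  // 32 std::optional objects.
  void clear() { Valid = 0; }
};
```

With this layout `Values` never needs to be reinitialized, because stale entries are unreachable once their bit in `Valid` is cleared.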

https://github.com/llvm/llvm-project/pull/65480