[llvm] [RISCV][MC] Implement evaluateBranch for auipc+jalr pairs (PR #65480)
Alexander Richardson via llvm-commits
llvm-commits at lists.llvm.org
Mon Oct 2 11:03:37 PDT 2023
arichardson wrote:
> > In general this looks good to me, but I do worry a bit about the performance implications of regularly zeroing all the GPRs. Could you show the impact on disassembly of a large binary (e.g. a statically linked clang for RISC-V)?
>
> Here's a quick benchmark (release build without asserts):
>
> ```
> $ file clang
> clang: ELF 64-bit LSB pie executable, UCB RISC-V, RVC, double-float ABI, version 1 (GNU/Linux), dynamically linked, interpreter /lib/ld-linux-riscv64-lp64d.so.1, BuildID[sha1]=a4645a5d30617084df5efea9662d984d0a9dc918, for GNU/Linux 4.15.0, not stripped
>
> $ size clang
>      text    data     bss       dec     hex filename
> 149308819 4260800  622672 154192291 930c9a3 clang
>
> $ hyperfine --parameter-list which main,pr './llvm-objdump.{which} -d clang > /dev/null' --warmup 3
> Benchmark 1: ./llvm-objdump.main -d clang > /dev/null
> Time (mean ± σ): 56.003 s ± 0.206 s [User: 32.285 s, System: 23.695 s]
> Range (min … max): 55.849 s … 56.511 s 10 runs
>
> Benchmark 2: ./llvm-objdump.pr -d clang > /dev/null
> Time (mean ± σ): 56.651 s ± 0.071 s [User: 32.911 s, System: 23.713 s]
> Range (min … max): 56.550 s … 56.797 s 10 runs
>
> Summary
> ./llvm-objdump.main -d clang > /dev/null ran
> 1.01 ± 0.00 times faster than ./llvm-objdump.pr -d clang > /dev/null
> ```
>
> So there seems to be about 1% overhead.
>
> If this is too much, one solution would be to not store an array of `std::optional<uint64_t>` but one containing just `uint64_t` and a separate 32-bit bitmap. I suppose that would remove most of the overhead of clearing state.

That seems better than expected and might be acceptable. However, I think using a bitset that can be zeroed with a single store should bring it down to near zero and will not add much complexity.
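
For illustration, a minimal sketch of that layout, assuming 32 GPRs tracked as plain `uint64_t` values plus a 32-bit validity mask. The class and method names (`GPRState`, `reset`, `set`, `invalidate`, `get`) are hypothetical and not taken from the PR:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical register-state tracker; a sketch, not the code from the PR.
class GPRState {
  uint64_t Values[32]; // Values[N] is only meaningful when bit N of Valid is set.
  uint32_t Valid = 0;  // One validity bit per GPR.

public:
  // Forget all known register values with a single 32-bit store,
  // instead of resetting 32 std::optional<uint64_t> objects.
  void reset() { Valid = 0; }

  // Record a known value for register Reg.
  void set(unsigned Reg, uint64_t V) {
    assert(Reg < 32 && "GPR index out of range");
    Values[Reg] = V;
    Valid |= uint32_t(1) << Reg;
  }

  // Mark a single register as unknown.
  void invalidate(unsigned Reg) {
    assert(Reg < 32 && "GPR index out of range");
    Valid &= ~(uint32_t(1) << Reg);
  }

  // Return the tracked value, or std::nullopt if the register is unknown.
  std::optional<uint64_t> get(unsigned Reg) const {
    assert(Reg < 32 && "GPR index out of range");
    if (Valid & (uint32_t(1) << Reg))
      return Values[Reg];
    return std::nullopt;
  }
};
```

Callers still see an optional-like interface through `get`, so the change would stay local to the state-tracking class; only the clearing path becomes cheaper.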
https://github.com/llvm/llvm-project/pull/65480