[PATCH] D147267: [NFC][DebugInfo] Make Descriptions constexpr

Thu Apr 6 15:43:41 PDT 2023

scott.linder added a comment.

In D147267#4246564 <https://reviews.llvm.org/D147267#4246564>, @scott.linder wrote:

> In D147267#4241300 <https://reviews.llvm.org/D147267#4241300>, @dblaikie wrote:
>
>> Sure - though could you check if there's any revision history that explains the comment further/motivates why this would be valuable? (I have some apprehension about making extra things constexpr as it can increase compile time excessively (forcing the frontend to check for evaluation - when the backend can more cheaply optimize it away later anyway))
>
> Sure, I did just blindly assume the TODO had merit. Is there a structured way you tend to evaluate these on a case-by-case basis? Should I compare instructions retired while compiling LLVM+Clang and while using the resulting LLVM+Clang to inspect DebugInfo with and without the change?
>
> Edit:
>
> I compared building the `lib/DebugInfo/DWARF/all` target without the patch vs. with the patch, with some minimal isolation and attempts to make the result statistically significant without spending too much time:
>
> - I ran everything out of a linux `tmpfs`
> - I used ninja as the build system
> - I'm compiling with clang version 11.0.0 (https://github.com/llvm/llvm-project 176249bd6732a8044d457092ed932768724a6f06 <https://reviews.llvm.org/rG176249bd6732a8044d457092ed932768724a6f06>)
> - I ran the build six times, `clean`ing between each and throwing out the first run
>
>   <stat>: <control> <constexpr> (<constexpr>/<control>)
>   lib/libLLVMDebugInfoDWARF.a-size: 2021132 2016068 (0.997494)
>   instructions:u: 4271564407084 4271164683246 (0.999906)
>   Maximum resident set size: 694690 694689 (0.999999)
>   branches: 495120306752 495185537556 (1.000132)
>   branch-misses: 30440381987 30515429111 (1.002465)
>   User time: 2321.876000 2331.440000 (1.004119)
>
> If https://www.npopov.com/2020/05/10/Make-LLVM-fast-again.html is to be believed the user instructions retired and max-rss metrics are the best proxies available, but I included some of the "noisy" metrics too. None of them seem too outrageously out-of-line, but I can rerun to see if e.g. the 0.4% increase user-time is a fluke.
>
> I stole the `cmake`, `time`, and `perf` command-lines from the repo for http://llvm-compile-time-tracker.com/ referenced from the same blog post above. I haven't done much in this area so I don't know if I made any mistakes; my script is:
>
>   #!/bin/bash
>   
>   build() {
>     for f in control constexpr; do
>       rm -rf build-$f
>       mkdir -p build-$f
>       rm -rf out-$f
>       mkdir -p out-$f
>       cmake \
>         -GNinja \
>         -S./source-$f/llvm \
>         -B./build-$f \
>         -DLLVM_ENABLE_PROJECTS="clang" \
>         -DLLVM_TARGETS_TO_BUILD="X86" \
>         -DLLVM_BUILD_TOOLS=false \
>         -DLLVM_APPEND_VC_REV=false \
>         -DCMAKE_BUILD_TYPE=Release \
>         -DLLVM_CCACHE_BUILD=false \
>         -DLLVM_USE_LINKER=gold \
>         -DLLVM_BINUTILS_INCDIR=/usr/include \
>         -DCLANG_ENABLE_ARCMT=false \
>         -DCLANG_ENABLE_STATIC_ANALYZER=false
>       /usr/bin/ninja -C build-$f lib/DebugInfo/DWARF/all
>       for i in $(seq 5); do
>         LC_ALL=C ninja -C build-$f clean
>         LC_ALL=C \
>           /usr/bin/time -v -o out-$f/time.$i \
>           perf stat -x \; -o out-$f/perf.$i \
>             -e instructions \
>             -e instructions:u \
>             -e cycles \
>             -e task-clock \
>             -e branches \
>             -e branch-misses \
>             ninja -C build-$f lib/DebugInfo/DWARF/all
>       done
>     done
>   }
>   print_stat_header() {
>     printf "<stat>: <control> <constexpr> (<constexpr>/<control>)\n"
>   }
>   print_stat() {
>     printf "%s: %s %s (%s)\n" "$1" "$2" "$3" "$(awk -- "BEGIN { printf \"%f\\n\", $3 / $2 }")"
>   }
>   size() {
>     stat --printf='%s\n' "$@"
>   }
>   compare_size() {
>     local file="$1"
>     local control="$(size build-control/$file)"
>     local constexpr="$(size build-constexpr/$file)"
>     print_stat "$file-size" "$control" "$constexpr"
>   }
>   compare() {
>     local dataset="$1"; shift
>     local rowname="$1"; shift
>     local fmt="$1"; shift
>     local sep=:
>     local idx=2
>     if [[ $dataset == perf ]]; then
>       sep=\;
>       idx=1
>     fi
>     local awkprogram="/$rowname/ { sum += \$$idx; n++; } END { printf \"$fmt\\n\", sum / n; }"
>     local control=$(awk -F "$sep" -- "$awkprogram" out-control/$dataset.*)
>     local constexpr=$(awk -F "$sep" -- "$awkprogram" out-constexpr/$dataset.*)
>     local relative=$(awk -- "BEGIN { printf \"%f\\n\", $constexpr / $control }")
>     print_stat "$rowname" "$control" "$constexpr"
>   }
>   print() {
>     print_stat_header
>     compare_size lib/libLLVMDebugInfoDWARF.a
>     compare perf 'instructions:u' '%u'
>     compare time 'Maximum resident set size' '%u'
>     compare perf 'branches' '%u'
>     compare perf 'branch-misses' '%u'
>     compare time 'User time' '%f'
>   }
>   main() {
>     build
>     print
>   }
>   "$@"
>
> Run as `bench.sh main`
>
> I will expand this to building the whole of llvm, and then dogfood the result, although I may not run as many trials to avoid it taking too long.

The actual use of the DebugInfo code during a normal compilation is minimal: it is just used to "fixup" some operations whose final argument values depend on layout that occurs pretty late.

With that in mind, I figured a better test would be to build a debug build of LLVM trunk, and then run release builds of llvm-dwarfdump over those debug executables, comparing the performance metrics with and without the relevant patches applied.

I first started with dumping `clang`, but it takes quite a while to chew through all the debug info in a clang executable. I backed down to `llvm-dwarfdump` itself, and `llc`. The results seem to indicate the change to `constexpr` is essentially a wash at runtime:

  Switching to constexpr, running `release/llvm-dwarfdump debug/llvm-dwarfdump`:

  <stat>: <control> <patched> (<patched>/<control>)
  instructions:u: 223641285928 223732233232 (1.000407)
  Maximum resident set size: 698212 698365 (1.000219)
  branches: 22636320251 22435431611 (0.991125)
  branch-misses: 730543176 342250426 (0.468488)
  User time: 34.436667 28.920000 (0.839803)

  Switching to constexpr, running `release/llvm-dwarfdump debug/llc`:

  <stat>: <control> <patched> (<patched>/<control>)
  instructions:u: 1878329132612 1879189898727 (1.000458)
  Maximum resident set size: 5063790 5063321 (0.999907)
  branches: 191863771404 191266164346 (0.996885)
  branch-misses: 3513814256 2647395865 (0.753425)
  User time: 255.713333 242.900000 (0.949892)

With that in mind, and considering D147270 <https://reviews.llvm.org/D147270> would be simplified if we just dropped all the static bounds and used `SmallVector` everywhere, I also tried starting with trunk and only changing the `DWARFExpression::Operation::Op` member to be a `SmallVector<Element, 2>`, and it also appears to be a wash:

  Just changing the static array to a SmallVector<2>, running `release/llvm-dwarfdump debug/llvm-dwarfdump`:

  <stat>: <control> <patched> (<patched>/<control>)
  instructions:u: 223622266793 223954468175 (1.001486)
  Maximum resident set size: 698261 697617 (0.999078)
  branches: 22379620071 22552883328 (1.007742)
  branch-misses: 247854954 525422579 (2.119879)
  User time: 28.326667 32.413333 (1.144269)

  Just changing the static array to a SmallVector<2>, running `release/llvm-dwarfdump debug/llvm-llc`:

  <stat>: <control> <patched> (<patched>/<control>)
  instructions:u: 1878037338324 1880998221933 (1.001577)
  Maximum resident set size: 5063801 5065104 (1.000257)
  branches: 191989624604 191678199871 (0.998378)
  branch-misses: 3458085543 2997545838 (0.866822)
  User time: 252.236667 250.806667 (0.994331)

Assuming I've actually measured that there are no significant differences in instructions retired or max-rss (and that the remaining code changes needed to support a dynamic number of operands don't change things significantly) it seems like the more maintainable approach of using a `SmallVector` for the operands and not forcing things into `constexpr` boxes would be preferable.

Thoughts?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D147267/new/

https://reviews.llvm.org/D147267