[PATCH] D147267: [NFC][DebugInfo] Make Descriptions constexpr
Scott Linder via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Apr 6 15:43:41 PDT 2023
scott.linder added a comment.
In D147267#4246564 <https://reviews.llvm.org/D147267#4246564>, @scott.linder wrote:
> In D147267#4241300 <https://reviews.llvm.org/D147267#4241300>, @dblaikie wrote:
>
>> Sure - though could you check if there's any revision history that explains the comment further/motivates why this would be valuable? (I have some apprehension about making extra things constexpr as it can increase compile time excessively (forcing the frontend to check for evaluation - when the backend can more cheaply optimize it away later anyway))
>
> Sure, I did just blindly assume the TODO had merit. Is there a structured way you tend to evaluate these on a case-by-case basis? Should I compare instructions retired while compiling LLVM+Clang and while using the resulting LLVM+Clang to inspect DebugInfo with and without the change?
>
> Edit:
>
> I compared building the `lib/DebugInfo/DWARF/all` target without the patch vs. with the patch, with some minimal isolation and attempts to make the result statistically significant without spending too much time:
>
> - I ran everything out of a linux `tmpfs`
> - I used ninja as the build system
> - I'm compiling with clang version 11.0.0 (https://github.com/llvm/llvm-project 176249bd6732a8044d457092ed932768724a6f06 <https://reviews.llvm.org/rG176249bd6732a8044d457092ed932768724a6f06>)
> - I ran the build six times, `clean`ing between each and throwing out the first run
>
> <stat>: <control> <constexpr> (<constexpr>/<control>)
> lib/libLLVMDebugInfoDWARF.a-size: 2021132 2016068 (0.997494)
> instructions:u: 4271564407084 4271164683246 (0.999906)
> Maximum resident set size: 694690 694689 (0.999999)
> branches: 495120306752 495185537556 (1.000132)
> branch-misses: 30440381987 30515429111 (1.002465)
> User time: 2321.876000 2331.440000 (1.004119)
>
> If https://www.npopov.com/2020/05/10/Make-LLVM-fast-again.html is to be believed the user instructions retired and max-rss metrics are the best proxies available, but I included some of the "noisy" metrics too. None of them seem too outrageously out-of-line, but I can rerun to see if e.g. the 0.4% increase user-time is a fluke.
>
> I stole the `cmake`, `time`, and `perf` command-lines from the repo for http://llvm-compile-time-tracker.com/ referenced from the same blog post above. I haven't done much in this area so I don't know if I made any mistakes; my script is:
>
> #!/bin/bash
>
> build() {
> for f in control constexpr; do
> rm -rf build-$f
> mkdir -p build-$f
> rm -rf out-$f
> mkdir -p out-$f
> cmake \
> -GNinja \
> -S./source-$f/llvm \
> -B./build-$f \
> -DLLVM_ENABLE_PROJECTS="clang" \
> -DLLVM_TARGETS_TO_BUILD="X86" \
> -DLLVM_BUILD_TOOLS=false \
> -DLLVM_APPEND_VC_REV=false \
> -DCMAKE_BUILD_TYPE=Release \
> -DLLVM_CCACHE_BUILD=false \
> -DLLVM_USE_LINKER=gold \
> -DLLVM_BINUTILS_INCDIR=/usr/include \
> -DCLANG_ENABLE_ARCMT=false \
> -DCLANG_ENABLE_STATIC_ANALYZER=false
> /usr/bin/ninja -C build-$f lib/DebugInfo/DWARF/all
> for i in $(seq 5); do
> LC_ALL=C ninja -C build-$f clean
> LC_ALL=C \
> /usr/bin/time -v -o out-$f/time.$i \
> perf stat -x \; -o out-$f/perf.$i \
> -e instructions \
> -e instructions:u \
> -e cycles \
> -e task-clock \
> -e branches \
> -e branch-misses \
> ninja -C build-$f lib/DebugInfo/DWARF/all
> done
> done
> }
> print_stat_header() {
> printf "<stat>: <control> <constexpr> (<constexpr>/<control>)\n"
> }
> print_stat() {
> printf "%s: %s %s (%s)\n" "$1" "$2" "$3" "$(awk -- "BEGIN { printf \"%f\\n\", $3 / $2 }")"
> }
> size() {
> stat --printf='%s\n' "$@"
> }
> compare_size() {
> local file="$1"
> local control="$(size build-control/$file)"
> local constexpr="$(size build-constexpr/$file)"
> print_stat "$file-size" "$control" "$constexpr"
> }
> compare() {
> local dataset="$1"; shift
> local rowname="$1"; shift
> local fmt="$1"; shift
> local sep=:
> local idx=2
> if [[ $dataset == perf ]]; then
> sep=\;
> idx=1
> fi
> local awkprogram="/$rowname/ { sum += \$$idx; n++; } END { printf \"$fmt\\n\", sum / n; }"
> local control=$(awk -F "$sep" -- "$awkprogram" out-control/$dataset.*)
> local constexpr=$(awk -F "$sep" -- "$awkprogram" out-constexpr/$dataset.*)
> local relative=$(awk -- "BEGIN { printf \"%f\\n\", $constexpr / $control }")
> print_stat "$rowname" "$control" "$constexpr"
> }
> print() {
> print_stat_header
> compare_size lib/libLLVMDebugInfoDWARF.a
> compare perf 'instructions:u' '%u'
> compare time 'Maximum resident set size' '%u'
> compare perf 'branches' '%u'
> compare perf 'branch-misses' '%u'
> compare time 'User time' '%f'
> }
> main() {
> build
> print
> }
> "$@"
>
> Run as `bench.sh main`
>
> I will expand this to building the whole of llvm, and then dogfood the result, although I may not run as many trials to avoid it taking too long.
The actual use of the DebugInfo code during a normal compilation is minimal: it is just used to "fixup" some operations whose final argument values depend on layout that occurs pretty late.
With that in mind, I figured a better test would be to build a debug build of LLVM trunk, and then run release builds of llvm-dwarfdump over those debug executables, comparing the performance metrics with and without the relevant patches applied.
I first started with dumping `clang`, but it takes quite a while to chew through all the debug info in a clang executable. I backed down to `llvm-dwarfdump` itself, and `llc`. The results seem to indicate the change to `constexpr` is essentially a wash at runtime:
Switching to constexpr, running `release/llvm-dwarfdump debug/llvm-dwarfdump`:
<stat>: <control> <patched> (<patched>/<control>)
instructions:u: 223641285928 223732233232 (1.000407)
Maximum resident set size: 698212 698365 (1.000219)
branches: 22636320251 22435431611 (0.991125)
branch-misses: 730543176 342250426 (0.468488)
User time: 34.436667 28.920000 (0.839803)
Switching to constexpr, running `release/llvm-dwarfdump debug/llc`:
<stat>: <control> <patched> (<patched>/<control>)
instructions:u: 1878329132612 1879189898727 (1.000458)
Maximum resident set size: 5063790 5063321 (0.999907)
branches: 191863771404 191266164346 (0.996885)
branch-misses: 3513814256 2647395865 (0.753425)
User time: 255.713333 242.900000 (0.949892)
With that in mind, and considering D147270 <https://reviews.llvm.org/D147270> would be simplified if we just dropped all the static bounds and used `SmallVector` everywhere, I also tried starting with trunk and only changing the `DWARFExpression::Operation::Op` member to be a `SmallVector<Element, 2>`, and it also appears to be a wash:
Just changing the static array to a SmallVector<2>, running `release/llvm-dwarfdump debug/llvm-dwarfdump`:
<stat>: <control> <patched> (<patched>/<control>)
instructions:u: 223622266793 223954468175 (1.001486)
Maximum resident set size: 698261 697617 (0.999078)
branches: 22379620071 22552883328 (1.007742)
branch-misses: 247854954 525422579 (2.119879)
User time: 28.326667 32.413333 (1.144269)
Just changing the static array to a SmallVector<2>, running `release/llvm-dwarfdump debug/llvm-llc`:
<stat>: <control> <patched> (<patched>/<control>)
instructions:u: 1878037338324 1880998221933 (1.001577)
Maximum resident set size: 5063801 5065104 (1.000257)
branches: 191989624604 191678199871 (0.998378)
branch-misses: 3458085543 2997545838 (0.866822)
User time: 252.236667 250.806667 (0.994331)
Assuming I've actually measured that there are no significant differences in instructions retired or max-rss (and that the remaining code changes needed to support a dynamic number of operands don't change things significantly) it seems like the more maintainable approach of using a `SmallVector` for the operands and not forcing things into `constexpr` boxes would be preferable.
Thoughts?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D147267/new/
https://reviews.llvm.org/D147267
More information about the llvm-commits
mailing list