[PATCH] D102522: [llvm-exegesis] Loop unrolling for loop snippet repetitor mode
Roman Lebedev via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Fri May 14 12:08:46 PDT 2021
lebedev.ri created this revision.
lebedev.ri added reviewers: courbet, gchatelet, RKSimon.
lebedev.ri added a project: LLVM.
Herald added subscribers: mstojanovic, pengfei.
lebedev.ri requested review of this revision.
I really needed this, like, factually, yesterday.
Consider the following example:
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=duplicate
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-4a7e50.o
---
mode: inverse_throughput
key:
instructions:
- 'VPXORYrr YMM0 YMM0 YMM0'
config: ''
register_initial_values: []
cpu_name: znver3
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
- { key: inverse_throughput, value: 0.31025, per_snippet_value: 0.31025 }
error: ''
info: ''
assembled_snippet: C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C5FDEFC0C3
...
What does it tell us?
So wait, it can only execute ~3 x86 AVX YMM PXOR zero-idioms per cycle (1/0.31025 is about 3.2)?
That doesn't seem right. That's even fewer than the number of pipes supporting this type of op.
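For reference, the "~3 per cycle" figure is just the reciprocal of the reported value. A minimal sketch, assuming the measurement is in cycles per instruction and that the snippet holds a single instruction (so the per-snippet and per-instruction values coincide); the function name is made up for illustration:

```python
def instructions_per_cycle(inverse_throughput):
    """Convert cycles-per-instruction into instructions-per-cycle."""
    if inverse_throughput <= 0:
        raise ValueError("inverse throughput must be positive")
    return 1.0 / inverse_throughput


# per_snippet_value reported by the duplicate-mode run above.
duplicate_ipc = instructions_per_cycle(0.31025)
print(f"{duplicate_ipc:.2f} instructions/cycle")
```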
Now, second example:
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-2418b5.o
---
mode: inverse_throughput
key:
instructions:
- 'VPXORYrr YMM0 YMM0 YMM0'
config: ''
register_initial_values: []
cpu_name: znver3
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
- { key: inverse_throughput, value: 1.00011, per_snippet_value: 1.00011 }
error: ''
info: ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
Now that's just worse. Due to the looping, the throughput completely collapsed,
and now we can only do a single instruction/cycle!?
That's not great.
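To see what the loop actually contains, the assembled_snippet bytes from the loop-mode run can be split by hand. This is only a sketch: the table below is hand-written, covers just the five encodings that occur in this particular snippet, and is not a general x86 decoder.

```python
# assembled_snippet from the --repetition-mode=loop run above.
LOOP_SNIPPET = "49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3"

# (leading bytes, total instruction length, mnemonic).
# Hand-written for this snippet only; not a general decoder.
KNOWN = [
    (bytes.fromhex("49B8"), 10, "movabs r8, imm64"),
    (bytes.fromhex("C5FDEFC0"), 4, "vpxor ymm0, ymm0, ymm0"),
    (bytes.fromhex("4983C0"), 4, "add r8, imm8"),
    (bytes.fromhex("75"), 2, "jne rel8"),
    (bytes.fromhex("C3"), 1, "ret"),
]


def split_snippet(hex_string):
    """Split a hex string into (offset, mnemonic, hex bytes) tuples."""
    code = bytes.fromhex(hex_string)
    out = []
    pos = 0
    while pos < len(code):
        for prefix, length, name in KNOWN:
            if code.startswith(prefix, pos):
                out.append((pos, name, code[pos:pos + length].hex().upper()))
                pos += length
                break
        else:
            raise ValueError(f"unknown encoding at offset {pos}")
    return out


for offset, name, raw in split_snippet(LOOP_SNIPPET):
    print(f"{offset:2d}: {raw:<20} {name}")
```

In the bytes shown, the loop body is two copies of the instruction followed by a counter update and a backward branch, so half of the instructions in each iteration are loop overhead.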
And final example:
$ ./bin/llvm-exegesis --mode=inverse_throughput --snippets-file=/tmp/snippet.s --num-repetitions=1000000 --repetition-mode=loop --loop-unroll-factor=1000
Check generated assembly with: /usr/bin/objdump -d /tmp/snippet-c402e2.o
---
mode: inverse_throughput
key:
instructions:
- 'VPXORYrr YMM0 YMM0 YMM0'
config: ''
register_initial_values: []
cpu_name: znver3
llvm_triple: x86_64-unknown-linux-gnu
num_repetitions: 1000000
measurements:
- { key: inverse_throughput, value: 0.167087, per_snippet_value: 0.167087 }
error: ''
info: ''
assembled_snippet: 49B80800000000000000C5FDEFC0C5FDEFC04983C0FF75F2C3
...
So if we merge the previous two approaches, duplicate this single-instruction snippet 1000x,
and run a loop with 1000 iterations over that duplicated/unrolled snippet,
the measured throughput goes through the roof, up to ~5.98 instructions/cycle (1/0.167087),
which finally tells us that this idiom is zero-cycle!
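The arithmetic behind the unrolled run can be sketched as follows. This assumes the loop trip count is num-repetitions divided by the unroll factor (which matches the 1000000 / 1000 = 1000 figures above) and that each iteration carries two overhead instructions, one counter update and one branch; both the helper name and the overhead count are assumptions for illustration, not taken from the patch.

```python
def loop_shape(num_repetitions, unroll_factor, overhead_instrs=2):
    """Return (iterations, overhead fraction) for an unrolled loop.

    overhead_instrs=2 is an assumption: one counter update plus one
    conditional branch per loop iteration.
    """
    if unroll_factor <= 0:
        raise ValueError("unroll factor must be positive")
    iterations = num_repetitions // unroll_factor
    body = unroll_factor + overhead_instrs
    return iterations, overhead_instrs / body


iterations, overhead = loop_shape(1_000_000, 1000)
print(f"{iterations} iterations, {overhead:.2%} of each iteration is loop overhead")
```

Under these assumptions the loop overhead becomes a negligible share of the executed instructions, while the code footprint stays far smaller than a full 1000000x duplication.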
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D102522
Files:
llvm/docs/CommandGuide/llvm-exegesis.rst
llvm/tools/llvm-exegesis/lib/BenchmarkResult.h
llvm/tools/llvm-exegesis/lib/BenchmarkRunner.cpp
llvm/tools/llvm-exegesis/lib/BenchmarkRunner.h
llvm/tools/llvm-exegesis/lib/SnippetRepetitor.cpp
llvm/tools/llvm-exegesis/lib/SnippetRepetitor.h
llvm/tools/llvm-exegesis/llvm-exegesis.cpp
llvm/unittests/tools/llvm-exegesis/X86/SnippetRepetitorTest.cpp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D102522.345524.patch
Type: text/x-patch
Size: 12230 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210514/f3595290/attachment.bin>