[PATCH] D69295: Optimize SHA1 implementation
Nick Terrell via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon Oct 21 20:50:24 PDT 2019
terrelln created this revision.
terrelln added reviewers: ruiu, MaskRay.
Herald added subscribers: hiraditya, mgorny.
Herald added a project: LLVM.
terrelln edited the summary of this revision.
- Add `inline` to the helper functions because gcc-9 won't inline all of them without the hint. I've avoided `__attribute__((always_inline))` because gcc and clang will inline without it, and omitting it improves compatibility.
- Replace the byte-by-byte copy in update() with endian::read32be(), since perf reports that half of the time is spent copying into the buffer before this patch.
- Add a hash-benchmark to measure the performance improvement.
When lld uses --build-id=sha1 it spends 30-45% of CPU in SHA1 depending on the binary (not wall-time since it is parallel). This patch speeds up SHA1 by a factor of 2 on clang-8 and 3 on gcc-6. This leads to a >10% improvement in overall linking time.
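To illustrate the copy change described in the second bullet, here is a minimal sketch of loading one 64-byte block as sixteen big-endian 32-bit words in a single pass, rather than appending one byte at a time. The names `loadBE32`, `rotl32`, and `loadBlock` are hypothetical stand-ins for illustration and are not taken from the patch; the real code would use the LLVM endian helper instead of `loadBE32`.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for an endian-aware load: assembles a 32-bit
// big-endian value from four bytes, independent of host byte order.
static inline uint32_t loadBE32(const uint8_t *P) {
  return (uint32_t(P[0]) << 24) | (uint32_t(P[1]) << 16) |
         (uint32_t(P[2]) << 8) | uint32_t(P[3]);
}

// Small helper marked `inline` as a hint to the compiler. Masking the
// shift counts keeps N == 0 well defined.
static inline uint32_t rotl32(uint32_t X, unsigned N) {
  return (X << (N & 31)) | (X >> ((32 - N) & 31));
}

// Fill the 16-word block buffer from 64 input bytes, one word per step
// instead of one byte per step.
static inline void loadBlock(const uint8_t *Data, uint32_t (&W)[16]) {
  for (size_t I = 0; I < 16; ++I)
    W[I] = loadBE32(Data + 4 * I);
}
```

Compilers typically recognize the shift-and-or pattern in `loadBE32` and emit a single load plus byte swap on little-endian targets, which is presumably where much of the gain over a per-byte copy comes from.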
Unit tests
==========
ninja check-llvm
LLD speed
=========
lld-speed-test benchmarks were run on an Intel i9-9900k with Turbo disabled on CPU 0, using lld compiled with clang-9. Stats were recorded with `perf stat -r 5`. All inputs use `--build-id=sha1`.
| Input | Before (seconds) | After (seconds) |
| --------------- | ---------------- | --------------- |
| chrome | 2.14 | 1.82 (-15%) |
| chrome-icf | 2.56 | 2.29 (-10%) |
| clang | 0.65 | 0.53 (-18%) |
| clang-fsds | 0.69 | 0.58 (-16%) |
| clang-gdb-index | 21.71 | 19.3 (-11%) |
| gold | 0.42 | 0.34 (-19%) |
| gold-fsds | 0.431 | 0.355 (-17%) |
| linux-kernel | 0.625 | 0.575 (-8%) |
| llvm-as | 0.045 | 0.039 (-14%) |
| llvm-as-fsds | 0.035 | 0.031 (-11%) |
| mozilla | 11.3 | 9.8 (-13%) |
| mozilla-gc | 11.84 | 10.36 (-12%) |
| mozilla-O0 | 8.2 | 5.84 (-28%) |
| scylla | 5.59 | 4.52 (-19%) |
Microbenchmarks
===============
Compiled with clang-8:
Before:
2019-10-16 11:33:41
Running ./benchmarks/hash-benchmark/hash-benchmark
Run on (24 X 2394.48 MHz CPU s)
CPU Caches:
L1 Data 32K (x24)
L1 Instruction 32K (x24)
L2 Unified 4096K (x24)
L3 Unified 16384K (x24)
-----------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------
BM_SHA1/1024 5146 ns 5145 ns 137203
BM_SHA1/4096 20043 ns 20040 ns 32644
BM_SHA1/32768 154810 ns 154803 ns 4401
BM_SHA1/262144 1281332 ns 1281244 ns 555
BM_SHA1/1048576 5154688 ns 5154100 ns 137
After:
2019-10-16 11:34:20
Running ./benchmarks/hash-benchmark/hash-benchmark
Run on (24 X 2394.48 MHz CPU s)
CPU Caches:
L1 Data 32K (x24)
L1 Instruction 32K (x24)
L2 Unified 4096K (x24)
L3 Unified 16384K (x24)
-----------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------
BM_SHA1/1024 3071 ns 3070 ns 241890
BM_SHA1/4096 10491 ns 10491 ns 64873
BM_SHA1/32768 82802 ns 82791 ns 8533
BM_SHA1/262144 685598 ns 685595 ns 1069
BM_SHA1/1048576 2593819 ns 2593495 ns 265
Compiled with gcc-6:
Before:
2019-10-16 11:36:05
Running ./benchmarks/hash-benchmark/hash-benchmark
Run on (24 X 2394.48 MHz CPU s)
CPU Caches:
L1 Data 32K (x24)
L1 Instruction 32K (x24)
L2 Unified 4096K (x24)
L3 Unified 16384K (x24)
-----------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------
BM_SHA1/1024 8770 ns 8769 ns 80651
BM_SHA1/4096 34161 ns 34159 ns 20583
BM_SHA1/32768 271183 ns 271154 ns 2565
BM_SHA1/262144 2140979 ns 2140434 ns 332
BM_SHA1/1048576 8376018 ns 8374622 ns 83
After:
2019-10-16 11:34:58
Running ./benchmarks/hash-benchmark/hash-benchmark
Run on (24 X 2394.48 MHz CPU s)
CPU Caches:
L1 Data 32K (x24)
L1 Instruction 32K (x24)
L2 Unified 4096K (x24)
L3 Unified 16384K (x24)
-----------------------------------------------------------
Benchmark Time CPU Iterations
-----------------------------------------------------------
BM_SHA1/1024 2892 ns 2892 ns 254677
BM_SHA1/4096 10300 ns 10299 ns 72058
BM_SHA1/32768 82527 ns 82527 ns 8880
BM_SHA1/262144 629433 ns 629358 ns 1080
BM_SHA1/1048576 2669301 ns 2669137 ns 272
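As a quick cross-check of the speedup figures quoted in the summary, the ratio and throughput can be computed directly from the BM_SHA1/1048576 rows above. This is only an arithmetic sketch; the helper names `speedup` and `throughputMBps` are hypothetical, and the constants are copied from the "Time" column of the benchmark output.

```cpp
#include <cstdint>

// Ratio of old to new time; values above 1 mean the new code is faster.
inline double speedup(double BeforeNs, double AfterNs) {
  return BeforeNs / AfterNs;
}

// Bytes per nanosecond times 1000 gives megabytes (10^6 bytes) per second.
inline double throughputMBps(uint64_t Bytes, double TimeNs) {
  return static_cast<double>(Bytes) / TimeNs * 1000.0;
}

// Times for BM_SHA1/1048576, copied from the benchmark output.
constexpr double Clang8BeforeNs = 5154688, Clang8AfterNs = 2593819;
constexpr double Gcc6BeforeNs = 8376018, Gcc6AfterNs = 2669301;
```

With these inputs the ratios come out to roughly 2.0x for clang-8 and 3.1x for gcc-6, in line with the factors stated in the summary.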
Repository:
rG LLVM Github Monorepo
https://reviews.llvm.org/D69295
Files:
llvm/benchmarks/CMakeLists.txt
llvm/benchmarks/hash-benchmark/CMakeLists.txt
llvm/benchmarks/hash-benchmark/hash-benchmark.cpp
llvm/lib/Support/SHA1.cpp
llvm/unittests/Support/raw_sha1_ostream_test.cpp
-------------- next part --------------
A non-text attachment was scrubbed...
Name: D69295.225991.patch
Type: text/x-patch
Size: 7539 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20191022/09cd4a16/attachment.bin>