[PATCH] D46283: [AArch64] Set vectorizer-maximize-bandwidth as default true
Adhemerval Zanella via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Mon May 21 14:33:33 PDT 2018
zatrazz added a comment.
For some reason I did not attach the meant comments in this update. This is an update of the previous patch with an extended analysis. I checked a bootstrap build TargetTransformation::shouldMaximizeVectorBandwidth enabled for both armhf (r332595) and powerpc64le (r332840). On armhf I did not see any regression, however on powerpc64le I found an issue related on how current code handles the MaximizeBandwidth option. The testcase 'Transforms/LoopVectorize/PowerPC/pr30990.ll' explicit sets vectorizer-maximize-bandwidth to 0, however the code checks for:
In lib/Transforms/Vectorize/LoopVectorize.cpp:

  unsigned MaxVF = MaxVectorSize;
  if (TTI.shouldMaximizeVectorBandwidth(OptForSize) ||
      (MaximizeBandwidth && !OptForSize)) {
I think a possible fix would be to check whether MaximizeBandwidth has been explicitly disabled (instead of checking only its default value):
diff --git a/lib/Transforms/Vectorize/LoopVectorize.cpp b/lib/Transforms/Vectorize/LoopVectorize.cpp
index a65dc09..40c6583 100644
--- a/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -4968,7 +4968,8 @@ LoopVectorizationCostModel::computeFeasibleMaxVF(bool OptForSize,
   }

   unsigned MaxVF = MaxVectorSize;
-  if (TTI.shouldMaximizeVectorBandwidth(OptForSize) ||
+  if (TTI.shouldMaximizeVectorBandwidth(OptForSize &&
+                                        !MaximizeBandwidth.getNumOccurrences()) ||
       (MaximizeBandwidth && !OptForSize)) {
     // Collect all viable vectorization factors larger than the default MaxVF
     // (i.e. MaxVectorSize).
That said, I do not think it should block this patch (I can send it as a separate patch if required).
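For context, here is a minimal standalone sketch (not part of the patch; only the option name is copied from LoopVectorize.cpp) of what the proposed check relies on: cl::opt::getNumOccurrences() can tell the default value apart from an explicit -vectorizer-maximize-bandwidth=0 on the command line.

  // Build against LLVMSupport. Illustrates default vs. explicitly-set cl::opt.
  #include "llvm/Support/CommandLine.h"
  #include <cstdio>

  static llvm::cl::opt<bool> MaximizeBandwidth(
      "vectorizer-maximize-bandwidth", llvm::cl::init(false),
      llvm::cl::desc("Maximize bandwidth when selecting vectorization factor"));

  int main(int argc, char **argv) {
    llvm::cl::ParseCommandLineOptions(argc, argv);
    // No flag: value is false and occurrences is 0, so the TTI hook may decide.
    // With -vectorizer-maximize-bandwidth=0: value is still false but
    // occurrences is 1, so the user explicitly disabled it and the TTI hook
    // should not override that.
    std::printf("value=%d explicitly-set=%d\n", (int)MaximizeBandwidth,
                MaximizeBandwidth.getNumOccurrences() > 0);
    return 0;
  }

Running it without arguments prints "value=0 explicitly-set=0", while passing -vectorizer-maximize-bandwidth=0 prints "value=0 explicitly-set=1", which is the distinction the fix above keys on.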
Now regarding performance differences: for speccpu2006 I see mixed results. The machine I am testing on shows some variance, and even after trying to minimize OS jitter as much as possible (by pinning the run to a specific node and disabling OS services), in two runs I see:
- RUN 1 (1 iteration)
Benchmark Difference (%) *
400.perlbench 0.73
401.bzip2 0.01
403.gcc 0.53
429.mcf 0.05
445.gobmk 0.99
456.hmmer -0.25
458.sjeng 1.02
462.libquantum 0.04
464.h264ref 0.28
471.omnetpp 0.30
473.astar -0.11
483.xalancbmk 1.92
433.milc 0.03
444.namd -0.38
447.dealII 0.95
450.soplex 0.99
453.povray 1.24
470.lbm -0.88
482.sphinx3 1.43
- RUN 2 (3 iterations, best result taken)
Benchmark Difference (%) *
400.perlbench 0.66
401.bzip2 -2.84
403.gcc 0.09
429.mcf 0.46
445.gobmk 0.03
456.hmmer -1.34
458.sjeng 0.12
462.libquantum 0.06
464.h264ref 0.45
471.omnetpp -0.74
473.astar 1.05
483.xalancbmk -0.57
433.milc -0.04
444.namd 0.14
447.dealII -0.37
450.soplex 0.97
453.povray -0.90
470.lbm -0.88
482.sphinx3 0.28
On speccpu2017 the results are slightly more stable:
Benchmark Difference (%) *
600.perlbench_s 0.41
602.gcc_s 0.96
605.mcf_s -0.87
620.omnetpp_s 1.74
623.xalancbmk_s 1.80
625.x264_s 0.16
631.deepsjeng_s 0.33
641.leela_s 0.38
657.xz_s -0.14
619.lbm_s -0.45
638.imagick_s 0.09
644.nab_s -0.10
It also shows some performance improvements on geekbench5 (it was run on another machine by John Brawn from ARM):
Benchmark Difference (%) *
AES 0.00
Camera 3.51
Canny 3.24
Clang 0.00
Dijkstra 0.00
FaceDetection 0.20
GaussianBlur 12.41
Grayscale 0.42
HDR 0.19
HTML5DOM -0.14
HTML5Parse 2.88
HistogramEqualization 0.24
JPEG 0.18
LLVM 0.23
LZMA 0.00
LensBlur 0.00
Lua -0.57
MemoryBandwidth -0.14
MemoryCopy 0.08
MemoryLatencyPageRandom 0.04
Nbody 0.15
PDFRendering -4.15
Particle 0.00
Raw 0.00
Raytrace -0.24
RigidBody -0.09
SFFT 0.46
SGEMM 0.00
SGEMMWithTaskQueue -0.28
SQLite 0.63
Sobel 0.00
SpeechRecognition 0.10
The only regression seems to be PDFRendering, which I am investigating.
* Difference between r332336 with and without the patch; positive values represent a higher score, indicating an improvement with the patch.
https://reviews.llvm.org/D46283