[PATCH] D10950: [SLPVectorizer] Try different vectorization factors and set max vector register size based on target

Sanjay Patel spatel at rotateright.com
Tue Jul 7 09:22:53 PDT 2015


I ran the benchmarking subset of test-suite on an AMD Jaguar system, compiling for that same target (Jaguar has AVX, so 256-bit SLP was activated).

1. The 256-bit vector store optimization fired 51 times on 16 different tests, so looking for the wider vector does appear to be a useful thing to do based on this sample of tests.
2. There's no measurable perf difference on any of those tests; this is somewhat expected given that Jaguar has a double-pumped AVX implementation (128-bit data paths).
3. The sum of average CC_Time (3 trials) for the tests was 121.21 seconds for the baseline (only checks for 128-bit SLP) and 121.34 seconds for the new version (looks for 256-bit SLP before 128-bit). That is about 0.1% longer, but the difference is in the noise for the compile times in this data set.
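
For anyone who wants to check the compile-time figure in item 3, here is a minimal sketch of the arithmetic. The two totals are the ones quoted above; the function and variable names are just mine for illustration and don't come from the patch or from test-suite.

```python
def relative_change_percent(baseline: float, candidate: float) -> float:
    """Return the percent change of candidate relative to baseline."""
    return (candidate - baseline) / baseline * 100.0


# Sums of average CC_Time (3 trials) quoted in item 3, in seconds.
baseline_s = 121.21   # baseline: only checks for 128-bit SLP
candidate_s = 121.34  # new version: looks for 256-bit SLP before 128-bit

delta_s = candidate_s - baseline_s
pct = relative_change_percent(baseline_s, candidate_s)

print(f"delta = {delta_s:.2f} s, {pct:.2f}% longer")
```

This only reproduces the percentage; it says nothing about run-to-run variance, which is why I'm calling the difference noise rather than a regression.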


http://reviews.llvm.org/D10950






