[PATCH] D12149: [AArch64] Turn on by default interleaved access vectorization
silviu.baranga@arm.com via llvm-commits
llvm-commits at lists.llvm.org
Mon Aug 24 09:57:02 PDT 2015
sbaranga added a comment.
Here are the SPEC2000 and SPEC2006 results (AArch64, Cortex-A57). There is no significant change overall. This is probably due to a combination of the workload types and the optimized functions not being hot. The optimization seems to favor image-processing-like kernels, which explains why it triggered so often in the mesa benchmark.
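As an illustration of the kind of loop this targets, here is a minimal sketch written in Python purely as notation. The function names and the luma weights are hypothetical, not taken from the patch or the benchmarks. Each iteration reads the R, G and B bytes of one pixel: three loads with stride 3 that form an interleave group of factor 3. Interleaved access vectorization lets the loop vectorizer load a block of pixels with one wide load and split it back into separate R, G and B vectors; on AArch64 this can be lowered to a single LD3. The second function mimics that de-interleave step with slicing.

```python
def to_gray_scalar(rgb):
    """Scalar form: three strided loads per iteration (interleave factor 3)."""
    n = len(rgb) // 3
    out = [0] * n
    for i in range(n):
        r, g, b = rgb[3 * i], rgb[3 * i + 1], rgb[3 * i + 2]
        # Hypothetical integer luma weights that sum to 256.
        out[i] = (77 * r + 150 * g + 29 * b) >> 8
    return out


def to_gray_deinterleaved(rgb):
    """Conceptual vectorized form: split the data once into R, G and B
    streams (roughly what one LD3 does for a block of pixels), then
    compute lane by lane. zip() drops any trailing partial pixel, matching
    the scalar loop's n = len(rgb) // 3."""
    rs, gs, bs = rgb[0::3], rgb[1::3], rgb[2::3]
    return [(77 * r + 150 * g + 29 * b) >> 8 for r, g, b in zip(rs, gs, bs)]
```

Both versions compute the same result; for one pure red, one pure green and one pure blue pixel (`[255, 0, 0, 0, 255, 0, 0, 0, 255]`) each returns `[76, 149, 28]`.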
SPEC2000
Binary size:
| Name | Size change (patched/original - 1) | Binary changed |
| gzip | 0.07% | Y |
| vpr | 0.08% | Y |
| gcc | 0.21% | Y |
| mesa | 0.69% | Y |
| art | -0.04% | Y |
| mcf | 0 | N |
| equake | 0 | N |
| crafty | 0 | Y |
| ammp | 0 | Y |
| parser | 0 | N |
| eon | 0.10% | Y |
| perlbmk | 0 | Y |
| gap | 0 | N |
| vortex | 0 | N |
| bzip2 | 0 | N |
| twolf | 0.01% | Y |
Performance (results shown only for changed binaries):
Negative numbers are improvements, positive numbers are regressions.
| Name | Execution time (patched/original - 1) |
| spec.cpu2000.ref.164_gzip | -0.25% |
| spec.cpu2000.ref.175_vpr | -0.55% |
| spec.cpu2000.ref.176_gcc | -1.25% |
| spec.cpu2000.ref.177_mesa | 0.40% |
| spec.cpu2000.ref.179_art | -1.04% |
| spec.cpu2000.ref.186_crafty | 0.20% |
| spec.cpu2000.ref.188_ammp | 0.63% |
| spec.cpu2000.ref.252_eon | -0.48% |
| spec.cpu2000.ref.253_perlbmk | 0.86% |
| spec.cpu2000.ref.300_twolf | -1.18% |
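The size and execution-time columns are computed as patched/original - 1, so a value below zero means the patched build is smaller or faster. A minimal sketch of that arithmetic follows; the helper names and the sample timings are hypothetical, not measured data.

```python
def relative_change(patched: float, original: float) -> float:
    """Return patched/original - 1 (negative: patched is smaller or faster)."""
    if original == 0:
        raise ValueError("original must be non-zero")
    return patched / original - 1


def describe(patched: float, original: float) -> str:
    """Format the change as a signed percentage with the sign convention used above."""
    delta = relative_change(patched, original)
    if delta < 0:
        kind = "improvement"
    elif delta > 0:
        kind = "regression"
    else:
        kind = "no change"
    return f"{delta:+.2%} ({kind})"
```

For example, a run taking 99.0 s patched against 100.0 s original gives `describe(99.0, 100.0) == "-1.00% (improvement)"`.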
Loops with identified interleaved accesses (IA):
| Name | Vectorized with IA | Vectorizable with IA but not profitable |
| gzip | 3 | 0 |
| vpr | 1 | 3 |
| gcc | 9 | 0 |
| mesa | 39 | 6 |
| art | 1 | 12 |
| crafty | 5 | 0 |
| ammp | 1 | 6 |
| eon | 1 | 0 |
| perlbmk | 3 | 1 |
| twolf | 1 | 0 |
SPEC2006
Binary size and loops with identified interleaved accesses (IA):
| Name | Size change (patched/original - 1) | Vectorized with IA | Vectorizable with IA but not profitable | Binary changed |
| perlbench | 0 | 4 | 2 | Y |
| bzip2 | 0 | 0 | 0 | N |
| gcc | 0 | 5 | 3 | Y |
| mcf | 0 | 0 | 0 | N |
| milc | 0 | 0 | 0 | N |
| namd | 0 | 17 | 0 | Y |
| gobmk | 0 | 3 | 6 | Y |
| dealII | 0.67% | 232 | 65 | Y |
| soplex | 0 | 2 | 19 | Y |
| povray | 0 | 19 | 15 | Y |
| hmmer | -0.01% | 1 | 0 | Y |
| sjeng | 0 | 0 | 0 | N |
| libquantum | 0.20% | 1 | 2 | Y |
| h264ref | 0.07% | 3 | 10 | Y |
| lbm | 0 | 0 | 0 | N |
| omnetpp | 0 | 0 | 0 | N |
| astar | 0 | 0 | 0 | N |
| sphinx3 | 1.84% | 8 | 1 | Y |
| xalancbmk | 0 | 0 | 3 | N |
The large number of optimized loops in dealII comes from an STL function getting optimized (essentially the same function gets optimized multiple times).
Performance (results shown only for changed binaries):
Negative numbers are improvements, positive numbers are regressions.
| Name | Execution time (patched/original - 1) |
| spec.cpu2006.ref.400_perlbench | 1.49% |
| spec.cpu2006.ref.403_gcc | 0.06% |
| spec.cpu2006.ref.444_namd | 0.23% |
| spec.cpu2006.ref.445_gobmk | 0.03% |
| spec.cpu2006.ref.447_dealII | -0.57% |
| spec.cpu2006.ref.450_soplex | 0.22% |
| spec.cpu2006.ref.453_povray | -1.15% |
| spec.cpu2006.ref.456_hmmer | 0.31% |
| spec.cpu2006.ref.462_libquantum | 0.05% |
| spec.cpu2006.ref.464_h264ref | 0.39% |
| spec.cpu2006.ref.482_sphinx3 | 1.74% |
The sphinx3 result appears to be run-to-run variation (it went away with further runs).
I'll post some compile-time results later on (probably using a bootstrap LLVM build).
Thanks,
Silviu
http://reviews.llvm.org/D12149