[PATCH] D12149: [AArch64] Turn on by default interleaved access vectorization

Mon Aug 24 09:57:02 PDT 2015

sbaranga added a comment.

Here are the SPEC2000 and SPEC2006 results (AArch64, Cortex-A57). There seems to be no significant change overall. This is probably due to a combination of the workload types and the optimized functions not being 'hot'. The workloads that benefit most seem to be things like image-processing kernels, which would explain why the optimization triggered so often in the mesa benchmark.
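
For readers unfamiliar with the term, here is a minimal sketch of the kind of loop that interleaved access vectorization targets. It is not taken from the patch or from any of the benchmarks; the function name rgb_to_gray and the packed RGB layout are assumptions chosen purely for illustration. The three loads per iteration form an interleaved group with stride 3, which on AArch64 can in principle map to a de-interleaving structure load such as LD3. Whether the vectorizer actually does so depends on its cost model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical image-processing kernel (illustration only): average the
   R, G and B channels of n packed RGB pixels into one gray value each.
   The loads from rgb[3*i], rgb[3*i+1] and rgb[3*i+2] are an interleaved
   access group with stride 3. */
void rgb_to_gray(const uint8_t *rgb, uint8_t *gray, size_t n) {
    assert(rgb != NULL && gray != NULL);
    for (size_t i = 0; i < n; ++i) {
        unsigned r = rgb[3 * i];
        unsigned g = rgb[3 * i + 1];
        unsigned b = rgb[3 * i + 2];
        gray[i] = (uint8_t)((r + g + b) / 3);
    }
}
```

Without interleaved access support, a loop like this either stays scalar or needs gather-style shuffles, which is why such loops show up in the "Vectorized with IA" counts below.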

SPEC2000

Size:

| Name    | Change(patched/original - 1) | Binary changed |
| gzip    | 0.07%                        | Y              |
| vpr     | 0.08%                        | Y              |
| gcc     | 0.21%                        | Y              |
| mesa    | 0.69%                        | Y              |
| art     | -0.04%                       | Y              |
| mcf     | 0                            | N              |
| equake  | 0                            | N              |
| crafty  | 0                            | Y              |
| ammp    | 0                            | Y              |
| parser  | 0                            | N              |
| eon     | 0.10%                        | Y              |
| perlbmk | 0                            | Y              |
| gap     | 0                            | N              |
| vortex  | 0                            | N              |
| bzip2   | 0                            | N              |
| twolf   | 0.01%                        | Y              |

Performance (only results from changed binaries are included)
Negative numbers are improvements, positive numbers are regressions.

| Name                         | Execution time (patched/original - 1) |
| spec.cpu2000.ref.164_gzip    | -0.25%                                |
| spec.cpu2000.ref.175_vpr     | -0.55%                                |
| spec.cpu2000.ref.176_gcc     | -1.25%                                |
| spec.cpu2000.ref.177_mesa    | 0.40%                                 |
| spec.cpu2000.ref.179_art     | -1.04%                                |
| spec.cpu2000.ref.186_crafty  | 0.20%                                 |
| spec.cpu2000.ref.188_ammp    | 0.63%                                 |
| spec.cpu2000.ref.252_eon     | -0.48%                                |
| spec.cpu2000.ref.253_perlbmk | 0.86%                                 |
| spec.cpu2000.ref.300_twolf   | -1.18%                                |
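
As a reminder of how to read the "patched/original - 1" columns, the metric can be sketched as below. The helper name relative_change_percent is hypothetical, and the numbers in the usage note are made up for illustration rather than taken from the measurements.

```c
#include <assert.h>

/* Relative change in percent: (patched / original - 1) * 100.
   Negative means the patched build is smaller or faster,
   positive means it is larger or slower. */
double relative_change_percent(double patched, double original) {
    assert(original > 0.0);
    return (patched / original - 1.0) * 100.0;
}
```

For example, a hypothetical run taking 99 s patched against 100 s original gives roughly -1%, i.e. an improvement.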

Identified interleaved accesses in loops:

| Name    | Vectorized with IA | Vectorizable with IA and not profitable |
| gzip    | 3                  | 0                                       |
| vpr     | 1                  | 3                                       |
| gcc     | 9                  | 0                                       |
| mesa    | 39                 | 6                                       |
| art     | 1                  | 12                                      |
| crafty  | 5                  | 0                                       |
| ammp    | 1                  | 6                                       |
| eon     | 1                  | 0                                       |
| perlbmk | 3                  | 1                                       |
| twolf   | 1                  | 0                                       |

SPEC2006

Size and identified interleaved accesses in loops:

| Name       | Change (patched/original - 1)  | Vectorized with IA | Vectorizable with IA but not profitable | Binary changed |
| perlbench  | 0                              | 4                  | 2                                       | Y              |
| bzip2      | 0                              | 0                  | 0                                       | N              |
| gcc        | 0                              | 5                  | 3                                       | Y              |
| mcf        | 0                              | 0                  | 0                                       | N              |
| milc       | 0                              | 0                  | 0                                       | N              |
| namd       | 0                              | 17                 | 0                                       | Y              |
| gobmk      | 0                              | 3                  | 6                                       | Y              |
| dealII     | 0.67%                          | 232                | 65                                      | Y              |
| soplex     | 0                              | 2                  | 19                                      | Y              |
| povray     | 0                              | 19                 | 15                                      | Y              |
| hmmer      | -0.01%                         | 1                  | 0                                       | Y              |
| sjeng      | 0                              | 0                  | 0                                       | N              |
| libquantum | 0.20%                          | 1                  | 2                                       | Y              |
| h264ref    | 0.07%                          | 3                  | 10                                      | Y              |
| lbm        | 0                              | 0                  | 0                                       | N              |
| omnetpp    | 0                              | 0                  | 0                                       | N              |
| astar      | 0                              | 0                  | 0                                       | N              |
| sphinx3    | 1.84%                          | 8                  | 1                                       | Y              |
| xalancbmk  | 0                              | 0                  | 3                                       | N              |

The large number of optimized loops in dealII comes from an STL function being optimized (essentially the same function gets optimized multiple times).

Performance (only results from changed binaries are included)

Negative numbers are improvements, positive numbers are regressions.

| Name                            | Patched/Original - 1 |
| spec.cpu2006.ref.400_perlbench  | 1.49%                |
| spec.cpu2006.ref.403_gcc        | 0.06%                |
| spec.cpu2006.ref.444_namd       | 0.23%                |
| spec.cpu2006.ref.445_gobmk      | 0.03%                |
| spec.cpu2006.ref.447_dealII     | -0.57%               |
| spec.cpu2006.ref.450_soplex     | 0.22%                |
| spec.cpu2006.ref.453_povray     | -1.15%               |
| spec.cpu2006.ref.456_hmmer      | 0.31%                |
| spec.cpu2006.ref.462_libquantum | 0.05%                |
| spec.cpu2006.ref.464_h264ref    | 0.39%                |
| spec.cpu2006.ref.482_sphinx3    | 1.74%                |

The sphinx3 result seems to be run-to-run variation (it went away with further runs).

I'll post some compile-time results later on (probably using a bootstrap LLVM build).

Thanks,
Silviu

http://reviews.llvm.org/D12149