[llvm-commits] [PATCH] BasicBlock Autovectorization Pass

Hal Finkel hfinkel at anl.gov
Tue Nov 1 11:32:36 PDT 2011


Any objections to my committing this, along with some relevant docs
changes? I think that it is ready at this point.

Thanks in advance,
Hal

On Mon, 2011-10-31 at 19:50 -0500, Hal Finkel wrote:
> I've attached the latest version of my autovectorization patch. This
> version is significantly faster (in compile time) than the version I
> posted a couple of days ago, and generally produces better output.
> 
> At this point, next steps in enhancing the vectorization include:
> 1. Add an add/sub and/or alternating-negation vector intrinsic so that
> add-subtract and, more generally, asymmetric fma instructions can be
> generated.
> 2. Make -vectorize imply -unroll-allow-partial [Is there an easy way to
> do this?]
> 3. Add a -fvectorize flag to clang along the same lines.
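To make item 1 concrete, here is a scalar sketch of what an alternating-negation and an add-subtract operation would compute. The helper names (alt_negate, add_sub) are hypothetical and are not part of the patch or of LLVM. The lane pattern (subtract in even lanes, add in odd lanes) is assumed to follow SSE3 ADDSUBPS.

```c
#include <stddef.h>

/* Hypothetical helper: negate every even-indexed lane of v in place
 * (lanes 0, 2, 4, ...); odd lanes are left unchanged. */
void alt_negate(float *v, size_t n) {
    for (size_t i = 0; i < n; i += 2)
        v[i] = -v[i];
}

/* Hypothetical helper: out[i] = a[i] - b[i] for even i and
 * a[i] + b[i] for odd i (the ADDSUBPS lane pattern). */
void add_sub(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; ++i)
        out[i] = (i % 2 == 0) ? a[i] - b[i] : a[i] + b[i];
}
```

Under these assumptions, add_sub(a, b) gives the same result as a plain vector add of a and alt_negate(b), which is why either intrinsic could serve as the building block.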
> 
> Updated vectorization benchmark:
> Loop       llvm-v   llvm     gcc-v    gcc
> -------------------------------------------
> S000        9.00     9.59     4.55    10.04
> S111        7.25     7.65     7.68     7.83
> S1111      13.63    14.72    16.14    16.30
> S112       16.60    17.45    16.54    17.52
> S1112      12.99    13.87    14.83    14.84
> S113       22.03    22.98    22.05    22.05
> S1113      11.01    11.48    11.03    11.01
> S114       13.14    13.81    13.53    13.48
> S115       32.92    33.36    49.98    49.99
> S1115      13.61    14.23    13.65    13.66
> S116       46.90    49.43    49.54    48.11
> S118       10.76    11.25    10.79    10.50
> S119        8.68     9.09    11.83    11.82
> S1119       8.75     9.15     4.31    11.87
> S121       17.17    18.06    14.84    17.31
> S122        7.53     7.70     6.11     6.11
> S123        6.92     7.10     7.42     7.41
> S124        9.60     9.84     9.42     9.33
> S125        6.89     7.10     4.67     7.81
> S126        2.33     2.55     2.57     2.37
> S127       12.18    12.68     7.06    14.50
> S128       11.66    12.41    12.42    11.52
> S131       28.59    30.11    25.17    28.94
> S132       17.04    17.04    15.53    21.03
> S141       12.18    12.85    12.38    12.05
> S151       28.61    30.11    24.89    28.95
> S152       15.47    16.03    11.19    15.63
> S161        6.00     6.12     5.52     5.46
> S1161      14.40    14.50     8.80     8.79
> S162        8.18     8.41     5.36     8.18
> S171       14.05     7.96     2.81     5.70
> S172        5.67     5.97     2.75     5.70
> S173       30.17    31.69    18.15    30.13
> S174       30.12    31.53    18.51    30.16
> S175        5.75     6.04     4.94     5.77
> S176        5.57     5.83     4.41     7.65
> S211       16.23    16.89    16.82    16.38
> S212       13.19    13.50    13.34    13.18
> S1213      12.83    13.35    12.80    12.43
> S221       10.86    11.09     8.65     8.63
> S1221       5.71     6.03     5.40     6.05
> S222        6.00     6.29     5.70     5.72
> S231       22.23    24.22    22.36    22.11
> S232        6.89     6.94     6.89     6.89
> S1232      15.23    16.43    15.05    15.10
> S233       55.17    59.98    54.21    49.56
> S2233      27.07    29.71    29.68    28.40
> S235       43.79    47.85    46.94    43.93
> S241       31.00    31.72    32.53    31.01
> S242        7.20     7.21     7.20     7.20
> S243       16.48    16.99    17.69    16.84
> S244       14.47    14.93    16.91    16.82
> S1244      14.75    15.02    14.77    14.40
> S2244       9.97    10.60    10.40    10.06
> S251       34.20    35.55    19.70    34.38
> S1251      55.09    57.11    41.77    56.11
> S2251      15.64    16.26    17.02    15.70
> S3251      15.55    16.52    19.60    15.34
> S252        6.14     6.46     7.72     7.26
> S253       11.18    11.52    14.40    14.40
> S254       17.72    18.98    28.23    28.06
> S255        5.93     6.14     9.96     9.95
> S256        3.06     3.39     3.10     3.09
> S257        2.12     2.31     2.21     2.20
> S258        1.79     1.87     1.84     1.84
> S261       12.01    12.22    10.98    10.95
> S271       32.76    33.76    33.25    33.01
> S272       14.93    15.52    15.39    15.26
> S273       13.92    14.10    16.86    16.80
> S274       17.77    18.53    18.15    17.89
> S275        2.90     3.14     3.36     2.98
> S2275      32.65    34.95     8.97    33.60
> S276       41.38    41.97    40.80    40.55
> S277        4.81     4.93     4.81     4.80
> S278       14.41    14.76    14.70    14.66
> S279        8.04     8.24     7.25     7.27
> S1279       9.71     9.92     9.34     9.25
> S2710       7.68     8.07     7.86     7.56
> S2711      35.53    37.10    36.56    36.00
> S2712      32.91    33.96    34.24    33.47
> S281       10.75    11.32    12.46    12.02
> S1281     104.13    78.11    57.78    68.06
> S291       11.75    12.27    14.03    14.03
> S292        6.70     6.91     9.94     9.96
> S293       15.38    16.24    19.32    19.33
> S2101       2.50     2.67     2.59     2.60
> S2102      16.56    18.45    16.68    16.75
> S2111       5.59     5.63     5.85     5.85
> S311       72.04    72.27    72.23    72.03
> S31111      6.37     6.01     6.00     6.00
> S312       96.04    96.17    96.05    96.03
> S313       36.03    36.61    36.03    36.02
> S314       36.02    36.12    74.67    72.42
> S315        9.11     9.21     9.35     9.30
> S316       36.02    36.12    72.08    74.87
> S317      444.96   444.94   451.82   451.78
> S318        9.07     9.12     7.30     7.30
> S319       34.49    36.46    34.42    34.19
> S3110       8.53     8.61     4.11     4.11
> S13110      5.75     5.78    12.12    12.12
> S3111       3.60     3.64     3.60     3.60
> S3112       7.21     7.30     7.21     7.20
> S3113      33.68    34.18    60.21    60.20
> S321       16.80    16.87    16.80    16.80
> S322       12.42    12.64    12.60    12.60
> S323       10.89    11.24     8.48     8.51
> S331        4.23     4.36     7.20     7.20
> S332        7.21     7.28     5.21     5.31
> S341        4.76     5.04     7.23     7.20
> S342        6.02     6.24     7.25     7.20
> S343        2.02     2.16     2.16     2.01
> S351       46.33    48.65    21.82    46.46
> S1351      49.07    51.28    33.68    49.06
> S352       57.65    58.44    57.68    57.64
> S353        8.19     8.44     8.34     8.19
> S421       24.17    25.29    20.62    22.46
> S1421      25.09    26.16    15.85    24.76
> S422       79.95    81.51    79.22    78.99
> S423      154.93   155.21   154.56   154.38
> S424       22.61    23.35    11.42    22.36
> S431       56.88    59.82    27.59    57.16
> S441       14.05    14.23    12.88    12.81
> S442        5.99     6.13     6.96     6.90
> S443       17.33    17.77    17.15    16.95
> S451       48.94    48.99    49.03    49.14
> S452       43.01    39.57    14.64    96.03
> S453       28.07    28.07    14.60    14.40
> S471        8.20     8.56     8.39     8.43
> S481       10.89    11.23    12.04    12.00
> S482        9.20     9.42     9.19     9.17
> S491       11.25    11.60    11.37    11.28
> S4112       8.20     8.45     9.13     8.94
> S4113       8.64     8.95     8.86     8.85
> S4114      11.82    12.35    12.18    11.77
> S4115       8.27     8.51     8.95     8.59
> S4116       3.22     3.22     6.02     5.94
> S4117      13.96     9.69    10.16     9.98
> S4121       8.19     8.44     4.04     8.17
> va         28.39    29.33    23.58    48.46
> vag        12.26    12.93    13.58    13.20
> vas        13.36    14.15    13.03    12.47
> vif         4.50     4.79     5.06     4.92
> vpv        56.84    59.83    28.28    57.24
> vtv        57.58    60.42    28.40    57.63
> vpvtv      32.78    33.77    16.35    32.73
> vpvts       5.78     6.07     2.99     6.38
> vpvpv      32.78    33.84    16.54    32.85
> vtvtv      32.76    33.75    16.84    35.97
> vsumr      72.04    72.28    72.20    72.04
> vdotr      72.05    73.22    72.42    72.04
> vbor      227.55   381.18    99.80   372.05
> 
>  -Hal
> 
> On Sat, 2011-10-29 at 17:56 -0500, Hal Finkel wrote:
> > On Sat, 2011-10-29 at 15:16 -0500, Hal Finkel wrote:
> > > On Sat, 2011-10-29 at 14:02 -0500, Hal Finkel wrote:
> > > > On Sat, 2011-10-29 at 12:30 -0500, Hal Finkel wrote:
> > > > > Ralf, et al.,
> > > > > 
> > > > > Attached is the latest version of my autovectorization patch. llvmdev
> > > > > has been CC'd (as had been suggested to me); this e-mail contains
> > > > > additional benchmark results.
> > > > > 
> > > > > First, these results are preliminary because I did not take the steps
> > > > > needed to make them rigorous (explicitly quieting the machine, binding
> > > > > the processes to one cpu, etc.). But they should be good enough for
> > > > > discussion.
> > > > > 
> > > > > I'm using LLVM head r143101, with the attached patch applied, and clang
> > > > > head r143100 on an x86_64 machine (some kind of Intel Xeon). For the gcc
> > > > > comparison, I'm using gcc build Ubuntu 4.4.3-4ubuntu5. gcc was run with
> > > > > -O3 and no other optimization flags. opt was run with -vectorize
> > > > > -unroll-allow-partial -O3 and no other optimization flags (the patch
> > > > > adds the -vectorize option).
> > > > 
> > > > And opt had also been given the flag: -bb-vectorize-vector-bits=256
> > > 
> > > And this was a mistake (because the machine on which the benchmarks were
> > > run does not have AVX). I've rerun, see better results below...
> > > 
> > > > 
> > > >  -Hal
> > > > 
> > > > > llc was just given -O3.
> > > > > 
> > > > > Below I've included results using the benchmark program by Maleki, et
> > > > > al. See:
> > > > > An Evaluation of Vectorizing Compilers - PACT'11
> > > > > (http://polaris.cs.uiuc.edu/~garzaran/doc/pact11.pdf). The source of
> > > > > their benchmark program was retrieved from:
> > > > > http://polaris.cs.uiuc.edu/~maleki1/TSVC.tar.gz
> > > > > 
> > > > > Also, when using clang, I had to pass -Dinline= on the command line:
> > > > > when using -emit-llvm, clang appears not to emit code for functions
> > > > > declared inline. This is a bug, but I've not yet tracked it down. There
> > > > > are two such small functions in the benchmark program, and the regular
> > > > > inliner *should* catch them anyway.
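As a small illustration of the -Dinline= workaround: defining the macro makes the keyword expand to nothing, so a function declared inline becomes an ordinary external definition. The function add_one below is a hypothetical stand-in, not one of the benchmark's functions. One possible explanation for the missing code, which is an assumption and has not been checked against this benchmark, is C99 inline semantics, under which a plain inline definition does not by itself provide an external definition.

```c
/* Same effect as passing -Dinline= on the command line: the keyword
 * expands to nothing. Define it only after any standard headers have
 * been included. */
#define inline

/* Hypothetical stand-in for one of the small helper functions.
 * After preprocessing this is simply "int add_one(int x)", an
 * ordinary function with external linkage. */
inline int add_one(int x) {
    return x + 1;
}
```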
> > > > > 
> > > > > Results:
> > > > > 0. Name of the loop
> > > > > 1. Time using LLVM with vectorization
> > > > > 2. Time using LLVM without vectorization
> > > > > 3. Time using gcc with vectorization
> > > > > 4. Time using gcc without vectorization
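For reading the tables, a minimal sketch of how one row could be parsed and turned into speedup ratios. The struct and function names are illustrative only, and the sketch assumes that any leading "> " quote markers have already been stripped from the line.

```c
#include <stdio.h>

/* One row of the results table; field names are illustrative. */
struct row {
    char   loop[16];  /* column 0: name of the loop */
    double llvm_v;    /* column 1: LLVM with vectorization */
    double llvm;      /* column 2: LLVM without vectorization */
    double gcc_v;     /* column 3: gcc with vectorization */
    double gcc;       /* column 4: gcc without vectorization */
};

/* Parse "name t1 t2 t3 t4"; returns 1 on success, 0 otherwise. */
int parse_row(const char *line, struct row *r) {
    return sscanf(line, "%15s %lf %lf %lf %lf",
                  r->loop, &r->llvm_v, &r->llvm, &r->gcc_v, &r->gcc) == 5;
}

/* Times are "lower is better", so a ratio above 1 means the
 * vectorized build was faster than the non-vectorized one. */
double llvm_speedup(const struct row *r) { return r->llvm / r->llvm_v; }
double gcc_speedup(const struct row *r)  { return r->gcc / r->gcc_v; }
```

For a row such as "S000 9.59 9.49 4.55 10.04", gcc_speedup is 10.04 / 4.55, or about 2.2, while llvm_speedup is slightly below 1.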
> > 
> > As Peter Collingbourne indirectly pointed out to me, clang's
> > optimizations are still important (even if it is emitting only llvm).
> > I've rerun the llvm code generation steps, adding -O3 to clang. Here are
> > the results (they are significantly better):
> > 
> > Loop       llvm-v   llvm     gcc-v    gcc
> > -------------------------------------------
> > S000        9.10     9.59     4.55    10.04
> > S111        7.29     7.65     7.68     7.83
> > S1111      13.87    14.72    16.14    16.30
> > S112       16.67    17.45    16.54    17.52
> > S1112      13.16    13.87    14.83    14.84
> > S113       22.14    22.98    22.05    22.05
> > S1113      11.06    11.48    11.03    11.01
> > S114       13.21    13.81    13.53    13.48
> > S115       32.82    33.36    49.98    49.99
> > S1115      13.67    14.23    13.65    13.66
> > S116       47.37    49.43    49.54    48.11
> > S118       10.81    11.25    10.79    10.50
> > S119        8.73     9.09    11.83    11.82
> > S1119       8.82     9.15     4.31    11.87
> > S121       17.29    18.06    14.84    17.31
> > S122        7.53     7.70     6.11     6.11
> > S123        6.93     7.10     7.42     7.41
> > S124        9.63     9.84     9.42     9.33
> > S125        6.94     7.10     4.67     7.81
> > S126        2.34     2.55     2.57     2.37
> > S127       12.23    12.68     7.06    14.50
> > S128       11.78    12.41    12.42    11.52
> > S131       28.79    30.11    25.17    28.94
> > S132       17.04    17.04    15.53    21.03
> > S141       12.26    12.85    12.38    12.05
> > S151       28.79    30.11    24.89    28.95
> > S152       15.53    16.03    11.19    15.63
> > S161        6.00     6.12     5.52     5.46
> > S1161      14.40    14.50     8.80     8.79
> > S162        8.19     8.41     5.36     8.18
> > S171       15.41     7.96     2.81     5.70
> > S172        5.70     5.97     2.75     5.70
> > S173       30.32    31.69    18.15    30.13
> > S174       30.20    31.53    18.51    30.16
> > S175        5.79     6.04     4.94     5.77
> > S176        5.59     5.83     4.41     7.65
> > S211       16.31    16.89    16.82    16.38
> > S212       13.23    13.50    13.34    13.18
> > S1213      12.82    13.35    12.80    12.43
> > S221       10.87    11.09     8.65     8.63
> > S1221       5.72     6.03     5.40     6.05
> > S222        6.01     6.29     5.70     5.72
> > S231       22.38    24.22    22.36    22.11
> > S232        6.89     6.94     6.89     6.89
> > S1232      15.31    16.43    15.05    15.10
> > S233       55.47    59.98    54.21    49.56
> > S2233      27.23    29.71    29.68    28.40
> > S235       44.08    47.85    46.94    43.93
> > S241       31.14    31.72    32.53    31.01
> > S242        7.20     7.21     7.20     7.20
> > S243       16.54    16.99    17.69    16.84
> > S244       14.51    14.93    16.91    16.82
> > S1244      14.72    15.02    14.77    14.40
> > S2244      10.09    10.60    10.40    10.06
> > S251       34.42    35.55    19.70    34.38
> > S1251      55.39    57.11    41.77    56.11
> > S2251      15.69    16.26    17.02    15.70
> > S3251      15.69    16.52    19.60    15.34
> > S252        6.18     6.46     7.72     7.26
> > S253       11.19    11.52    14.40    14.40
> > S254       18.00    18.98    28.23    28.06
> > S255        5.94     6.14     9.96     9.95
> > S256        3.09     3.39     3.10     3.09
> > S257        2.13     2.31     2.21     2.20
> > S258        1.80     1.87     1.84     1.84
> > S261       12.00    12.22    10.98    10.95
> > S271       32.81    33.76    33.25    33.01
> > S272       15.04    15.52    15.39    15.26
> > S273       13.93    14.10    16.86    16.80
> > S274       17.83    18.53    18.15    17.89
> > S275        2.92     3.14     3.36     2.98
> > S2275      32.81    34.95     8.97    33.60
> > S276       41.26    41.97    40.80    40.55
> > S277        4.80     4.93     4.81     4.80
> > S278       14.43    14.76    14.70    14.66
> > S279        8.05     8.24     7.25     7.27
> > S1279       9.72     9.92     9.34     9.25
> > S2710       7.73     8.07     7.86     7.56
> > S2711      36.49    37.10    36.56    36.00
> > S2712      32.96    33.96    34.24    33.47
> > S281       10.80    11.32    12.46    12.02
> > S1281      79.10    78.11    57.78    68.06
> > S291       11.79    12.27    14.03    14.03
> > S292        6.70     6.91     9.94     9.96
> > S293       15.50    16.24    19.32    19.33
> > S2101       2.56     2.67     2.59     2.60
> > S2102      16.74    18.45    16.68    16.75
> > S2111       5.59     5.63     5.85     5.85
> > S311       72.04    72.27    72.23    72.03
> > S31111      7.50     6.01     6.00     6.00
> > S312       96.04    96.17    96.05    96.03
> > S313       36.02    36.61    36.03    36.02
> > S314       36.01    36.12    74.67    72.42
> > S315        9.11     9.21     9.35     9.30
> > S316       36.01    36.12    72.08    74.87
> > S317      444.91   444.94   451.82   451.78
> > S318        9.07     9.12     7.30     7.30
> > S319       34.57    36.46    34.42    34.19
> > S3110       8.52     8.61     4.11     4.11
> > S13110      5.75     5.78    12.12    12.12
> > S3111       3.60     3.64     3.60     3.60
> > S3112       7.20     7.30     7.21     7.20
> > S3113      33.68    34.18    60.21    60.20
> > S321       16.80    16.87    16.80    16.80
> > S322       12.42    12.64    12.60    12.60
> > S323       10.88    11.24     8.48     8.51
> > S331        4.23     4.36     7.20     7.20
> > S332        7.20     7.28     5.21     5.31
> > S341        4.80     5.04     7.23     7.20
> > S342        6.01     6.24     7.25     7.20
> > S343        2.04     2.16     2.16     2.01
> > S351       46.63    48.65    21.82    46.46
> > S1351      49.37    51.28    33.68    49.06
> > S352       57.64    58.44    57.68    57.64
> > S353        8.21     8.44     8.34     8.19
> > S421       24.26    25.29    20.62    22.46
> > S1421      25.18    26.16    15.85    24.76
> > S422       80.08    81.51    79.22    78.99
> > S423      155.02   155.21   154.56   154.38
> > S424       22.62    23.35    11.42    22.36
> > S431       57.22    59.82    27.59    57.16
> > S441       13.27    14.23    12.88    12.81
> > S442        5.99     6.13     6.96     6.90
> > S443       17.37    17.77    17.15    16.95
> > S451       48.92    48.99    49.03    49.14
> > S452       42.97    39.57    14.64    96.03
> > S453       28.06    28.07    14.60    14.40
> > S471        8.27     8.56     8.39     8.43
> > S481       10.93    11.23    12.04    12.00
> > S482        9.21     9.42     9.19     9.17
> > S491       11.31    11.60    11.37    11.28
> > S4112       8.21     8.45     9.13     8.94
> > S4113       8.65     8.95     8.86     8.85
> > S4114      11.87    12.35    12.18    11.77
> > S4115       8.28     8.51     8.95     8.59
> > S4116       3.23     3.22     6.02     5.94
> > S4117      13.97     9.69    10.16     9.98
> > S4121       8.20     8.44     4.04     8.17
> > va         28.50    29.33    23.58    48.46
> > vag        12.37    12.93    13.58    13.20
> > vas        13.46    14.15    13.03    12.47
> > vif         4.55     4.79     5.06     4.92
> > vpv        57.21    59.83    28.28    57.24
> > vtv        57.92    60.42    28.40    57.63
> > vpvtv      32.84    33.77    16.35    32.73
> > vpvts       5.82     6.07     2.99     6.38
> > vpvpv      32.87    33.84    16.54    32.85
> > vtvtv      32.82    33.75    16.84    35.97
> > vsumr      72.03    72.28    72.20    72.04
> > vdotr      72.05    73.22    72.42    72.04
> > vbor      205.24   381.18    99.80   372.05
> > 
> > I apologize for the multiple e-mails with a long list of numbers, but I
> > think that this was worth it (and I did not want to be unfair to the
> > clang developers).
> > 
> >  -Hal
> > 
> > > 
> > > Here are improved results where the correct (and default)
> > > vector-register size was used.
> > > 
> > > Loop       llvm-v   llvm     gcc-v    gcc
> > > -------------------------------------------
> > > S000        9.09     9.49     4.55    10.04
> > > S111        7.28     7.37     7.68     7.83
> > > S1111      13.78    14.48    16.14    16.30
> > > S112       16.67    17.41    16.54    17.52
> > > S1112      13.12    14.21    14.83    14.84
> > > S113       22.12    22.88    22.05    22.05
> > > S1113      11.06    11.42    11.03    11.01
> > > S114       13.23    13.75    13.53    13.48
> > > S115       32.76    33.24    49.98    49.99
> > > S1115      13.68    14.18    13.65    13.66
> > > S116       47.42    49.40    49.54    48.11
> > > S118       10.84    11.26    10.79    10.50
> > > S119        8.74     9.07    11.83    11.82
> > > S1119       8.81     9.14     4.31    11.87
> > > S121       17.28    18.78    14.84    17.31
> > > S122        7.53     7.54     6.11     6.11
> > > S123        6.90     7.38     7.42     7.41
> > > S124        9.60     9.77     9.42     9.33
> > > S125        6.92     7.22     4.67     7.81
> > > S126        2.34     2.53     2.57     2.37
> > > S127       12.19    12.97     7.06    14.50
> > > S128       11.74    12.43    12.42    11.52
> > > S131       28.75    29.91    25.17    28.94
> > > S132       17.04    17.04    15.53    21.03
> > > S141       12.28    12.26    12.38    12.05
> > > S151       28.80    29.43    24.89    28.95
> > > S152       15.54    16.03    11.19    15.63
> > > S161        6.00     6.06     5.52     5.46
> > > S1161      14.39    14.40     8.80     8.79
> > > S162        8.19     9.05     5.36     8.18
> > > S171       15.41     7.94     2.81     5.70
> > > S172        5.71     5.89     2.75     5.70
> > > S173       30.31    30.92    18.15    30.13
> > > S174       30.18    31.66    18.51    30.16
> > > S175        5.78     6.18     4.94     5.77
> > > S176        5.59     5.83     4.41     7.65
> > > S211       16.27    17.14    16.82    16.38
> > > S212       13.21    14.28    13.34    13.18
> > > S1213      12.81    13.46    12.80    12.43
> > > S221       10.86    11.09     8.65     8.63
> > > S1221       5.72     6.04     5.40     6.05
> > > S222        6.02     6.26     5.70     5.72
> > > S231       22.33    22.94    22.36    22.11
> > > S232        6.88     6.88     6.89     6.89
> > > S1232      15.30    15.34    15.05    15.10
> > > S233       55.38    58.55    54.21    49.56
> > > S2233      27.08    29.77    29.68    28.40
> > > S235       44.00    44.92    46.94    43.93
> > > S241       31.09    31.35    32.53    31.01
> > > S242        7.19     7.20     7.20     7.20
> > > S243       16.52    17.09    17.69    16.84
> > > S244       14.45    14.83    16.91    16.82
> > > S1244      14.71    14.83    14.77    14.40
> > > S2244      10.04    10.62    10.40    10.06
> > > S251       34.15    35.75    19.70    34.38
> > > S1251      55.23    57.84    41.77    56.11
> > > S2251      15.73    15.87    17.02    15.70
> > > S3251      15.66    16.21    19.60    15.34
> > > S252        6.18     6.32     7.72     7.26
> > > S253       11.14    11.38    14.40    14.40
> > > S254       18.41    18.70    28.23    28.06
> > > S255        5.93     6.09     9.96     9.95
> > > S256        3.08     3.42     3.10     3.09
> > > S257        2.13     2.25     2.21     2.20
> > > S258        1.79     1.82     1.84     1.84
> > > S261       12.00    12.08    10.98    10.95
> > > S271       32.82    33.04    33.25    33.01
> > > S272       14.98    15.82    15.39    15.26
> > > S273       13.92    14.04    16.86    16.80
> > > S274       17.83    18.31    18.15    17.89
> > > S275        2.92     3.02     3.36     2.98
> > > S2275      32.80    33.50     8.97    33.60
> > > S276       39.43    39.44    40.80    40.55
> > > S277        4.80     4.80     4.81     4.80
> > > S278       14.41    14.42    14.70    14.66
> > > S279        8.03     8.29     7.25     7.27
> > > S1279       9.71    10.06     9.34     9.25
> > > S2710       7.71     8.04     7.86     7.56
> > > S2711      35.53    35.55    36.56    36.00
> > > S2712      32.94    33.17    34.24    33.47
> > > S281       10.79    11.09    12.46    12.02
> > > S1281      79.13    77.55    57.78    68.06
> > > S291       11.80    11.78    14.03    14.03
> > > S292        7.77     7.78     9.94     9.96
> > > S293       15.50    15.87    19.32    19.33
> > > S2101       2.56     2.58     2.59     2.60
> > > S2102      16.71    17.53    16.68    16.75
> > > S2111       5.60     5.60     5.85     5.85
> > > S311       72.03    72.03    72.23    72.03
> > > S31111      7.49     6.00     6.00     6.00
> > > S312       96.04    96.04    96.05    96.03
> > > S313       36.02    36.13    36.03    36.02
> > > S314       36.01    36.07    74.67    72.42
> > > S315        8.96     8.99     9.35     9.30
> > > S316       36.02    36.06    72.08    74.87
> > > S317      444.93   444.94   451.82   451.78
> > > S318        9.05     9.07     7.30     7.30
> > > S319       34.54    36.53    34.42    34.19
> > > S3110       8.51     8.57     4.11     4.11
> > > S13110      5.75     5.77    12.12    12.12
> > > S3111       3.60     3.62     3.60     3.60
> > > S3112       7.19     7.30     7.21     7.20
> > > S3113      35.13    35.47    60.21    60.20
> > > S321       16.79    16.81    16.80    16.80
> > > S322       12.42    12.60    12.60    12.60
> > > S323       10.86    11.02     8.48     8.51
> > > S331        4.23     4.23     7.20     7.20
> > > S332        7.20     7.21     5.21     5.31
> > > S341        4.79     4.85     7.23     7.20
> > > S342        6.01     6.09     7.25     7.20
> > > S343        2.04     2.06     2.16     2.01
> > > S351       46.61    47.34    21.82    46.46
> > > S1351      49.28    50.35    33.68    49.06
> > > S352       57.65    58.04    57.68    57.64
> > > S353        8.21     8.38     8.34     8.19
> > > S421       42.94    43.34    20.62    22.46
> > > S1421      25.15    25.81    15.85    24.76
> > > S422       87.39    87.53    79.22    78.99
> > > S423      155.01   155.29   154.56   154.38
> > > S424       36.51    37.51    11.42    22.36
> > > S431       57.10    60.66    27.59    57.16
> > > S441       14.04    13.29    12.88    12.81
> > > S442        6.00     6.00     6.96     6.90
> > > S443       17.28    17.77    17.15    16.95
> > > S451       48.92    49.08    49.03    49.14
> > > S452       42.98    39.32    14.64    96.03
> > > S453       28.05    28.06    14.60    14.40
> > > S471        8.24     8.65     8.39     8.43
> > > S481       10.88    11.15    12.04    12.00
> > > S482        9.21     9.31     9.19     9.17
> > > S491       11.26    11.38    11.37    11.28
> > > S4112       8.21     8.36     9.13     8.94
> > > S4113       8.65     8.81     8.86     8.85
> > > S4114      11.82    12.15    12.18    11.77
> > > S4115       8.28     8.46     8.95     8.59
> > > S4116       3.22     3.23     6.02     5.94
> > > S4117      13.95     9.61    10.16     9.98
> > > S4121       8.21     8.26     4.04     8.17
> > > va         28.46    28.58    23.58    48.46
> > > vag        12.35    12.36    13.58    13.20
> > > vas        13.45    13.49    13.03    12.47
> > > vif         4.55     4.57     5.06     4.92
> > > vpv        57.08    57.22    28.28    57.24
> > > vtv        57.81    57.83    28.40    57.63
> > > vpvtv      32.82    32.84    16.35    32.73
> > > vpvts       5.82     5.83     2.99     6.38
> > > vpvpv      32.87    32.89    16.54    32.85
> > > vtvtv      32.82    32.80    16.84    35.97
> > > vsumr      72.04    72.03    72.20    72.04
> > > vdotr      72.06    72.05    72.42    72.04
> > > vbor      205.24   380.81    99.80   372.05
> > > 
> > >  -Hal
> > > 
> > > > > 
> > > > > Loop       llvm-v   llvm     gcc-v    gcc
> > > > > -------------------------------------------
> > > > > S000        9.59     9.49     4.55    10.04
> > > > > S111        7.67     7.37     7.68     7.83
> > > > > S1111      13.98    14.48    16.14    16.30
> > > > > S112       17.43    17.41    16.54    17.52
> > > > > S1112      13.87    14.21    14.83    14.84
> > > > > S113       22.97    22.88    22.05    22.05
> > > > > S1113      11.46    11.42    11.03    11.01
> > > > > S114       13.47    13.75    13.53    13.48
> > > > > S115       33.06    33.24    49.98    49.99
> > > > > S1115      13.91    14.18    13.65    13.66
> > > > > S116       48.74    49.40    49.54    48.11
> > > > > S118       11.04    11.26    10.79    10.50
> > > > > S119        8.97     9.07    11.83    11.82
> > > > > S1119       9.04     9.14     4.31    11.87
> > > > > S121       18.06    18.78    14.84    17.31
> > > > > S122        7.58     7.54     6.11     6.11
> > > > > S123        7.02     7.38     7.42     7.41
> > > > > S124        9.62     9.77     9.42     9.33
> > > > > S125        7.14     7.22     4.67     7.81
> > > > > S126        2.32     2.53     2.57     2.37
> > > > > S127       12.87    12.97     7.06    14.50
> > > > > S128       12.58    12.43    12.42    11.52
> > > > > S131       29.91    29.91    25.17    28.94
> > > > > S132       17.04    17.04    15.53    21.03
> > > > > S141       12.59    12.26    12.38    12.05
> > > > > S151       28.92    29.43    24.89    28.95
> > > > > S152       15.68    16.03    11.19    15.63
> > > > > S161        6.06     6.06     5.52     5.46
> > > > > S1161      14.46    14.40     8.80     8.79
> > > > > S162        8.31     9.05     5.36     8.18
> > > > > S171       15.47     7.94     2.81     5.70
> > > > > S172        5.92     5.89     2.75     5.70
> > > > > S173       31.59    30.92    18.15    30.13
> > > > > S174       31.16    31.66    18.51    30.16
> > > > > S175        5.80     6.18     4.94     5.77
> > > > > S176        5.69     5.83     4.41     7.65
> > > > > S211       16.56    17.14    16.82    16.38
> > > > > S212       13.46    14.28    13.34    13.18
> > > > > S1213      13.12    13.46    12.80    12.43
> > > > > S221       10.88    11.09     8.65     8.63
> > > > > S1221       5.80     6.04     5.40     6.05
> > > > > S222        6.01     6.26     5.70     5.72
> > > > > S231       23.78    22.94    22.36    22.11
> > > > > S232        6.88     6.88     6.89     6.89
> > > > > S1232      16.00    15.34    15.05    15.10
> > > > > S233       57.48    58.55    54.21    49.56
> > > > > S2233      27.65    29.77    29.68    28.40
> > > > > S235       46.40    44.92    46.94    43.93
> > > > > S241       31.62    31.35    32.53    31.01
> > > > > S242        7.20     7.20     7.20     7.20
> > > > > S243       16.78    17.09    17.69    16.84
> > > > > S244       14.64    14.83    16.91    16.82
> > > > > S1244      14.98    14.83    14.77    14.40
> > > > > S2244      10.47    10.62    10.40    10.06
> > > > > S251       35.10    35.75    19.70    34.38
> > > > > S1251      56.65    57.84    41.77    56.11
> > > > > S2251      15.96    15.87    17.02    15.70
> > > > > S3251      16.41    16.21    19.60    15.34
> > > > > S252        7.24     6.32     7.72     7.26
> > > > > S253       12.55    11.38    14.40    14.40
> > > > > S254       19.08    18.70    28.23    28.06
> > > > > S255        5.94     6.09     9.96     9.95
> > > > > S256        3.14     3.42     3.10     3.09
> > > > > S257        2.18     2.25     2.21     2.20
> > > > > S258        1.80     1.82     1.84     1.84
> > > > > S261       12.00    12.08    10.98    10.95
> > > > > S271       32.93    33.04    33.25    33.01
> > > > > S272       15.48    15.82    15.39    15.26
> > > > > S273       13.99    14.04    16.86    16.80
> > > > > S274       18.38    18.31    18.15    17.89
> > > > > S275        3.02     3.02     3.36     2.98
> > > > > S2275      33.71    33.50     8.97    33.60
> > > > > S276       39.52    39.44    40.80    40.55
> > > > > S277        4.81     4.80     4.81     4.80
> > > > > S278       14.43    14.42    14.70    14.66
> > > > > S279        8.10     8.29     7.25     7.27
> > > > > S1279       9.77    10.06     9.34     9.25
> > > > > S2710       7.85     8.04     7.86     7.56
> > > > > S2711      35.54    35.55    36.56    36.00
> > > > > S2712      33.16    33.17    34.24    33.47
> > > > > S281       10.97    11.09    12.46    12.02
> > > > > S1281      79.37    77.55    57.78    68.06
> > > > > S291       11.94    11.78    14.03    14.03
> > > > > S292        7.88     7.78     9.94     9.96
> > > > > S293       15.90    15.87    19.32    19.33
> > > > > S2101       2.59     2.58     2.59     2.60
> > > > > S2102      17.63    17.53    16.68    16.75
> > > > > S2111       5.63     5.60     5.85     5.85
> > > > > S311       72.07    72.03    72.23    72.03
> > > > > S31111      7.49     6.00     6.00     6.00
> > > > > S312       96.06    96.04    96.05    96.03
> > > > > S313       36.50    36.13    36.03    36.02
> > > > > S314       36.10    36.07    74.67    72.42
> > > > > S315        9.00     8.99     9.35     9.30
> > > > > S316       36.11    36.06    72.08    74.87
> > > > > S317      444.92   444.94   451.82   451.78
> > > > > S318        9.04     9.07     7.30     7.30
> > > > > S319       34.76    36.53    34.42    34.19
> > > > > S3110       8.53     8.57     4.11     4.11
> > > > > S13110      5.76     5.77    12.12    12.12
> > > > > S3111       3.60     3.62     3.60     3.60
> > > > > S3112       7.20     7.30     7.21     7.20
> > > > > S3113      35.12    35.47    60.21    60.20
> > > > > S321       16.81    16.81    16.80    16.80
> > > > > S322       12.42    12.60    12.60    12.60
> > > > > S323       10.93    11.02     8.48     8.51
> > > > > S331        4.23     4.23     7.20     7.20
> > > > > S332        7.21     7.21     5.21     5.31
> > > > > S341        4.74     4.85     7.23     7.20
> > > > > S342        6.02     6.09     7.25     7.20
> > > > > S343        2.14     2.06     2.16     2.01
> > > > > S351       49.26    47.34    21.82    46.46
> > > > > S1351      50.85    50.35    33.68    49.06
> > > > > S352       58.14    58.04    57.68    57.64
> > > > > S353        8.35     8.38     8.34     8.19
> > > > > S421       43.13    43.34    20.62    22.46
> > > > > S1421      25.25    25.81    15.85    24.76
> > > > > S422       88.36    87.53    79.22    78.99
> > > > > S423      155.13   155.29   154.56   154.38
> > > > > S424       37.11    37.51    11.42    22.36
> > > > > S431       58.22    60.66    27.59    57.16
> > > > > S441       14.05    13.29    12.88    12.81
> > > > > S442        6.08     6.00     6.96     6.90
> > > > > S443       17.60    17.77    17.15    16.95
> > > > > S451       48.95    49.08    49.03    49.14
> > > > > S452       42.98    39.32    14.64    96.03
> > > > > S453       28.06    28.06    14.60    14.40
> > > > > S471        8.53     8.65     8.39     8.43
> > > > > S481       10.98    11.15    12.04    12.00
> > > > > S482        9.31     9.31     9.19     9.17
> > > > > S491       11.54    11.38    11.37    11.28
> > > > > S4112       8.21     8.36     9.13     8.94
> > > > > S4113       8.77     8.81     8.86     8.85
> > > > > S4114      12.32    12.15    12.18    11.77
> > > > > S4115       8.48     8.46     8.95     8.59
> > > > > S4116       3.21     3.23     6.02     5.94
> > > > > S4117      14.08     9.61    10.16     9.98
> > > > > S4121       8.53     8.26     4.04     8.17
> > > > > va         30.09    28.58    23.58    48.46
> > > > > vag        12.35    12.36    13.58    13.20
> > > > > vas        13.74    13.49    13.03    12.47
> > > > > vif         4.49     4.57     5.06     4.92
> > > > > vpv        58.59    57.22    28.28    57.24
> > > > > vtv        59.15    57.83    28.40    57.63
> > > > > vpvtv      33.18    32.84    16.35    32.73
> > > > > vpvts       5.99     5.83     2.99     6.38
> > > > > vpvpv      33.25    32.89    16.54    32.85
> > > > > vtvtv      32.83    32.80    16.84    35.97
> > > > > vsumr      72.03    72.03    72.20    72.04
> > > > > vdotr      72.05    72.05    72.42    72.04
> > > > > vbor      205.22   380.81    99.80   372.05
> > > > > 
> > > > > I've yet to go through these in detail (they just finished running 5
> > > > > minutes ago). But for the curious (and I've had several requests for
> > > > > benchmarks), here you go. There is obviously more work to do.
> > > > > 
> > > > >  -Hal
> > > > > 
> > > > > On Fri, 2011-10-28 at 14:30 +0200, Ralf Karrenberg wrote:
> > > > > > Hi Hal,
> > > > > > 
> > > > > > those numbers look very promising, great work! :)
> > > > > > 
> > > > > > Best,
> > > > > > Ralf
> > > > > > 
> > > > > > ----- Original Message -----
> > > > > > > From: "Hal Finkel" <hfinkel at anl.gov>
> > > > > > > To: "Bruno Cardoso Lopes" <bruno.cardoso at gmail.com>
> > > > > > > Cc: llvm-commits at cs.uiuc.edu
> > > > > > > > Sent: Friday, October 28, 2011 13:50:00
> > > > > > > Subject: Re: [llvm-commits] [PATCH] BasicBlock Autovectorization Pass
> > > > > > > 
> > > > > > > Bruno, et al.,
> > > > > > > 
> > > > > > > > I've attached a new version of the patch that contains improvements
> > > > > > > > (and a critical bug fix: the generated code is no more correct than
> > > > > > > > before, but the pass in the older patch would crash in certain cases
> > > > > > > > and now does not) compared to previous versions that I've posted.
> > > > > > > 
> > > > > > > First, these are preliminary results because I did not do the things
> > > > > > > necessary to make them real (explicitly quiet the machine, bind the
> > > > > > > processes to one cpu, etc.). But they should be good enough for
> > > > > > > discussion.
> > > > > > > 
> > > > > > > > I'm using LLVM head r143101, with the attached patch applied, and
> > > > > > > > clang head r143100 on an x86_64 machine (some kind of Intel Xeon).
> > > > > > > > For the gcc comparison, I'm using the Ubuntu build 4.4.3-4ubuntu5.
> > > > > > > > gcc was run with -O3 and no other optimization flags. opt was run
> > > > > > > > with -vectorize -unroll-allow-partial -O3 and no other optimization
> > > > > > > > flags (the patch adds the -vectorize option). llc was just given -O3.
> > > > > > > 
> > > > > > > It is not difficult to construct an example in which vectorization
> > > > > > > would
> > > > > > > be useful: take a loop that does more computation than load/stores,
> > > > > > > and
> > > > > > > (partially) unroll it. Here is a simple case:
> > > > > > > 
> > > > > > > #define ITER 5000
> > > > > > > #define NUM 200
> > > > > > > double a[NUM][NUM];
> > > > > > > double b[NUM][NUM];
> > > > > > > 
> > > > > > > ...
> > > > > > > 
> > > > > > > int main()
> > > > > > > {
> > > > > > > ...
> > > > > > > 
> > > > > > >   for (int i = 0; i < ITER; ++i) {
> > > > > > >     for (int x = 0; x < NUM; ++x)
> > > > > > >     for (int y = 0; y < NUM; ++y) {
> > > > > > >       double v = a[x][y], w = b[x][y];
> > > > > > >       double z1 = v*w;
> > > > > > >       double z2 = v+w;
> > > > > > >       double z3 = z1*z2;
> > > > > > >       double z4 = z3+v;
> > > > > > >       double z5 = z2+w;
> > > > > > >       double z6 = z4*z5;
> > > > > > >       double z7 = z4+z5;
> > > > > > >       a[x][y] = v*v-z6;
> > > > > > >       b[x][y] = w-z7;
> > > > > > >     }
> > > > > > >   }
> > > > > > > 
> > > > > > >  ...
> > > > > > > 
> > > > > > >   return 0;
> > > > > > > }
> > > > > > > 
> > > > > > > Results:
> > > > > > > > gcc -O3: 0m1.790s
> > > > > > > llvm -vectorize: 0m2.360s
> > > > > > > llvm: 0m2.780s
> > > > > > > gcc -fno-tree-vectorize: 0m2.810s
> > > > > > > (these are the user times after I've run enough for the times to
> > > > > > > settle
> > > > > > > to three decimal places)
> > > > > > > 
> > > > > > > > So the vectorization gives a ~15% improvement in the running time
> > > > > > > > (2.780s down to 2.360s). gcc's vectorization still does a much
> > > > > > > > better job, however (yielding a ~36% improvement, 2.810s down to
> > > > > > > > 1.790s). So there is still work to do ;)
> > > > > > > 
> > > > > > > Additionally, I've checked the autovectorization on some classic
> > > > > > > numerical benchmarks from netlib. On these benchmarks, clang/llvm
> > > > > > > already do a good job compared to gcc (gcc is only about 10% better,
> > > > > > > and
> > > > > > > this is true regardless of whether gcc's vectorization is on or off).
> > > > > > > For these cases, autovectorization provides an insignificant speedup
> > > > > > > in
> > > > > > > most cases (but does not tend to make things worse, just not really
> > > > > > > any
> > > > > > > better either). Because gcc's vectorization also did not really help
> > > > > > > gcc
> > > > > > > in these cases, I'm not surprised. A good collection of these is
> > > > > > > available here:
> > > > > > > http://www.roylongbottom.org.uk/classic_benchmarks.tar.gz
> > > > > > > 
> > > > > > > I've yet to run the test suite using the pass to validate it. That is
> > > > > > > something that I plan to do. Actually, the "Livermore Loops" test in
> > > > > > > the
> > > > > > > aforementioned archive contains checksums to validate the results,
> > > > > > > and
> > > > > > > it looks like 1 or 2 of the loop results are wrong with vectorization
> > > > > > > turned on, so I'll have to investigate that.
> > > > > > > 
> > > > > > >  -Hal
> > > > > > > 
> > > > > > > On Wed, 2011-10-26 at 18:49 -0200, Bruno Cardoso Lopes wrote:
> > > > > > > > Hi Hal,
> > > > > > > > 
> > > > > > > > On Fri, Oct 21, 2011 at 7:04 PM, Hal Finkel <hfinkel at anl.gov>
> > > > > > > > wrote:
> > > > > > > > > I've attached an initial version of a basic-block
> > > > > > > > > autovectorization
> > > > > > > > > pass. It works by searching a basic block for pairable
> > > > > > > > > (independent)
> > > > > > > > > instructions, and, using a chain-seeking heuristic, selects
> > > > > > > > > pairings
> > > > > > > > > likely to provide an overall speedup (if such pairings can be
> > > > > > > > > found).
> > > > > > > > > The selected pairs are then fused and, if necessary, other
> > > > > > > > > instructions
> > > > > > > > > are moved in order to maintain data-flow consistency. This works
> > > > > > > > > only
> > > > > > > > > within one basic block, but can do loop vectorization in
> > > > > > > > > combination
> > > > > > > > > with (partial) unrolling. The basic idea was inspired by the
> > > > > > > > > Vienna MAP
> > > > > > > > > > Vectorizer, which has been used to vectorize FFT kernels, but the
> > > > > > > > > algorithm used here is different.
> > > > > > > > >
> > > > > > > > > To try it, use -bb-vectorize with opt. There are a few options:
> > > > > > > > > -bb-vectorize-req-chain-depth: default: 3 -- The depth of the
> > > > > > > > > chain of
> > > > > > > > > instruction pairs necessary in order to consider the pairs that
> > > > > > > > > compose
> > > > > > > > > the chain worthy of vectorization.
> > > > > > > > > -bb-vectorize-vector-bits: default: 128 -- The size of the target
> > > > > > > > > vector
> > > > > > > > > registers
> > > > > > > > > -bb-vectorize-no-ints -- Don't consider integer instructions
> > > > > > > > > -bb-vectorize-no-floats -- Don't consider floating-point
> > > > > > > > > instructions
> > > > > > > > >
> > > > > > > > > > The vectorizer generates a lot of insert_element/extract_element
> > > > > > > > > > pairs; the assumption is that other passes will turn these into
> > > > > > > > > > shuffles
> > > > > > > > > when
> > > > > > > > > possible (it looks like some work is necessary here). It will
> > > > > > > > > also
> > > > > > > > > vectorize vector instructions, and generates shuffles in this
> > > > > > > > > case
> > > > > > > > > (again, other passes should combine these as appropriate).
> > > > > > > > >
> > > > > > > > > Currently, it does not fuse load or store instructions, but that
> > > > > > > > > is a
> > > > > > > > > feature that I'd like to add. Of course, alignment information is
> > > > > > > > > an
> > > > > > > > > issue for load/store vectorization (or maybe I should just fuse
> > > > > > > > > them
> > > > > > > > > anyway and let isel deal with unaligned cases?).
> > > > > > > > >
> > > > > > > > > Also, support needs to be added for fusing known intrinsics (fma,
> > > > > > > > > etc.),
> > > > > > > > > and, as has been discussed on llvmdev, we should add some
> > > > > > > > > intrinsics to
> > > > > > > > > allow the generation of addsub-type instructions.
> > > > > > > > >
> > > > > > > > > I've included a few tests, but it needs more. Please review (I'll
> > > > > > > > > commit
> > > > > > > > > if and when everyone is happy).
> > > > > > > > >
> > > > > > > > > Thanks in advance,
> > > > > > > > > Hal
> > > > > > > > >
> > > > > > > > > P.S. There is another option (not so useful right now, but could
> > > > > > > > > be):
> > > > > > > > > -bb-vectorize-fast-dep -- Don't do a full inter-instruction
> > > > > > > > > dependency
> > > > > > > > > analysis; instead stop looking for instruction pairs after the
> > > > > > > > > first use
> > > > > > > > > of an instruction's value. [This makes the pass faster, but would
> > > > > > > > > require a data-dependence-based reordering pass in order to be
> > > > > > > > > effective].
> > > > > > > > 
> > > > > > > > Cool! :)
> > > > > > > > > Have you run this pass with any benchmark or the llvm testsuite?
> > > > > > > > > Does it present any regressions?
> > > > > > > > Do you have any performance results?
> > > > > > > > Cheers,
> > > > > > > > 
> > > > > > > 
> > > > > > > --
> > > > > > > Hal Finkel
> > > > > > > Postdoctoral Appointee
> > > > > > > Leadership Computing Facility
> > > > > > > Argonne National Laboratory
> > > > > > > 
> > > > > > > _______________________________________________
> > > > > > > llvm-commits mailing list
> > > > > > > llvm-commits at cs.uiuc.edu
> > > > > > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > > > > > > 
> > > > > 
> > > > 
> > > 
> > 
> 
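
For anyone skimming the thread, here is a rough illustration in plain C of
what the quoted description means by "pairable" instructions after (partial)
unrolling. This is not code from the patch. The names kernel_scalar,
kernel_unrolled2 and count_mismatches are made up for this sketch, and the
by-hand unrolling by 2 only stands in for what -unroll-allow-partial might
produce. In the unrolled body, each line holds two independent operations of
the same kind, and the results feed the next line (z1 into z3 into z4 into
z6); pairs like these are presumably what the pass would try to fuse into
2 x double vector operations.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar form of the inner-loop body from the quoted example. */
void kernel_scalar(double *a, double *b, size_t n)
{
  for (size_t y = 0; y < n; ++y) {
    double v = a[y], w = b[y];
    double z1 = v*w;
    double z2 = v+w;
    double z3 = z1*z2;
    double z4 = z3+v;
    double z5 = z2+w;
    double z6 = z4*z5;
    double z7 = z4+z5;
    a[y] = v*v-z6;
    b[y] = w-z7;
  }
}

/* Same computation unrolled by 2 by hand; n must be even.
 * The _0 and _1 statements on each line are independent of each other
 * and perform the same operation, so each line is a candidate pair. */
void kernel_unrolled2(double *a, double *b, size_t n)
{
  assert(n % 2 == 0);
  for (size_t y = 0; y < n; y += 2) {
    double v0 = a[y],        v1 = a[y+1];
    double w0 = b[y],        w1 = b[y+1];
    double z1_0 = v0*w0,     z1_1 = v1*w1;
    double z2_0 = v0+w0,     z2_1 = v1+w1;
    double z3_0 = z1_0*z2_0, z3_1 = z1_1*z2_1;
    double z4_0 = z3_0+v0,   z4_1 = z3_1+v1;
    double z5_0 = z2_0+w0,   z5_1 = z2_1+w1;
    double z6_0 = z4_0*z5_0, z6_1 = z4_1*z5_1;
    double z7_0 = z4_0+z5_0, z7_1 = z4_1+z5_1;
    a[y]   = v0*v0-z6_0;
    a[y+1] = v1*v1-z6_1;
    b[y]   = w0-z7_0;
    b[y+1] = w1-z7_1;
  }
}

/* Hypothetical self-check: run both forms on the same small input and
 * count elements whose results differ. */
int count_mismatches(void)
{
  enum { N = 8 };
  double a1[N], b1[N], a2[N], b2[N];
  for (int i = 0; i < N; ++i) {
    a1[i] = a2[i] = 0.5 + 0.25*i;
    b1[i] = b2[i] = 1.0 - 0.125*i;
  }
  kernel_scalar(a1, b1, N);
  kernel_unrolled2(a2, b2, N);
  int bad = 0;
  for (int i = 0; i < N; ++i)
    if (a1[i] != a2[i] || b1[i] != b2[i])
      ++bad;
  return bad;
}
```

Calling count_mismatches() from a small driver is enough to confirm that the
unrolled form computes the same values as the scalar loop. Whether a given
compiler actually fuses the pairs is a separate question that this sketch
does not answer.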

-- 
Hal Finkel
Postdoctoral Appointee
Leadership Computing Facility
Argonne National Laboratory



