[PATCH] D19678: Annotated-source optimization reports (a.k.a. "listing" files)

Thu Apr 28 15:26:17 PDT 2016

hfinkel added a comment.

In http://reviews.llvm.org/D19678#416127, @rcox2 wrote:

> Actually, the Intel compiler distinguishes between an optimization report (-qopt-report) and an annotated listing (-qopt-report-annotate).

Interesting; thanks for pointing this out (and for the example).

>   The optimization report lists the info for optimizations in a hierarchical fashion.  To use your example,

>     icc -c -O3 -qopt-report=1 -qopt-report-file=stderr v.c 

> 

> yields:

> 

>   Report from: Interprocedural optimizations [ipo]

>    

> 

> INLINING OPTION VALUES:

> 

>   -inline-factor: 100

>   -inline-min-size: 20

>   -inline-max-size: 230

>   -inline-max-total-size: 2000

>   -inline-max-per-routine: 10000

>   -inline-max-per-compile: 500000

>    

>    

> 

> Begin optimization report for: foo()

> 

>   Report from: Interprocedural optimizations [ipo]

>    

> 

> INLINE REPORT: (foo()) [1] v.c(2,12)

> 

>   Report from: Code generation optimizations [cg]

>    

> 

> v.c(2,12):remark #34051: REGISTER ALLOCATION : [foo] v.c:2

> 

>   Hardware registers

>       Reserved     :    1[ esp]

>       Available    :   23[ eax edx ecx ebx ebp esi edi mm0-mm7 zmm0-zmm7]

>       Callee-save  :    4[ ebx ebp esi edi]

>       Assigned     :    0[ reg_null]

>    

>   Routine temporaries

>       Total         :       4

>           Global    :       0

>           Local     :       4

>       Regenerable   :       0

>       Spilled       :       0

>    

>   Routine stack

>       Variables     :       0 bytes*

>           Reads     :       0 [0.00e+00 ~ 0.0%]

>           Writes    :       0 [0.00e+00 ~ 0.0%]

>       Spills        :       0 bytes*

>           Reads     :       0 [0.00e+00 ~ 0.0%]

>           Writes    :       0 [0.00e+00 ~ 0.0%]

>    

>   Notes

>    

>       *Non-overlapping variables and spills may share stack space,

>        so the total stack size might be less than this.

>    

>    

> 

> ===========================================================================

> 

> Begin optimization report for: Test(int *, int *, int *, int *, int)

> 

>   Report from: Interprocedural optimizations [ipo]

>    

> 

> INLINE REPORT: (Test(int *, int *, int *, int *, int)) [2] v.c(4,52)

> 

>   -> INLINE: (16,3) foo()

>   -> INLINE: (18,3) foo()

>   -> INLINE: (18,17) foo()

>    

>    

>     Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]

>    

>    

> 

> LOOP BEGIN at v.c(8,8)

>  <Peeled loop for vectorization>

>  LOOP END

> 

> LOOP BEGIN at v.c(8,8)

> 

>   remark #15301: SIMD LOOP WAS VECTORIZED

> 

> LOOP END

> 

> LOOP BEGIN at v.c(8,8)

>  <Alternate Alignment Vectorized Loop>

>  LOOP END

> 

> LOOP BEGIN at v.c(8,8)

>  <Remainder loop for vectorization>

> 

>   remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override

> 

> LOOP END

> 

> LOOP BEGIN at v.c(12,3)

> 

>   remark #15344: loop was not vectorized: vector dependence prevents vectorization. First dependence is shown below. Use level 5 report for details

>   remark #15346: vector dependence: assumed FLOW dependence between res[i] (13:5) and d[i] (13:5)

>   remark #25436: completely unrolled by 16

> 

> LOOP END

> 

>   Report from: Code generation optimizations [cg]

>    

> 

> v.c(4,52):remark #34051: REGISTER ALLOCATION : [Test] v.c:4

> 

>   Hardware registers

>       Reserved     :    1[ esp]

>       Available    :   23[ eax edx ecx ebx ebp esi edi mm0-mm7 zmm0-zmm7]

>       Callee-save  :    4[ ebx ebp esi edi]

>       Assigned     :   15[ eax edx ecx ebx ebp esi edi zmm0-zmm7]

>    

>   Routine temporaries

>       Total         :     123

>           Global    :      47

>           Local     :      76

>       Regenerable   :       5

>       Spilled       :       6

>    

>   Routine stack

>       Variables     :       0 bytes*

>           Reads     :       0 [0.00e+00 ~ 0.0%]

>           Writes    :       0 [0.00e+00 ~ 0.0%]

>       Spills        :       8 bytes*

>           Reads     :       5 [1.41e+01 ~ 1.4%]

>           Writes    :       3 [3.00e+00 ~ 0.3%]

>    

>   Notes

>    

>       *Non-overlapping variables and spills may share stack space,

>        so the total stack size might be less than this.

>    

> 

> while the annotated listing looks like:

> 

> //

>  // ------- Annotated listing with optimization reports for "/export/iusers/rcox2/rgHF/v.c" -------

>  //

>  //INLINING OPTION VALUES:

>  //  -inline-factor: 100

>  //  -inline-min-size: 20

>  //  -inline-max-size: 230

>  //  -inline-max-total-size: 2000

>  //  -inline-max-per-routine: 10000

>  //  -inline-max-per-compile: 500000

>  //

>  1       void bar();

>  2       void foo() { bar(); }

>  //INLINE REPORT: (foo()) [1] /export/iusers/rcox2/rgHF/v.c(2,12)

>  //

>  ///export/iusers/rcox2/rgHF/v.c(2,12):remark #34051: REGISTER ALLOCATION : [foo] /export/iusers/rcox2/rgHF/v.c:2

>  //

>  //    Hardware registers

>  //        Reserved     :    1[ esp]

>  //        Available    :   23[ eax edx ecx ebx ebp esi edi mm0-mm7 zmm0-zmm7]

>  //        Callee-save  :    4[ ebx ebp esi edi]

>  //        Assigned     :    0[ reg_null]

>  //

>  //    Routine temporaries

>  //        Total         :       4

>  //            Global    :       0

>  //            Local     :       4

>  //        Regenerable   :       0

>  //        Spilled       :       0

>  //

>  //    Routine stack

>  //        Variables     :       0 bytes*

>  //            Reads     :       0 [0.00e+00 ~ 0.0%]

>  //            Writes    :       0 [0.00e+00 ~ 0.0%]

>  //        Spills        :       0 bytes*

>  //            Reads     :       0 [0.00e+00 ~ 0.0%]

>  //            Writes    :       0 [0.00e+00 ~ 0.0%]

>  //

>  //    Notes

>  //

>  //        *Non-overlapping variables and spills may share stack space,

>  //         so the total stack size might be less than this.

>  //

>  //

> 3

>  4       void Test(int *res, int *c, int *d, int *p, int n) {

>  //INLINE REPORT: (Test(int *, int *, int *, int *, int)) [2] /export/iusers/rcox2/rgHF/v.c(4,52)

>  //  -> INLINE: (16,3) foo()

>  //  -> INLINE: (18,3) foo()

>  //  -> INLINE: (18,17) foo()

>  //

>  ///export/iusers/rcox2/rgHF/v.c(4,52):remark #34051: REGISTER ALLOCATION : [Test] /export/iusers/rcox2/rgHF/v.c:4

>  //

>  //    Hardware registers

>  //        Reserved     :    1[ esp]

>  //        Available    :   23[ eax edx ecx ebx ebp esi edi mm0-mm7 zmm0-zmm7]

>  //        Callee-save  :    4[ ebx ebp esi edi]

>  //        Assigned     :   15[ eax edx ecx ebx ebp esi edi zmm0-zmm7]

>  //

>  //    Routine temporaries

>  //        Total         :     123

>  //            Global    :      47

>  //            Local     :      76

>  //        Regenerable   :       5

>  //        Spilled       :       6

>  //

>  //    Routine stack

>  //        Variables     :       0 bytes*

>  //            Reads     :       0 [0.00e+00 ~ 0.0%]

>  //            Writes    :       0 [0.00e+00 ~ 0.0%]

>  //        Spills        :       8 bytes*

>  //            Reads     :       5 [1.41e+01 ~ 1.4%]

>  //            Writes    :       3 [3.00e+00 ~ 0.3%]

>  //

>  //    Notes

>  //

>  //        *Non-overlapping variables and spills may share stack space,

>  //         so the total stack size might be less than this.

>  //

>  //

>  5         int i;

> 6

>  7       #pragma simd

>  8         for (i = 0; i < 1600; i++) {

>  //

>  //LOOP BEGIN at /export/iusers/rcox2/rgHF/v.c(8,8)

>  //<Peeled loop for vectorization>

>  //LOOP END

>  //

>  //LOOP BEGIN at /export/iusers/rcox2/rgHF/v.c(8,8)

>  //   remark #15301: SIMD LOOP WAS VECTORIZED

>  //LOOP END

>  //

>  //LOOP BEGIN at /export/iusers/rcox2/rgHF/v.c(8,8)

>  //<Alternate Alignment Vectorized Loop>

>  //LOOP END

>  //

>  //LOOP BEGIN at /export/iusers/rcox2/rgHF/v.c(8,8)

>  //<Remainder loop for vectorization>

>  //   remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override

>  //LOOP END

>  9           res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];

>  10        }

> 11

>  12        for (i = 0; i < 16; i++) {

>  //

>  //LOOP BEGIN at /export/iusers/rcox2/rgHF/v.c(12,3)

>  //   remark #15344: loop was not vectorized: vector dependence prevents vectorization. First dependence is shown below. Use level 5 report for details

>  //   remark #15346: vector dependence: assumed FLOW dependence between res[i] (13:5) and d[i] (13:5)

>  //   remark #25436: completely unrolled by 16

>  //LOOP END

>  13          res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];

>  14        }

> 15

>  16        foo();

> 17

>  18        foo(); bar(); foo();

>  19      }

> 

> Essentially, various parts of the optimization report are inserted into a listing of the source, each after the source line it refers to.

> 

> (Note that this is just the default level.  More detail can be obtained with -qopt-report=X, where X>1; levels up to 5 are supported.)

> 

> I believe what Hal is proposing in this patch is a very useful lightweight annotation of the source with key information.  But I also believe that there is value in a stand-alone opt report with the kind of detailed information I presented in http://reviews.llvm.org/D19397 and the two follow-up patches.

To be clear, I agree. I'd like to have both.

>   In general, while this info can be interspersed in the source listing, I believe that for most purposes it is a bit too "busy" in text form.  (The Intel compiler also supports annotated HTML, and functionality that feeds into Visual Studio that has received great reviews.)

I think this piggybacks on Richard's suggestion regarding later integration with the static analyzer's output capabilities. We should definitely explore how this might be done.
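
For reference, the interleaving described in the quoted text (report fragments placed into a listing after the source line they refer to) can be sketched in a few lines. This is only an illustration under stated assumptions, not how icc or this patch implements it: the `annotate` function and the `(line_number, text)` remark tuples are hypothetical, and the column layout only loosely imitates the listing quoted above.

```python
from collections import defaultdict


def annotate(source, remarks):
    """Interleave remarks into a numbered source listing.

    `remarks` is a list of (line_number, text) pairs (a hypothetical
    record format). Each remark is emitted as a '//' comment line
    directly after the source line it refers to.
    """
    # Group remark texts by the 1-based source line they belong to,
    # preserving the order in which they were reported.
    by_line = defaultdict(list)
    for line_no, text in remarks:
        by_line[line_no].append(text)

    out = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        # Line number in a fixed-width column, then the source text.
        out.append(f"{line_no:<8}{line}".rstrip())
        for text in by_line.get(line_no, []):
            out.append(f"//{text}")
    return "\n".join(out)


source = "void bar();\nvoid foo() { bar(); }\n"
remarks = [(2, "INLINE REPORT: (foo()) [1] v.c(2,12)")]
listing = annotate(source, remarks)
print(listing)
```

The same remark records could also drive a stand-alone hierarchical report, which is one reason to keep the collected data separate from either presentation.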

http://reviews.llvm.org/D19678