[PATCH] D19678: Annotated-source optimization reports (a.k.a. "listing" files)
Hal Finkel via cfe-commits
cfe-commits at lists.llvm.org
Thu Apr 28 15:26:17 PDT 2016
hfinkel added a comment.
In http://reviews.llvm.org/D19678#416127, @rcox2 wrote:
> Actually, the Intel compiler distinguishes between an optimization report (-qopt-report) and an annotated listing (-qopt-report-annotate).
Interesting; thanks for pointing this out (and for the example).
> The optimization report lists the info for optimizations in a hierarchical fashion. To use your example,
> icc -c -O3 -qopt-report=1 -qopt-report-file=stderr v.c
>
> yields:
>
> Report from: Interprocedural optimizations [ipo]
>
>
> INLINING OPTION VALUES:
>
> -inline-factor: 100
> -inline-min-size: 20
> -inline-max-size: 230
> -inline-max-total-size: 2000
> -inline-max-per-routine: 10000
> -inline-max-per-compile: 500000
>
>
>
> Begin optimization report for: foo()
>
> Report from: Interprocedural optimizations [ipo]
>
>
> INLINE REPORT: (foo()) [1] v.c(2,12)
>
> Report from: Code generation optimizations [cg]
>
>
> v.c(2,12):remark #34051: REGISTER ALLOCATION : [foo] v.c:2
>
> Hardware registers
> Reserved : 1[ esp]
> Available : 23[ eax edx ecx ebx ebp esi edi mm0-mm7 zmm0-zmm7]
> Callee-save : 4[ ebx ebp esi edi]
> Assigned : 0[ reg_null]
>
> Routine temporaries
> Total : 4
> Global : 0
> Local : 4
> Regenerable : 0
> Spilled : 0
>
> Routine stack
> Variables : 0 bytes*
> Reads : 0 [0.00e+00 ~ 0.0%]
> Writes : 0 [0.00e+00 ~ 0.0%]
> Spills : 0 bytes*
> Reads : 0 [0.00e+00 ~ 0.0%]
> Writes : 0 [0.00e+00 ~ 0.0%]
>
> Notes
>
> *Non-overlapping variables and spills may share stack space,
> so the total stack size might be less than this.
>
>
>
> ===========================================================================
>
> Begin optimization report for: Test(int *, int *, int *, int *, int)
>
> Report from: Interprocedural optimizations [ipo]
>
>
> INLINE REPORT: (Test(int *, int *, int *, int *, int)) [2] v.c(4,52)
>
> -> INLINE: (16,3) foo()
> -> INLINE: (18,3) foo()
> -> INLINE: (18,17) foo()
>
>
> Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]
>
>
>
> LOOP BEGIN at v.c(8,8)
> <Peeled loop for vectorization>
> LOOP END
>
> LOOP BEGIN at v.c(8,8)
>
> remark #15301: SIMD LOOP WAS VECTORIZED
>
> LOOP END
>
> LOOP BEGIN at v.c(8,8)
> <Alternate Alignment Vectorized Loop>
> LOOP END
>
> LOOP BEGIN at v.c(8,8)
> <Remainder loop for vectorization>
>
> remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
>
> LOOP END
>
> LOOP BEGIN at v.c(12,3)
>
> remark #15344: loop was not vectorized: vector dependence prevents vectorization. First dependence is shown below. Use level 5 report for details
> remark #15346: vector dependence: assumed FLOW dependence between res[i] (13:5) and d[i] (13:5)
> remark #25436: completely unrolled by 16
>
> LOOP END
>
> Report from: Code generation optimizations [cg]
>
>
> v.c(4,52):remark #34051: REGISTER ALLOCATION : [Test] v.c:4
>
> Hardware registers
> Reserved : 1[ esp]
> Available : 23[ eax edx ecx ebx ebp esi edi mm0-mm7 zmm0-zmm7]
> Callee-save : 4[ ebx ebp esi edi]
> Assigned : 15[ eax edx ecx ebx ebp esi edi zmm0-zmm7]
>
> Routine temporaries
> Total : 123
> Global : 47
> Local : 76
> Regenerable : 5
> Spilled : 6
>
> Routine stack
> Variables : 0 bytes*
> Reads : 0 [0.00e+00 ~ 0.0%]
> Writes : 0 [0.00e+00 ~ 0.0%]
> Spills : 8 bytes*
> Reads : 5 [1.41e+01 ~ 1.4%]
> Writes : 3 [3.00e+00 ~ 0.3%]
>
> Notes
>
> *Non-overlapping variables and spills may share stack space,
> so the total stack size might be less than this.
>
>
> while the annotated listing looks like:
>
> //
> // ------- Annotated listing with optimization reports for "/export/iusers/rcox2/rgHF/v.c" -------
> //
> //INLINING OPTION VALUES:
> // -inline-factor: 100
> // -inline-min-size: 20
> // -inline-max-size: 230
> // -inline-max-total-size: 2000
> // -inline-max-per-routine: 10000
> // -inline-max-per-compile: 500000
> //
> 1 void bar();
> 2 void foo() { bar(); }
> //INLINE REPORT: (foo()) [1] /export/iusers/rcox2/rgHF/v.c(2,12)
> //
> ///export/iusers/rcox2/rgHF/v.c(2,12):remark #34051: REGISTER ALLOCATION : [foo] /export/iusers/rcox2/rgHF/v.c:2
> //
> // Hardware registers
> // Reserved : 1[ esp]
> // Available : 23[ eax edx ecx ebx ebp esi edi mm0-mm7 zmm0-zmm7]
> // Callee-save : 4[ ebx ebp esi edi]
> // Assigned : 0[ reg_null]
> //
> // Routine temporaries
> // Total : 4
> // Global : 0
> // Local : 4
> // Regenerable : 0
> // Spilled : 0
> //
> // Routine stack
> // Variables : 0 bytes*
> // Reads : 0 [0.00e+00 ~ 0.0%]
> // Writes : 0 [0.00e+00 ~ 0.0%]
> // Spills : 0 bytes*
> // Reads : 0 [0.00e+00 ~ 0.0%]
> // Writes : 0 [0.00e+00 ~ 0.0%]
> //
> // Notes
> //
> // *Non-overlapping variables and spills may share stack space,
> // so the total stack size might be less than this.
> //
> //
> 3
> 4 void Test(int *res, int *c, int *d, int *p, int n) {
> //INLINE REPORT: (Test(int *, int *, int *, int *, int)) [2] /export/iusers/rcox2/rgHF/v.c(4,52)
> // -> INLINE: (16,3) foo()
> // -> INLINE: (18,3) foo()
> // -> INLINE: (18,17) foo()
> //
> ///export/iusers/rcox2/rgHF/v.c(4,52):remark #34051: REGISTER ALLOCATION : [Test] /export/iusers/rcox2/rgHF/v.c:4
> //
> // Hardware registers
> // Reserved : 1[ esp]
> // Available : 23[ eax edx ecx ebx ebp esi edi mm0-mm7 zmm0-zmm7]
> // Callee-save : 4[ ebx ebp esi edi]
> // Assigned : 15[ eax edx ecx ebx ebp esi edi zmm0-zmm7]
> //
> // Routine temporaries
> // Total : 123
> // Global : 47
> // Local : 76
> // Regenerable : 5
> // Spilled : 6
> //
> // Routine stack
> // Variables : 0 bytes*
> // Reads : 0 [0.00e+00 ~ 0.0%]
> // Writes : 0 [0.00e+00 ~ 0.0%]
> // Spills : 8 bytes*
> // Reads : 5 [1.41e+01 ~ 1.4%]
> // Writes : 3 [3.00e+00 ~ 0.3%]
> //
> // Notes
> //
> // *Non-overlapping variables and spills may share stack space,
> // so the total stack size might be less than this.
> //
> //
> 5 int i;
> 6
> 7 #pragma simd
> 8 for (i = 0; i < 1600; i++) {
> //
> //LOOP BEGIN at /export/iusers/rcox2/rgHF/v.c(8,8)
> //<Peeled loop for vectorization>
> //LOOP END
> //
> //LOOP BEGIN at /export/iusers/rcox2/rgHF/v.c(8,8)
> // remark #15301: SIMD LOOP WAS VECTORIZED
> //LOOP END
> //
> //LOOP BEGIN at /export/iusers/rcox2/rgHF/v.c(8,8)
> //<Alternate Alignment Vectorized Loop>
> //LOOP END
> //
> //LOOP BEGIN at /export/iusers/rcox2/rgHF/v.c(8,8)
> //<Remainder loop for vectorization>
> // remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
> //LOOP END
> 9 res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
> 10 }
> 11
> 12 for (i = 0; i < 16; i++) {
> //
> //LOOP BEGIN at /export/iusers/rcox2/rgHF/v.c(12,3)
> // remark #15344: loop was not vectorized: vector dependence prevents vectorization. First dependence is shown below. Use level 5 report for details
> // remark #15346: vector dependence: assumed FLOW dependence between res[i] (13:5) and d[i] (13:5)
> // remark #25436: completely unrolled by 16
> //LOOP END
> 13 res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
> 14 }
> 15
> 16 foo();
> 17
> 18 foo(); bar(); foo();
> 19 }
>
> Essentially, various parts of the optimization report are inserted into a listing at the appropriate line numbers.
>
> (Note that this is just the default level. More detail can be obtained with -qopt-report=X where X>1; levels up to 5 are supported.)
>
> I believe what Hal is proposing in this patch is a very useful lightweight annotation of the source with key information. But I also believe that there is value in a standalone opt report with the kind of detailed information I presented in http://reviews.llvm.org/D19397 and the two follow-up patches.
To be clear, I agree. I'd like to have both.
> In general, while this info can be interspersed in the source listing, I believe that for most purposes it is a bit too "busy" in text form. (The Intel compiler also supports annotated HTML and functionality that feeds into Visual Studio that has received great reviews.)
I think this piggybacks on Richard's suggestion regarding later integration with the static analyzer's output capabilities. We should definitely explore how this might be done.
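As a rough illustration of the annotated-listing idea quoted above (report entries inserted into the source listing at the matching line numbers), here is a minimal Python sketch. The function name `annotate` and the `(line_number, text)` remark format are assumptions made up for this example; they are not part of this patch, of Clang, or of the Intel compiler.

```python
from collections import defaultdict


def annotate(source, remarks):
    """Interleave remarks with numbered source lines.

    source  -- the source text as a single string
    remarks -- iterable of (line_number, text) pairs, 1-based
               (a hypothetical format chosen for this sketch)
    """
    # Group remarks by the source line they refer to, keeping input order.
    by_line = defaultdict(list)
    for line_no, text in remarks:
        by_line[line_no].append(text)

    out = []
    for line_no, code in enumerate(source.splitlines(), start=1):
        out.append(f"{line_no:>4} {code}")
        # Emit this line's remarks directly below it as "//" comments,
        # loosely mirroring the layout of the quoted listing.
        for text in by_line.get(line_no, []):
            out.append(f"//{text}")
    return "\n".join(out)


source = "void bar();\nvoid foo() { bar(); }\n"
remarks = [(2, "INLINE REPORT: (foo()) [1] v.c(2,12)")]
print(annotate(source, remarks))
```

A real implementation would presumably also need to handle remarks without a usable source location and multi-line report blocks such as the register-allocation summary; this sketch ignores both.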
http://reviews.llvm.org/D19678