[PATCH] D19678: Annotated-source optimization reports (a.k.a. "listing" files)
Robert Cox via cfe-commits
cfe-commits at lists.llvm.org
Thu Apr 28 15:19:13 PDT 2016
rcox2 added a comment.
Actually, the Intel compiler distinguishes between an optimization report (-qopt-report) and an annotated listing (-qopt-report-annotate). The optimization report lists the information for each optimization in a hierarchical fashion. To use your example,
icc -c -O3 -qopt-report=1 -qopt-report-file=stderr v.c
yields:
Report from: Interprocedural optimizations [ipo]
INLINING OPTION VALUES:
-inline-factor: 100
-inline-min-size: 20
-inline-max-size: 230
-inline-max-total-size: 2000
-inline-max-per-routine: 10000
-inline-max-per-compile: 500000
Begin optimization report for: foo()
Report from: Interprocedural optimizations [ipo]
INLINE REPORT: (foo()) [1] v.c(2,12)
Report from: Code generation optimizations [cg]
v.c(2,12):remark #34051: REGISTER ALLOCATION : [foo] v.c:2
Hardware registers
Reserved : 1[ esp]
Available : 23[ eax edx ecx ebx ebp esi edi mm0-mm7 zmm0-zmm7]
Callee-save : 4[ ebx ebp esi edi]
Assigned : 0[ reg_null]
Routine temporaries
Total : 4
Global : 0
Local : 4
Regenerable : 0
Spilled : 0
Routine stack
Variables : 0 bytes*
Reads : 0 [0.00e+00 ~ 0.0%]
Writes : 0 [0.00e+00 ~ 0.0%]
Spills : 0 bytes*
Reads : 0 [0.00e+00 ~ 0.0%]
Writes : 0 [0.00e+00 ~ 0.0%]
Notes
*Non-overlapping variables and spills may share stack space,
so the total stack size might be less than this.
Begin optimization report for: Test(int *, int *, int *, int *, int)
Report from: Interprocedural optimizations [ipo]
INLINE REPORT: (Test(int *, int *, int *, int *, int)) [2] v.c(4,52)
-> INLINE: (16,3) foo()
-> INLINE: (18,3) foo()
-> INLINE: (18,17) foo()
Report from: Loop nest, Vector & Auto-parallelization optimizations [loop, vec, par]
LOOP BEGIN at v.c(8,8)
<Peeled loop for vectorization>
LOOP END
LOOP BEGIN at v.c(8,8)
remark #15301: SIMD LOOP WAS VECTORIZED
LOOP END
LOOP BEGIN at v.c(8,8)
<Alternate Alignment Vectorized Loop>
LOOP END
LOOP BEGIN at v.c(8,8)
<Remainder loop for vectorization>
remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
LOOP END
LOOP BEGIN at v.c(12,3)
remark #15344: loop was not vectorized: vector dependence prevents vectorization. First dependence is shown below. Use level 5 report for details
remark #15346: vector dependence: assumed FLOW dependence between res[i] (13:5) and d[i] (13:5)
remark #25436: completely unrolled by 16
LOOP END
Report from: Code generation optimizations [cg]
v.c(4,52):remark #34051: REGISTER ALLOCATION : [Test] v.c:4
Hardware registers
Reserved : 1[ esp]
Available : 23[ eax edx ecx ebx ebp esi edi mm0-mm7 zmm0-zmm7]
Callee-save : 4[ ebx ebp esi edi]
Assigned : 15[ eax edx ecx ebx ebp esi edi zmm0-zmm7]
Routine temporaries
Total : 123
Global : 47
Local : 76
Regenerable : 5
Spilled : 6
Routine stack
Variables : 0 bytes*
Reads : 0 [0.00e+00 ~ 0.0%]
Writes : 0 [0.00e+00 ~ 0.0%]
Spills : 8 bytes*
Reads : 5 [1.41e+01 ~ 1.4%]
Writes : 3 [3.00e+00 ~ 0.3%]
Notes
*Non-overlapping variables and spills may share stack space,
so the total stack size might be less than this.
while the annotated listing looks like:
//
// ------- Annotated listing with optimization reports for "/export/iusers/rcox2/rgHF/v.c" -------
//
//INLINING OPTION VALUES:
// -inline-factor: 100
// -inline-min-size: 20
// -inline-max-size: 230
// -inline-max-total-size: 2000
// -inline-max-per-routine: 10000
// -inline-max-per-compile: 500000
//
1 void bar();
2 void foo() { bar(); }
//INLINE REPORT: (foo()) [1] /export/iusers/rcox2/rgHF/v.c(2,12)
//
///export/iusers/rcox2/rgHF/v.c(2,12):remark #34051: REGISTER ALLOCATION : [foo] /export/iusers/rcox2/rgHF/v.c:2
//
// Hardware registers
// Reserved : 1[ esp]
// Available : 23[ eax edx ecx ebx ebp esi edi mm0-mm7 zmm0-zmm7]
// Callee-save : 4[ ebx ebp esi edi]
// Assigned : 0[ reg_null]
//
// Routine temporaries
// Total : 4
// Global : 0
// Local : 4
// Regenerable : 0
// Spilled : 0
//
// Routine stack
// Variables : 0 bytes*
// Reads : 0 [0.00e+00 ~ 0.0%]
// Writes : 0 [0.00e+00 ~ 0.0%]
// Spills : 0 bytes*
// Reads : 0 [0.00e+00 ~ 0.0%]
// Writes : 0 [0.00e+00 ~ 0.0%]
//
// Notes
//
// *Non-overlapping variables and spills may share stack space,
// so the total stack size might be less than this.
//
//
3
4 void Test(int *res, int *c, int *d, int *p, int n) {
//INLINE REPORT: (Test(int *, int *, int *, int *, int)) [2] /export/iusers/rcox2/rgHF/v.c(4,52)
// -> INLINE: (16,3) foo()
// -> INLINE: (18,3) foo()
// -> INLINE: (18,17) foo()
//
///export/iusers/rcox2/rgHF/v.c(4,52):remark #34051: REGISTER ALLOCATION : [Test] /export/iusers/rcox2/rgHF/v.c:4
//
// Hardware registers
// Reserved : 1[ esp]
// Available : 23[ eax edx ecx ebx ebp esi edi mm0-mm7 zmm0-zmm7]
// Callee-save : 4[ ebx ebp esi edi]
// Assigned : 15[ eax edx ecx ebx ebp esi edi zmm0-zmm7]
//
// Routine temporaries
// Total : 123
// Global : 47
// Local : 76
// Regenerable : 5
// Spilled : 6
//
// Routine stack
// Variables : 0 bytes*
// Reads : 0 [0.00e+00 ~ 0.0%]
// Writes : 0 [0.00e+00 ~ 0.0%]
// Spills : 8 bytes*
// Reads : 5 [1.41e+01 ~ 1.4%]
// Writes : 3 [3.00e+00 ~ 0.3%]
//
// Notes
//
// *Non-overlapping variables and spills may share stack space,
// so the total stack size might be less than this.
//
//
5 int i;
6
7 #pragma simd
8 for (i = 0; i < 1600; i++) {
//
//LOOP BEGIN at /export/iusers/rcox2/rgHF/v.c(8,8)
//<Peeled loop for vectorization>
//LOOP END
//
//LOOP BEGIN at /export/iusers/rcox2/rgHF/v.c(8,8)
// remark #15301: SIMD LOOP WAS VECTORIZED
//LOOP END
//
//LOOP BEGIN at /export/iusers/rcox2/rgHF/v.c(8,8)
//<Alternate Alignment Vectorized Loop>
//LOOP END
//
//LOOP BEGIN at /export/iusers/rcox2/rgHF/v.c(8,8)
//<Remainder loop for vectorization>
// remark #15335: remainder loop was not vectorized: vectorization possible but seems inefficient. Use vector always directive or -vec-threshold0 to override
//LOOP END
9 res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
10 }
11
12 for (i = 0; i < 16; i++) {
//
//LOOP BEGIN at /export/iusers/rcox2/rgHF/v.c(12,3)
// remark #15344: loop was not vectorized: vector dependence prevents vectorization. First dependence is shown below. Use level 5 report for details
// remark #15346: vector dependence: assumed FLOW dependence between res[i] (13:5) and d[i] (13:5)
// remark #25436: completely unrolled by 16
//LOOP END
13 res[i] = (p[i] == 0) ? res[i] : res[i] + d[i];
14 }
15
16 foo();
17
18 foo(); bar(); foo();
19 }
Essentially, the relevant parts of the optimization report are inserted into a source listing at the line numbers they refer to.
(Note that this is just the default level. More detail can be obtained with -qopt-report=X for X > 1; levels up to 5 are supported.)
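To make the relationship between the two outputs concrete, here is a minimal sketch of how loop blocks from a report could be interleaved into a numbered listing. It is not how icc implements this. The names LOOP_BEGIN, group_loop_blocks and annotate are hypothetical, and the sketch assumes that every loop block opens with "LOOP BEGIN at file(line,col)" and closes with "LOOP END", as in the output above. It ignores the inlining and register-allocation sections.

```python
import re
from collections import defaultdict

# Hypothetical pattern for the "LOOP BEGIN at file(line,col)" markers in the report.
LOOP_BEGIN = re.compile(r"^LOOP BEGIN at .*\((\d+),\d+\)$")


def group_loop_blocks(report_lines):
    """Map a source line number to the report lines of every loop block starting there."""
    blocks = defaultdict(list)
    current = None  # source line of the loop block being read, if any
    for text in report_lines:
        match = LOOP_BEGIN.match(text.strip())
        if match:
            current = int(match.group(1))
        if current is not None:
            blocks[current].append(text)
        if text.strip() == "LOOP END":
            current = None
    return blocks


def annotate(source_lines, blocks):
    """Emit numbered source lines, each followed by its report lines as // comments."""
    listing = []
    for number, code in enumerate(source_lines, start=1):
        listing.append(f"{number} {code}")
        listing.extend(f"//{text}" for text in blocks.get(number, []))
    return listing


# Made-up three-line input, only to exercise the two functions.
report = [
    "LOOP BEGIN at v.c(2,3)",
    "   remark #25436: completely unrolled by 16",
    "LOOP END",
]
source = ["int i;", "for (i = 0; i < 16; i++) {", "}"]
print("\n".join(annotate(source, group_loop_blocks(report))))
```

Keying the blocks by source line keeps the report parser separate from the listing writer, so the same grouped data could in principle feed a different presentation.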
I believe what Hal is proposing in this patch is a very useful, lightweight annotation of the source with key information. But I also believe that there is value in a stand-alone opt report with the kind of detailed information I presented in http://reviews.llvm.org/D19397 and the two follow-up patches. In general, while this information can be interspersed in the source listing, I believe that for most purposes the result is a bit too "busy" in text form. (The Intel compiler also supports annotated HTML, as well as functionality that feeds into Visual Studio and has received great reviews.)
http://reviews.llvm.org/D19678