[LLVMdev] Cute profiling toy for LLVM
Chris Lattner
sabre at nondot.org
Sat Nov 1 23:32:00 PST 2003
Because I've been doing a bit of performance work recently, and because
using gprof with the C backend has some limitations, I wrote a little
"llvm-prof" utility. Here's a synopsis of how to use it if you're
interested:
Basic usage:
llvm/utils/profile.pl <program.bc> <program arguments>
This instruments the bytecode file, executes it with the JIT (_appending_
information into an llvmprof.out file), then runs the llvm-prof utility to
format it into a human-readable report (llvm-prof is documented here:
http://llvm.cs.uiuc.edu/docs/CommandGuide/llvm-prof.html ). Running this
on the em3d Olden benchmark produces this output:
<all of the program output>
===-------------------------------------------------------------------------===
LLVM profiling output for execution:
Output/em3d.llvm.bc
===-------------------------------------------------------------------------===
Function execution frequencies:
## Frequency
1. 390/516 check_percent
2. 102/516 gen_signed_number
3. 2/516 compute_nodes
4. 2/516 make_table
5. 2/516 fill_table
6. 2/516 make_neighbors
7. 2/516 update_from_coeffs
8. 2/516 fill_from_fields
9. 2/516 localize_local
10. 1/516 initialize_graph
11. 1/516 clear_nummiss
12. 1/516 localize
13. 1/516 fill_all_from_fields
14. 1/516 update_all_from_coeffs
15. 1/516 make_all_neighbors
16. 1/516 make_tables
17. 1/516 __main
18. 1/516 main
19. 1/516 dealwithargs
NOTE: 1 function was never executed!
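The denominator in the table above is the sum of all function counts (390 + 102 + 7*2 + 10*1 = 516). Here is a rough Python sketch of how such a ranking could be produced. It is not llvm-prof's code: `merge_runs`, `function_report`, the per-run dictionaries, and the `never_called` placeholder are all made up for illustration, and summing counts across runs is only one plausible reading of what appending to llvmprof.out implies. The plural wording of the NOTE line is likewise a guess.

```python
def merge_runs(runs):
    """Sum per-function counts over several runs (an assumed model of the
    appended profile data, not the real llvmprof.out format)."""
    merged = {}
    for run in runs:
        for name, count in run.items():
            merged[name] = merged.get(name, 0) + count
    return merged


def function_report(counts):
    """Rank functions by call count and format them as 'rank. count/total name'."""
    total = sum(counts.values())
    # sorted() is stable, so functions with equal counts keep their input order.
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    executed = [item for item in ranked if item[1] > 0]
    lines = [f"{rank}. {count}/{total} {name}"
             for rank, (name, count) in enumerate(executed, start=1)]
    unused = len(ranked) - len(executed)
    if unused:
        noun = "function was" if unused == 1 else "functions were"
        lines.append(f"NOTE: {unused} {noun} never executed!")
    return lines


# Counts copied from the em3d table above; "never_called" is a hypothetical
# stand-in for the one function the report says was never executed.
EM3D_RUN = {
    "check_percent": 390, "gen_signed_number": 102,
    "compute_nodes": 2, "make_table": 2, "fill_table": 2,
    "make_neighbors": 2, "update_from_coeffs": 2, "fill_from_fields": 2,
    "localize_local": 2, "initialize_graph": 1, "clear_nummiss": 1,
    "localize": 1, "fill_all_from_fields": 1, "update_all_from_coeffs": 1,
    "make_all_neighbors": 1, "make_tables": 1, "__main": 1, "main": 1,
    "dealwithargs": 1, "never_called": 0,
}

if __name__ == "__main__":
    for line in function_report(merge_runs([EM3D_RUN])):
        print(line)
```

Passing the same run twice to `merge_runs` doubles every count and the total, which is the behavior you would expect if profile data from repeated executions simply accumulates.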
I've implemented function and basic block profiling, because they were
simple. We should be able to add the path profiling component with little
trouble. The number of blocks instrumented could be reduced significantly
by making use of control-equivalent blocks, but this optimization is not
done yet. To get basic block counts, run the script as before, but with
the -block option:
$ ~/llvm/utils/profile.pl -block Output/em3d.llvm.bc
<all of the stuff from before>
===-------------------------------------------------------------------------===
Top 20 most frequently executed basic blocks:
## %% Frequency
1. 4.60% 393/8545 make_neighbors() - no_exit.2
2. 4.56% 390/8545 make_neighbors() - loopentry.2
3. 4.56% 390/8545 make_neighbors() - endif.1
4. 4.56% 390/8545 make_neighbors() - endif.2
5. 4.56% 390/8545 make_neighbors() - loopexit.3
6. 4.56% 390/8545 check_percent() - entry
7. 4.53% 387/8545 make_neighbors() - endif.3
8. 4.49% 384/8545 fill_from_fields() - no_exit.1
9. 4.49% 384/8545 fill_from_fields() - endif.0
10. 4.49% 384/8545 make_neighbors() - loopexit.2
11. 4.49% 384/8545 make_neighbors() - shortcirc_next
12. 4.49% 384/8545 make_neighbors() - endif.4
13. 3.37% 288/8545 check_percent() - then
14. 3.07% 262/8545 make_neighbors() - no_exit.2.preheader
15. 1.84% 157/8545 compute_nodes() - endif.1
16. 1.84% 157/8545 compute_nodes() - no_exit.1
17. 1.84% 157/8545 compute_nodes() - then.0
18. 1.84% 157/8545 compute_nodes() - then.1
19. 1.84% 157/8545 compute_nodes() - endif.0
20. 1.50% 128/8545 fill_from_fields() - no_exit.0
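Each percentage in the block table is just the block's count divided by the 8545 in the denominator (presumably the total number of basic block executions), so 393/8545 rounds to 4.60%. A small Python sketch of that formatting, with `top_blocks` and the `SAMPLE` dictionary being hypothetical names and only four rows copied from the table:

```python
def top_blocks(block_counts, total=None, limit=20):
    """Format the most frequently executed blocks as
    'rank. percent% count/total function() - block'."""
    if total is None:
        total = sum(block_counts.values())
    # Highest counts first; ties keep their input order (sorted() is stable).
    ranked = sorted(block_counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    for rank, ((function, block), count) in enumerate(ranked[:limit], start=1):
        percent = 100.0 * count / total
        rows.append(f"{rank}. {percent:.2f}% {count}/{total} {function}() - {block}")
    return rows


# A few (function, block) counts taken from the table above. The total is
# passed explicitly because the sample does not include every block.
SAMPLE = {
    ("make_neighbors", "no_exit.2"): 393,
    ("check_percent", "entry"): 390,
    ("check_percent", "then"): 288,
    ("fill_from_fields", "no_exit.0"): 128,
}

if __name__ == "__main__":
    for row in top_blocks(SAMPLE, total=8545):
        print(row)
```

Because the sample is a subset, the ranks it prints differ from those in the real report, but the percentages and count/total columns agree with the corresponding rows.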
Finally, if you pass -A to the script, llvm-prof will print out the LLVM
source code for the program, annotated with frequency counts, like this:
<snip>
;;; %check_percent called 390 times.
;;;
internal int %check_percent(int %percent.1) {
entry: ; No predecessors!
        ;;; Executed 390 times.
        %tmp.0 = call double %drand48( ) ; <double> [#uses=1]
        %tmp.2 = cast int %percent.1 to double ; <double> [#uses=1]
        %tmp.3 = div double %tmp.2, 0x4059000000000000 ; <double> [#uses=1]
        %tmp.4 = setlt double %tmp.0, %tmp.3 ; <bool> [#uses=1]
        %tmp.5 = cast bool %tmp.4 to int ; <int> [#uses=3]
        %tmp.6 = load int* %.percentcheck_1 ; <int> [#uses=1]
        %inc.0 = add int %tmp.6, 1 ; <int> [#uses=1]
        store int %inc.0, int* %.percentcheck_1
        %tmp.8 = setne int %tmp.5, 0 ; <bool> [#uses=1]
        br bool %tmp.8, label %then, label %endif
then: ; preds = %entry
        ;;; Executed 288 times.
        %tmp.10 = load int* %.numlocal_2 ; <int> [#uses=1]
        %inc.1 = add int %tmp.10, 1 ; <int> [#uses=1]
        store int %inc.1, int* %.numlocal_2
        ret int %tmp.5
endif: ; preds = %entry
        ;;; Executed 102 times.
        ret int %tmp.5
}
}
</snip>
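Note that the counts in the listing are consistent: the entry block ran 390 times, and its two successors ran 288 + 102 = 390 times. To give a feel for the annotation step, here is a toy Python sketch that adds an "Executed N times." line after each block label in a textual listing. It is only an illustration: `annotate`, the `LABEL` pattern, and the text-based approach are assumptions, and the real tool presumably works on the parsed program rather than on text.

```python
import re

# A block label is assumed to be a name followed directly by ':' at the
# start of the (stripped) line, e.g. "entry:" or "no_exit.2.preheader:".
LABEL = re.compile(r"^([A-Za-z_.][\w.]*):")


def annotate(listing, block_counts):
    """Return the listing with a ';;; Executed N times.' line after each
    label that has a known count."""
    out = []
    for line in listing:
        out.append(line)
        match = LABEL.match(line.strip())
        if match and match.group(1) in block_counts:
            out.append(f";;; Executed {block_counts[match.group(1)]} times.")
    return out


# Abbreviated version of the check_percent body shown above.
LISTING = [
    "entry: ; No predecessors!",
    "br bool %tmp.8, label %then, label %endif",
    "then: ; preds = %entry",
    "ret int %tmp.5",
    "endif: ; preds = %entry",
    "ret int %tmp.5",
]
COUNTS = {"entry": 390, "then": 288, "endif": 102}

if __name__ == "__main__":
    print("\n".join(annotate(LISTING, COUNTS)))
```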
If you're interested, this is implemented by the following code:
lib/Transforms/Instrumentation/BlockProfiling.cpp
runtime/libprofile/
tools/llvm-prof/
utils/profile.pl
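Conceptually, block profiling boils down to giving each basic block a counter and bumping it whenever the block is entered. The Python simulation below is a sketch of that idea only, not a description of what BlockProfiling.cpp actually does. `BlockCounters`, `run_check_percent`, `run_diamond`, and the `same_count_groups` argument are hypothetical names. The grouping argument stands in for the control-equivalence optimization mentioned earlier: blocks known to execute the same number of times share one counter. The groups are supplied by hand here rather than computed from the control flow graph.

```python
class BlockCounters:
    """One counter per group of blocks; by default each block is its own group."""

    def __init__(self, blocks, same_count_groups=()):
        groups = [list(group) for group in same_count_groups]
        grouped = {block for group in groups for block in group}
        groups += [[block] for block in blocks if block not in grouped]
        self.slot = {}
        for index, group in enumerate(groups):
            for block in group:
                self.slot[block] = index
        # Only the first block of each group carries an increment.
        self.instrumented = {group[0] for group in groups}
        self.counts = [0] * len(groups)

    def enter(self, block):
        if block in self.instrumented:
            self.counts[self.slot[block]] += 1

    def report(self):
        return {block: self.counts[index] for block, index in self.slot.items()}


def run_check_percent(counters, take_then):
    """Mimics the shape of check_percent: entry branches to then or endif."""
    counters.enter("entry")
    counters.enter("then" if take_then else "endif")


def run_diamond(counters, take_then):
    """Hypothetical diamond: entry -> (then | else) -> merge."""
    counters.enter("entry")
    counters.enter("then" if take_then else "else")
    counters.enter("merge")


if __name__ == "__main__":
    plain = BlockCounters(["entry", "then", "endif"])
    for call in range(390):
        run_check_percent(plain, take_then=call < 288)
    print(plain.report())

    # entry and merge always run together, so they can share one counter.
    shared = BlockCounters(["entry", "then", "else", "merge"],
                           same_count_groups=[("entry", "merge")])
    for call in range(10):
        run_diamond(shared, take_then=call % 2 == 0)
    print(len(shared.counts), shared.report())
```

The first run reproduces the 390/288/102 split from the annotated check_percent listing. The second uses three counters for four blocks and still reports a count for every block.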
I have given only a little thought to how to integrate this with the JIT
and runtime system, but perhaps this is a step towards feedback-directed
optimization (FDO). The code should be pretty simple to extend with new
profiling implementations and lots of cool features. If you think of any
neat extensions, please let me know.
-Chris
--
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/