[LLVMdev] Cute profiling toy for LLVM

Chris Lattner sabre at nondot.org
Sat Nov 1 23:32:00 PST 2003


Because I've been doing a bit of performance work recently, and because
using gprof with the C backend has some limitations, I wrote a little
"llvm-prof" utility.  Here's a synopsis of how to use it if you're
interested:

Basic usage:
   llvm/utils/profile.pl <program.bc> <program arguments>

This instruments the bytecode file, executes it with the JIT (_appending_
information to an llvmprof.out file), then runs the llvm-prof utility to
format it into a human-readable report (llvm-prof is documented here:
http://llvm.cs.uiuc.edu/docs/CommandGuide/llvm-prof.html ).  Running this
on the em3d Olden benchmark produces this output:

  <all of the program output>
===-------------------------------------------------------------------------===
LLVM profiling output for execution:
  Output/em3d.llvm.bc

===-------------------------------------------------------------------------===
Function execution frequencies:

 ##   Frequency
  1.   390/516 check_percent
  2.   102/516 gen_signed_number
  3.     2/516 compute_nodes
  4.     2/516 make_table
  5.     2/516 fill_table
  6.     2/516 make_neighbors
  7.     2/516 update_from_coeffs
  8.     2/516 fill_from_fields
  9.     2/516 localize_local
 10.     1/516 initialize_graph
 11.     1/516 clear_nummiss
 12.     1/516 localize
 13.     1/516 fill_all_from_fields
 14.     1/516 update_all_from_coeffs
 15.     1/516 make_all_neighbors
 16.     1/516 make_tables
 17.     1/516 __main
 18.     1/516 main
 19.     1/516 dealwithargs

  NOTE: 1 function was never executed!
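
For anyone curious how a report like the one above falls out of raw
counters, here is a minimal Python sketch.  It is not llvm-prof's code: the
function name frequency_report and the entry "never_called" are made up
for illustration (the report does not name the unexecuted function), and
the column widths are only my reading of the listing.  The counts are the
ones shown above.

```python
# Hypothetical re-creation of the function-frequency report; NOT llvm-prof's code.
counts = {
    "check_percent": 390, "gen_signed_number": 102,
    "compute_nodes": 2, "make_table": 2, "fill_table": 2,
    "make_neighbors": 2, "update_from_coeffs": 2,
    "fill_from_fields": 2, "localize_local": 2,
    "initialize_graph": 1, "clear_nummiss": 1, "localize": 1,
    "fill_all_from_fields": 1, "update_all_from_coeffs": 1,
    "make_all_neighbors": 1, "make_tables": 1, "__main": 1,
    "main": 1, "dealwithargs": 1,
    "never_called": 0,  # hypothetical name for the one unexecuted function
}


def frequency_report(counts):
    """Rank executed functions by call count, as 'count/total name' lines."""
    total = sum(counts.values())
    executed = [(name, n) for name, n in counts.items() if n > 0]
    # Stable sort: functions with equal counts keep their input order.
    executed.sort(key=lambda item: -item[1])
    lines = [" ##   Frequency"]
    for rank, (name, n) in enumerate(executed, start=1):
        lines.append(f"{rank:3d}.{n:6d}/{total} {name}")
    unexecuted = len(counts) - len(executed)
    if unexecuted:
        noun = "function was" if unexecuted == 1 else "functions were"
        lines.append("")
        lines.append(f"  NOTE: {unexecuted} {noun} never executed!")
    return lines


report = frequency_report(counts)
print("\n".join(report))
```

The denominator is simply the sum of all function counts, which is why
every row in the listing shares the same /516.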

I've implemented function and basic block profiling, because they were
simple.  We should be able to add the path profiling component with little
trouble.  The number of blocks instrumented could be reduced significantly
by making use of control-equivalent blocks, but this optimization is not
done yet.  To get basic block counts, run the script as before, but with
the -block option:

$ ~/llvm/utils/profile.pl -block Output/em3d.llvm.bc
 <all of the stuff from before>
===-------------------------------------------------------------------------===
Top 20 most frequently executed basic blocks:

 ##      %%   Frequency
  1.  4.60%   393/8545  make_neighbors() - no_exit.2
  2.  4.56%   390/8545  make_neighbors() - loopentry.2
  3.  4.56%   390/8545  make_neighbors() - endif.1
  4.  4.56%   390/8545  make_neighbors() - endif.2
  5.  4.56%   390/8545  make_neighbors() - loopexit.3
  6.  4.56%   390/8545  check_percent() - entry
  7.  4.53%   387/8545  make_neighbors() - endif.3
  8.  4.49%   384/8545  fill_from_fields() - no_exit.1
  9.  4.49%   384/8545  fill_from_fields() - endif.0
 10.  4.49%   384/8545  make_neighbors() - loopexit.2
 11.  4.49%   384/8545  make_neighbors() - shortcirc_next
 12.  4.49%   384/8545  make_neighbors() - endif.4
 13.  3.37%   288/8545  check_percent() - then
 14.  3.07%   262/8545  make_neighbors() - no_exit.2.preheader
 15.  1.84%   157/8545  compute_nodes() - endif.1
 16.  1.84%   157/8545  compute_nodes() - no_exit.1
 17.  1.84%   157/8545  compute_nodes() - then.0
 18.  1.84%   157/8545  compute_nodes() - then.1
 19.  1.84%   157/8545  compute_nodes() - endif.0
 20.  1.50%   128/8545  fill_from_fields() - no_exit.0
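
On the control-equivalence idea mentioned above: two blocks A and B can
share one counter when A dominates B and B post-dominates A.  Here is a
rough Python sketch of that test.  The CFG is a made-up diamond, the
function names are mine, and none of this exists in BlockProfiling.cpp.
It also assumes a single exit block and no loops; with loops you would
additionally need the two blocks to be in the same loop, otherwise a
loop header and a block in front of the loop would be merged wrongly.

```python
def dominators(succ, entry):
    """Iterative dataflow: dom[b] = {b} | intersection of dom[p] over preds p."""
    nodes = set(succ)
    preds = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            preds[t].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def control_equivalent_classes(succ, entry, exit_block):
    """Group blocks that must execute equally often (acyclic CFG, one exit)."""
    dom = dominators(succ, entry)
    # Post-dominators are dominators of the reversed CFG, rooted at the exit.
    reverse = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            reverse[t].append(n)
    pdom = dominators(reverse, exit_block)

    def equivalent(a, b):
        return (a in dom[b] and b in pdom[a]) or (b in dom[a] and a in pdom[b])

    classes = []
    for block in succ:
        for cls in classes:
            if equivalent(cls[0], block):
                cls.append(block)
                break
        else:
            classes.append([block])
    return classes


# Hypothetical diamond CFG: entry branches to then/else, both fall into join.
cfg = {"entry": ["then", "else"], "then": ["join"], "else": ["join"], "join": []}
classes = control_equivalent_classes(cfg, "entry", "join")
print(classes)
```

Here entry and join land in the same class, so one counter per class
means three counters instead of four.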

Finally, if you pass -A to the script, llvm-prof will print out the LLVM
source code for the program, annotated with frequency counts, like this:

<snip>
;;; %check_percent called 390 times.
;;;
internal int %check_percent(int %percent.1) {
entry:          ; No predecessors!
;;; Executed 390 times.
        %tmp.0 = call double %drand48( )                ; <double> [#uses=1]
        %tmp.2 = cast int %percent.1 to double          ; <double> [#uses=1]
        %tmp.3 = div double %tmp.2, 0x4059000000000000          ; <double> [#uses=1]
        %tmp.4 = setlt double %tmp.0, %tmp.3            ; <bool> [#uses=1]
        %tmp.5 = cast bool %tmp.4 to int                ; <int> [#uses=3]
        %tmp.6 = load int* %.percentcheck_1             ; <int> [#uses=1]
        %inc.0 = add int %tmp.6, 1              ; <int> [#uses=1]
        store int %inc.0, int* %.percentcheck_1
        %tmp.8 = setne int %tmp.5, 0            ; <bool> [#uses=1]
        br bool %tmp.8, label %then, label %endif

then:           ; preds = %entry
;;; Executed 288 times.
        %tmp.10 = load int* %.numlocal_2                ; <int> [#uses=1]
        %inc.1 = add int %tmp.10, 1             ; <int> [#uses=1]
        store int %inc.1, int* %.numlocal_2
        ret int %tmp.5

endif:          ; preds = %entry
;;; Executed 102 times.
        ret int %tmp.5
}
</snip>
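
The annotation step itself is conceptually simple: walk the printed
listing and drop a count comment after each block label.  Here is a small
Python sketch of the idea.  The annotate helper and the LABEL regex are
my own illustration, not how llvm-prof does it, and keying counts by label
alone only works within a single function.  The counts are the ones from
the check_percent listing above.

```python
import re

# A block label starts in column 0 and ends with ':' (assumed listing format).
LABEL = re.compile(r"^([A-Za-z_.][\w.]*):")


def annotate(ir_lines, block_counts):
    """Insert ';;; Executed N times.' after each labelled block we have a count for."""
    out = []
    for line in ir_lines:
        out.append(line)
        match = LABEL.match(line)
        if match and match.group(1) in block_counts:
            out.append(f";;; Executed {block_counts[match.group(1)]} times.")
    return out


# Abbreviated body of check_percent, taken from the listing above.
ir = [
    "entry:          ; No predecessors!",
    "        br bool %tmp.8, label %then, label %endif",
    "then:           ; preds = %entry",
    "        ret int %tmp.5",
    "endif:          ; preds = %entry",
    "        ret int %tmp.5",
]
block_counts = {"entry": 390, "then": 288, "endif": 102}

annotated = annotate(ir, block_counts)
print("\n".join(annotated))

# Sanity check: both successors of entry together run as often as entry does.
assert block_counts["then"] + block_counts["endif"] == block_counts["entry"]
```

The final assert is a handy consistency check on the raw data: the counts
of a block's successors should add up to the block's own count when every
execution of the block leaves through its branch.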

If you're interested, this is implemented by the following code:

lib/Transforms/Instrumentation/BlockProfiling.cpp
runtime/libprofile/
tools/llvm-prof/
utils/profile.pl

I have given only a little thought to how to integrate this with the JIT
and runtime system, but perhaps this is a step towards FDO
(feedback-directed optimization).  The code should be pretty simple to
extend with new profiling implementations and lots of cool features.  If
you think of any neat extensions, please let me know.

-Chris

-- 
http://llvm.cs.uiuc.edu/
http://www.nondot.org/~sabre/Projects/



