[LLVMdev] Dynamic Profiling - Instrumentation basic query
alastairmurray42 at gmail.com
Mon Jan 14 22:11:10 PST 2013
On 14/01/13 01:47, Silky Arora wrote:
> I need to profile the code for branches (branch mispredict
> simulation), load/store instructions (for cache hit/miss rates), and a
> couple of other things and therefore, would need to instrument the code.
> However, I would like to know if writing the output to a file would
> increase the execution time, or is it the profiling itself? I can
> probably use a data structure to store the output instead.
> Also, I have heard of Intel's Pin tool which can provide memory trace
> information. Could you please explain to me what you meant by hardware
> counters for dcache miss/hit rates.
I've also heard of Pin, but never actually used it.
Regarding the hardware counters: x86 processors count various hardware
events via internal counters. I think both Intel and AMD processors can
do this, but I've only tried out Intel. The easiest way to access these
on Linux is probably via the 'perf' tool. (There are other options
on other platforms. I think 'Intel VTune' can use these counters as well.)
The result of running 'perf' on a random command (xz -9e dictionary) is
in the attached file (because my mail client was destroying the
formatting). I just chose some counters which seemed to match what you
mention; there were many more, though. 'perf list' will show them. The
only issue I can think of is that the hardware counters aren't available
inside (most?) virtual machines.
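
If you wanted to script the same measurement rather than type it by hand, a minimal sketch might look like the following. The event names are copied from the command in the attachment; the helper names (perf_stat_argv, perf_available) are made up for illustration, and it assumes a 'perf' binary is on the PATH.

```python
import shutil

# Event names copied from the perf command attached to this message.
EVENTS = [
    "cycles", "instructions",
    "cache-references", "cache-misses",
    "branch-instructions", "branch-misses",
    "L1-dcache-loads", "L1-dcache-load-misses",
    "L1-dcache-stores", "L1-dcache-store-misses",
    "dTLB-loads", "dTLB-load-misses",
]


def perf_stat_argv(command, events=EVENTS, perf="perf"):
    """Build an argument list for `perf stat` (hypothetical helper).

    One `-e` flag is emitted per event, and `--` separates perf's own
    options from the command being measured.
    """
    argv = [perf, "stat"]
    for event in events:
        argv += ["-e", event]
    return argv + ["--"] + list(command)


def perf_available(perf="perf"):
    """Return True if a perf binary can be found on PATH (hypothetical helper)."""
    return shutil.which(perf) is not None


argv = perf_stat_argv(["xz", "-9e", "dictionary"])
print(" ".join(argv))
```

The resulting list can be handed to subprocess.run(); as far as I know 'perf stat' writes its summary to stderr, so that is the stream to capture. Checking perf_available() first avoids a confusing failure on machines (or virtual machines) without perf.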
If you need to individually determine the hit/miss-rate, mispredict
ratios etc per load/store/branch then I'm not sure if these counters are
-------------- next part --------------
/usr/sbin/perf stat -e cycles -e instructions -e cache-references -e cache-misses -e branch-instructions -e branch-misses -e L1-dcache-loads -e L1-dcache-load-misses -e L1-dcache-stores -e L1-dcache-store-misses -e dTLB-loads -e dTLB-load-misses xz -9e dictionary
Performance counter stats for 'xz -9e dictionary':
 2,838,843,997 cycles                    #    0.000 GHz                     [24.96%]
 3,017,892,661 instructions              #    1.06  insns per cycle         [33.31%]
    28,281,385 cache-references                                             [33.29%]
     6,820,873 cache-misses              #   24.118 % of all cache refs     [33.31%]
   403,480,157 branches                                                     [16.70%]
    34,978,751 branch-misses             #    8.67% of all branches         [16.71%]
                                         #    3.00% of all L1-dcache hits   [16.70%]
                                         #    0.15% of all dTLB cache hits  [16.67%]
2.892917184 seconds time elapsed
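
The derived figures after the '#' are simple ratios of the raw counts. As a rough check, here is a small sketch that re-computes three of them from the numbers above. The parse_counts helper is hypothetical, and the regular expression only assumes the "count, then event name" layout seen in this output, not any documented perf format.

```python
import re

# Raw counts copied from the perf output above ('#' annotations dropped).
SAMPLE = """\
2,838,843,997 cycles
3,017,892,661 instructions
28,281,385 cache-references
6,820,873 cache-misses
403,480,157 branches
34,978,751 branch-misses
"""


def parse_counts(text):
    """Map event name -> integer count for lines shaped like '1,234 name'."""
    counts = {}
    for line in text.splitlines():
        match = re.match(r"\s*([\d,]+)\s+([\w-]+)", line)
        if match:
            counts[match.group(2)] = int(match.group(1).replace(",", ""))
    return counts


counts = parse_counts(SAMPLE)

# Instructions retired per cycle.
ipc = counts["instructions"] / counts["cycles"]
# Misses as a percentage of the corresponding references.
cache_miss_pct = 100.0 * counts["cache-misses"] / counts["cache-references"]
branch_miss_pct = 100.0 * counts["branch-misses"] / counts["branches"]

print(f"{ipc:.2f} insns per cycle")            # 1.06
print(f"{cache_miss_pct:.3f} % of cache refs")  # 24.118
print(f"{branch_miss_pct:.2f} % of branches")   # 8.67
```

These are whole-program averages, so they say nothing about which individual load, store or branch is responsible for the misses.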