[LLVMdev] Very slow performance of lli on x86
Prasanth J
j.prasanth.j at gmail.com
Sun Nov 15 22:44:50 PST 2009
Hi all,
I have attached the complete test suite. It has separate directories for
gcc, llvm-gcc, clang and lli-clang. Each directory contains the source code,
makefile and run script (which sets the number of times the program should
execute) for that case.
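The run scripts themselves are not reproduced in this message. As a hypothetical illustration of what such a script does, here is a minimal Python sketch that runs one command a fixed number of times and reports the total wall-clock time. The function name `time_runs`, the file names and the example invocations are assumptions, not the contents of the attached scripts.

```python
import subprocess
import sys
import time


def time_runs(cmd, iterations):
    """Run `cmd` (a list of argv strings) `iterations` times in sequence.

    Hypothetical stand-in for the per-directory run scripts. Each
    iteration starts a fresh process, so any per-process startup cost
    (for lli: loading the bitcode, then JIT-compiling or interpreting it)
    is paid again on every run. Returns total wall-clock seconds.
    """
    start = time.perf_counter()
    for _ in range(iterations):
        # check=True raises CalledProcessError if the program exits non-zero
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start


if __name__ == "__main__" and len(sys.argv) > 2:
    # Example invocations (assumed file names):
    #   python time_runs.py 10000 lli monolith.bc
    #   python time_runs.py 10000 lli -force-interpreter monolith.bc
    #   python time_runs.py 10000 ./monolith
    n = int(sys.argv[1])
    elapsed = time_runs(sys.argv[2:], n)
    print(f"{n} runs of {' '.join(sys.argv[2:])}: {elapsed:.3f} s")
```

With this layout, the gcc, llvm-gcc and clang binaries pay only process startup per iteration. lli pays process startup plus code generation (or interpretation) on every one of the 10000 runs.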
*FOLLOWING ARE THE STATISTICS WHEN USING LLI FOR A SINGLE ITERATION*
===-------------------------------------------------------------------------===
... Statistics Collected ...
===-------------------------------------------------------------------------===
58 dagcombine - Number of dag nodes combined
16384 jit - Number of bytes of global vars initialized
357 jit - Number of bytes of machine code compiled
2 jit - Number of global vars initialized
27 jit - Number of relocations applied
3 jit - Number of slabs of memory allocated by the JIT
105 liveintervals - Number of original intervals
21 loop-reduce - Number of IV uses strength reduced
4 loop-reduce - Number of PHIs inserted
2 loop-reduce - Number of loop terminating conds optimized
1 machine-licm - Number of machine instructions hoisted out of loops
4 phielim - Number of atomic phis lowered
2 regalloc - Number of copies coalesced
27 regalloc - Number of iterations performed
3 regcoalescing - Number of cross class joins performed
44 regcoalescing - Number of identity moves eliminated after coalescing
1 regcoalescing - Number of instructions re-materialized
40 regcoalescing - Number of interval joins performed
2 scalar-evolution - Number of loops with predictable loop counts
4 twoaddrinstr - Number of instructions aggressively commuted
6 twoaddrinstr - Number of instructions commuted to coalesce
3 twoaddrinstr - Number of instructions re-materialized
23 twoaddrinstr - Number of two-address instructions
2 virtregrewriter - Number of copies elided
1 x86-codegen - Number of floating point instructions
84 x86-emitter - Number of machine instructions emitted
real 0m0.043s
user 0m0.027s
sys 0m0.010s
*FOLLOWING ARE THE STATISTICS WHEN FORCING LLI TO USE THE INTERPRETER FOR A SINGLE ITERATION*
===-------------------------------------------------------------------------===
... Statistics Collected ...
===-------------------------------------------------------------------------===
147495 interpreter - Number of dynamic instructions executed
17735 jit - Number of bytes of global vars initialized
49 jit - Number of global vars initialized
real 0m0.083s
user 0m0.078s
sys 0m0.003s
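The two `-stats` dumps can be compared mechanically. This minimal Python sketch assumes each counter sits on a single line in the `<count> <pass> - <description>` layout shown in the dumps. It skips everything else, including the banners and the `real`/`user`/`sys` timings. It then divides the interpreter's dynamic instruction count by the reported user time. Because that user time also covers process startup and bitcode loading, the result is only a lower bound on the interpreter's throughput. The names `parse_stats` and `STAT_LINE` are hypothetical.

```python
import re

# One counter per line: "<count> <pass-name> - <description>"
STAT_LINE = re.compile(r"^\s*(\d+)\s+(\S+)\s+-\s+(.+?)\s*$")


def parse_stats(text):
    """Return {(pass_name, description): count} for every counter line."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line)
        if m:
            count, pass_name, desc = m.groups()
            stats[(pass_name, desc)] = int(count)
    return stats


# Counters copied from the forced-interpreter run
interp_dump = """
147495 interpreter - Number of dynamic instructions executed
17735 jit - Number of bytes of global vars initialized
49 jit - Number of global vars initialized
"""

stats = parse_stats(interp_dump)
executed = stats[("interpreter", "Number of dynamic instructions executed")]
user_seconds = 0.078  # "user" time reported for the forced-interpreter run
rate = executed / user_seconds  # includes startup, so a lower bound
print(f"{rate / 1e6:.2f} million instructions/s (lower bound)")
```

For the numbers in this message, this prints `1.89 million instructions/s (lower bound)`.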
Even for a single iteration, the execution time is considerably higher than
with gcc, llvm-gcc and clang.
What is the expected behavior when using lli? My understanding is that, since
lli performs runtime optimizations, it should be faster than clang and
llvm-gcc. Am I right?
*My machine details are:*
*Linux localhost.localdomain 2.6.25-14.fc9.i686 #1 SMP Thu May 1 06:28:41 EDT 2008 i686 i686 i386 GNU/Linux*
*Memory: 1GB DDR2*
*CPU: Intel Pentium Dual-core @ 2.00 GHz*
Please let me know how I can proceed with this test.
Thanks and Regards,
Prasanth J
On Mon, Nov 16, 2009 at 1:06 AM, Eric Christopher <echristo at apple.com> wrote:
>
> On Nov 14, 2009, at 11:52 PM, Prasanth J wrote:
>
> > step 4:
> > running monolith.bc for 10000 iterations using lli tool and measured the
> > time.
>
> How are you doing this?
>
> -eric
-------------- next part --------------
A non-text attachment was scrubbed...
Name: generic_asm.tgz
Type: application/x-gzip
Size: 62726 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20091116/918a9562/attachment.bin>
More information about the llvm-dev mailing list