[LLVMdev] LNT usage

Mon Mar 11 13:15:11 PDT 2013

While porting my backends to llvm-3.2, I found a few places where the 
optimizers could have performed better. I believe the mainstream targets could 
also benefit from my tweaks. But before upstreaming my changes, I would like 
to quantify their merits on other applications, not just on my domain-specific 
codes. In short, it seemed the right time for me to start using LNT :) 

I followed the LNT quickstart guide and got some results, but I have 
trouble using them. How are other developers using LNT? What is the 
workflow for comparing a patched LLVM to the baseline version?

I am mainly interested in the tests' execution time.

1. I am a bit confused by the 'lnt runtest' output:

$LNT_TOP/sandbox/bin/lnt runtest nt \
>   --sandbox=$LNT_TOP/SANDBOX3 \
>   --cc=$LLVM_BIN/clang \
>   --cxx=$LLVM_BIN/clang++ \
>   --test-suite=$LLVM_SRCS/projects/test-suite \
>   --llvm-src=$LLVM_SRCS \
>   --llvm-obj=$LLVM_BUILD
2013-03-11 18:41:22: checking source versions
2013-03-11 18:41:25: scanning for LNT-based test modules
2013-03-11 18:41:25: found 0 LNT-based test modules
2013-03-11 18:41:25: using nickname: 'm0__clang_DEV__x86_64'
2013-03-11 18:41:25: starting test in '.../LLVM/LNT/SANDBOX3/test-2013-03-11_18-41-22'
2013-03-11 18:41:25: configuring...
2013-03-11 18:41:34: executing "nightly tests" with -j1...
2013-03-11 19:00:42: executing test modules
2013-03-11 19:00:42: loading nightly test data...
2013-03-11 19:00:42: capturing machine information
2013-03-11 19:00:42: generating report: '.../LLVM/LNT/SANDBOX3/test-2013-03-11_18-41-22/report.json'
2013-03-11 18:41:22: submitting result to dummy instance
No handlers could be found for logger "lnt.server.db.migrate"
Importing 'tmpItDGOn.json'
Import succeeded.

Processing Times
----------------
Load   : 0.00s
Import : 0.22s
Report : 0.48s
Total  : 0.48s

Imported Data
-------------
Added Machines: 1
Added Runs    : 1
Added Tests   : 493

--- Tested: 986 tests --
FAIL: MultiSource/Applications/Burg/burg.execution_time (494 of 986)
FAIL: MultiSource/Applications/ClamAV/clamscan.execution_time (495 of 986)
FAIL: MultiSource/Applications/lemon/lemon.execution_time (496 of 986)
FAIL: MultiSource/Benchmarks/MiBench/automotive-bitcount/automotive-bitcount.execution_time (497 of 986)
FAIL: MultiSource/Benchmarks/MiBench/telecomm-FFT/telecomm-fft.execution_time (498 of 986)
FAIL: MultiSource/Benchmarks/Olden/voronoi/voronoi.execution_time (499 of 986)
FAIL: MultiSource/Benchmarks/Ptrdist/anagram/anagram.execution_time (500 of 986)
FAIL: MultiSource/Benchmarks/mafft/pairlocalalign.execution_time (501 of 986)
FAIL: SingleSource/Benchmarks/BenchmarkGame/puzzle.execution_time (502 of 986)
...

How many tests are there really: 493 or 986? Or does 493 refer to the number 
of built test programs, with the compilation time and the execution time 
counted as 2 separate tests for the same program?
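
To make my guess concrete, here is a minimal Python sketch of the counting I 
have in mind. It assumes each entry is named "<program>.<metric>"; only the 
".execution_time" suffix appears in the output above, so the ".compile_time" 
suffix and the helper name group_by_program are assumptions of mine, not 
something taken from LNT.

```python
from collections import defaultdict


def group_by_program(test_names):
    """Map each program path to the set of metric suffixes reported for it."""
    programs = defaultdict(set)
    for name in test_names:
        # Split on the last dot:
        # "a/b/burg.execution_time" -> ("a/b/burg", "execution_time")
        program, _, metric = name.rpartition(".")
        programs[program].add(metric)
    return dict(programs)


# Names modelled on the output above; the ".compile_time" entries are assumed.
names = [
    "MultiSource/Applications/Burg/burg.compile_time",
    "MultiSource/Applications/Burg/burg.execution_time",
    "MultiSource/Applications/lemon/lemon.compile_time",
    "MultiSource/Applications/lemon/lemon.execution_time",
]

groups = group_by_program(names)
print(len(names), "test entries for", len(groups), "programs")
```

If that reading is right, 493 programs with two metrics each would account 
for the 986 entries.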

2. Running lnt several times on the same unmodified clang+llvm binaries gives 
different results (execution times can vary wildly, by roughly 200%). I am 
running lnt on linux/x86_64. I tried disabling cpufreq frequency scaling, and 
the machine was not loaded, but this did not help. Is there any way to run the 
tests multiple times and use statistics to get reproducible numbers (within a 
confidence interval)? Or to handle the stats at the 'lnt import' stage? Or in 
the web display? In the web display, how are the 'run' and 'order' fields 
meant to be used?
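
For illustration, this is roughly the post-processing I would like to get, 
sketched in Python. It is not LNT functionality: the function names and the 
timing values are hypothetical, and the interval uses a normal approximation, 
which is optimistic for so few samples (a Student's t interval would be 
wider).

```python
import math
import statistics


def confidence_interval(samples, confidence=0.95):
    """Return (mean, half_width) of a normal-approximation interval."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    mean = statistics.mean(samples)
    # Standard error of the mean, from the sample standard deviation.
    stderr = statistics.stdev(samples) / math.sqrt(len(samples))
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, z * stderr


def intervals_overlap(a, b):
    """True if two (mean, half_width) intervals overlap."""
    (mean_a, half_a), (mean_b, half_b) = a, b
    return abs(mean_a - mean_b) <= half_a + half_b


# Hypothetical execution times (seconds) for one test over five runs.
baseline = [1.02, 0.98, 1.00, 1.01, 0.99]
patched = [0.91, 0.89, 0.90, 0.92, 0.88]

base_ci = confidence_interval(baseline)
patch_ci = confidence_interval(patched)
print("baseline: %.3f +/- %.3f" % base_ci)
print("patched : %.3f +/- %.3f" % patch_ci)
print("difference looks real:", not intervals_overlap(base_ci, patch_ci))
```

With the run-to-run variation I am seeing, the intervals would presumably 
overlap for most tests, which is why I am asking how others deal with this.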

Thanks for any hints.

Cheers,
--
Arnaud de Grandmaison