[LLVMdev] [LNT] Question about results reliability in LNT infrastructure

Mon Jul 1 08:53:00 PDT 2013

On 06/23/2013 11:12 PM, Star Tan wrote:
> Hi all,
>
>
> When we compare two test runs, each with three samples, how does LNT show whether the comparison is reliable?
>
>
> I have seen that the function get_value_status in reporting/analysis.py uses a very simple algorithm to infer the data status. For example, if abs(self.delta) <= (self.stddev * confidence_interval), the data status is set to UNCHANGED. However, this is clearly not enough. For example, if both self.delta (e.g. 60%) and self.stddev (e.g. 50%) are huge, but self.delta is slightly larger than self.stddev, LNT will report a huge performance improvement to readers without taking the huge stddev into account. I think one way to address this is to normalize the performance improvement by the stddev, but I am not sure whether this has been implemented in LNT.
>
>
> Could anyone suggest how I can find out whether the testing results in LNT are reliable? Specifically, how can I get a normalized performance improvement/regression that takes the stderr into account?

Hi Star Tan,

I have attached some hacks I tried over the weekend. The attached
patch prints the confidence intervals in LNT. If you like, you can take
it as inspiration (rather than copying it directly) to print those
values in your LNT server. (The patch requires scipy and numpy to be
installed in your Python sandbox. This should be OK for our experiments,
but we probably do not want to reimplement those functions before
upstreaming.)
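
To make the idea concrete, here is a minimal sketch of a confidence
interval check. It is not the code from the attached patch and does not
use any LNT API: the function names, the sample values, and the small
hardcoded table of two-sided 95% Student's t critical values are all
assumptions made for illustration (the table only stands in for what
scipy would compute). Treating overlapping intervals as "not reliably
different" is a conservative rule of thumb, not a formal significance
test.

```python
import math
import statistics

# Two-sided 95% Student's t critical values, keyed by degrees of
# freedom (n - 1). Hardcoded here for illustration only.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def confidence_interval(samples):
    """Return the (low, high) 95% confidence interval of the mean."""
    n = len(samples)
    mean = statistics.mean(samples)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(n)
    half = T_CRIT_95[n - 1] * sem
    return mean - half, mean + half


def intervals_overlap(a, b):
    """True if the two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical execution times (seconds), three samples per run.
baseline = [10.0, 10.2, 9.8]
patched = [8.0, 8.1, 7.9]   # tight samples: intervals do not overlap
noisy = [6.0, 12.0, 9.0]    # 10% faster on average, but very noisy

for name, run in (("patched", patched), ("noisy", noisy)):
    overlap = intervals_overlap(confidence_interval(baseline),
                                confidence_interval(run))
    print(name, "overlaps baseline" if overlap else "differs from baseline")
```

With only three samples the critical value is large (4.303), so the
noisy run's interval easily swallows the baseline even though its mean
is 10% lower.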

Also, as Anton suggested, it may make sense to rerun your experiments
with a larger number of samples. As the machine is currently not loaded
and we do not track individual commits, 10 samples should probably be
good enough.
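
As a rough illustration of why more samples help, the sketch below
compares the half-width of a 95% confidence interval for 3 and for 10
samples, assuming the same sample standard deviation in both cases. The
helper name and the two hardcoded Student's t critical values are
assumptions for illustration, not part of LNT.

```python
import math

# Two-sided 95% Student's t critical values for 2 and 9 degrees of
# freedom (i.e. 3 and 10 samples). Hardcoded for illustration only.
T_CRIT_95 = {2: 4.303, 9: 2.262}


def half_width(stddev, n):
    """Half-width of the 95% confidence interval of the mean."""
    return T_CRIT_95[n - 1] * stddev / math.sqrt(n)


# Express the half-width in units of one standard deviation.
for n in (3, 10):
    print(n, "samples:", round(half_width(1.0, n), 3), "stddev")
```

Under these assumptions the interval shrinks from roughly 2.5 to
roughly 0.7 standard deviations, because both the critical value and
the standard error drop as the sample count grows.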

Cheers,
Tobias
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-My-confidence-measurement-hacks.patch
Type: text/x-diff
Size: 8799 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20130701/fab53a78/attachment.patch>