[LLVMdev] [LNT] Question about results reliability in LNT infrastructure

Sun Jun 23 23:12:36 PDT 2013

Hi all,

When we compare two test runs, each of which was run with three samples, how does LNT show whether the comparison is reliable or not?

I have seen that the function get_value_status in reporting/analysis.py uses a very simple algorithm to infer the data status. For example, if abs(self.delta) <= (self.stddev * confidence_interval), the status is set to UNCHANGED. However, this is clearly not enough. Suppose both self.delta (e.g. 60%) and self.stddev (e.g. 50%) are huge, with self.delta only slightly larger than self.stddev: LNT will then report a huge performance improvement without taking the huge stddev into account. I think one way to address this is to normalize the performance improvement by the stddev, but I am not sure whether this has been implemented in LNT.
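To make the concern concrete, here is a minimal Python sketch of the check as I described it, next to the kind of normalization I have in mind. This is not the actual LNT code: the function names simple_status and normalized_delta, the "CHANGED" label, and the default confidence_interval of 1.0 are all assumptions made only for illustration.

```python
def simple_status(delta, stddev, confidence_interval=1.0):
    """Threshold check in the style quoted above (hypothetical helper).

    A delta no larger than stddev * confidence_interval is treated as noise.
    The default confidence_interval of 1.0 is an assumption for this sketch.
    """
    if abs(delta) <= stddev * confidence_interval:
        return "UNCHANGED"
    return "CHANGED"


def normalized_delta(delta, stddev):
    """Express delta in units of stddev (hypothetical helper).

    Returns None when stddev is zero, because the ratio is undefined there.
    """
    if stddev == 0:
        return None
    return delta / stddev


# The example from the paragraph above: delta = 60%, stddev = 50%.
status = simple_status(0.60, 0.50)
ratio = normalized_delta(0.60, 0.50)
print(status, ratio)
```

With these numbers the threshold check flags a change, while the normalized value shows that the delta is only a little more than one stddev, which is the information I would like a reader of the report to see.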

Could anyone suggest how I can find out whether the test results in LNT are reliable? Specifically, how can I get a performance improvement/regression figure that is normalized by the stddev?

Best wishes,
Star Tan.
