[LLVMdev] [LNT] Question about results reliability in LNT infrastructure

Star Tan tanmx_star at yeah.net
Mon Jul 1 19:09:22 PDT 2013


At 2013-07-01 23:53:00,"Tobias Grosser" <tobias at grosser.es> wrote:

>On 06/23/2013 11:12 PM, Star Tan wrote:
>> Hi all,
>>
>>
>> When we compare two test runs, each with three samples, how does LNT show whether the comparison is reliable?
>>
>>
>> I have seen that the function get_value_status in reporting/analysis.py uses a very simple algorithm to infer the data status. For example, if abs(self.delta) <= (self.stddev * confidence_interval), the data status is set to UNCHANGED. However, this is clearly insufficient. For example, if both self.delta (e.g. 60%) and self.stddev (e.g. 50%) are large and self.delta is only slightly larger than self.stddev, LNT reports a large performance improvement without taking the large stddev into account. One option would be to normalize the performance change by the stddev, but I am not sure whether LNT already implements this.
>>
>>
>> Could anyone suggest how I can find out whether the test results in LNT are reliable? Specifically, how can I get the performance improvement/regression normalized by the standard error?
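The threshold rule described in the question, and the normalization suggested as an alternative, can be sketched as follows. This is an illustrative restatement, not LNT's actual get_value_status code; the function names simple_status and normalized_delta are hypothetical, and the multiplier value 1.0 is an assumption, since the question does not give the value LNT uses for confidence_interval.

```python
import math


def simple_status(delta, stddev, confidence_interval):
    """Threshold rule as described in the question: a difference is
    treated as noise when |delta| <= stddev * confidence_interval."""
    if abs(delta) <= stddev * confidence_interval:
        return "UNCHANGED"
    return "CHANGED"


def normalized_delta(delta, stddev):
    """Express the difference in units of the standard deviation."""
    if stddev == 0:
        # No measured noise: any nonzero delta is infinitely many stddevs away.
        return math.copysign(math.inf, delta) if delta else 0.0
    return delta / stddev


# Numbers from the question; the multiplier 1.0 is a hypothetical choice.
print(simple_status(0.60, 0.50, 1.0))   # CHANGED
print(normalized_delta(0.60, 0.50))     # 1.2
```

With the numbers from the question, the threshold rule reports a change, yet the normalized value shows that the change amounts to only 1.2 standard deviations.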
>
>Hi Star Tan,
>
>I have attached some hacks I tried over the weekend. The attached 
>patch prints the confidence intervals in LNT. If you like, you can take 
>them as inspiration (rather than copying them directly) to print those values in your 
>LNT server. (The patches require scipy and numpy to be installed in your 
>Python sandbox. This should be OK for our experiments, but we will probably 
>want to reimplement those functions before upstreaming.)
Wonderful. I will integrate them into our LNT server.
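The attached patch is not included here, so the following is a separate minimal sketch of one common way to judge whether a difference between two runs is reliable: a 95% Welch confidence interval for the difference of the means, reporting a change only when that interval excludes zero. It uses only the Python standard library instead of scipy, with a small hard-coded t table; the names T_975, t_critical_95, welch_diff_interval and classify are hypothetical and are not part of LNT or of the attached patch.

```python
import math
import statistics

# Two-sided 95% Student's t critical values, t_{0.975, df}, from standard tables.
T_975 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
         6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def t_critical_95(df):
    """Return t_{0.975, df}, rounding a fractional df down (conservative).
    Beyond the table this falls back to the normal quantile (~1.96),
    which is slightly too narrow for moderate df."""
    df = math.floor(df + 1e-9)  # tolerate float error such as 3.9999999999
    if df < 1:
        raise ValueError("need at least two samples per run")
    return T_975.get(df, statistics.NormalDist().inv_cdf(0.975))


def welch_diff_interval(baseline, current):
    """95% confidence interval for mean(current) - mean(baseline), Welch's method."""
    n1, n2 = len(baseline), len(current)
    v1 = statistics.variance(baseline) / n1  # squared standard error of run 1
    v2 = statistics.variance(current) / n2   # squared standard error of run 2
    diff = statistics.mean(current) - statistics.mean(baseline)
    if v1 + v2 == 0:
        return diff, diff  # no measured noise at all
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    half_width = t_critical_95(df) * math.sqrt(v1 + v2)
    return diff - half_width, diff + half_width


def classify(baseline, current):
    """Report a change only if the interval for the difference excludes zero."""
    low, high = welch_diff_interval(baseline, current)
    return "UNCHANGED" if low <= 0.0 <= high else "CHANGED"


# Noisy runs: the means differ by 1.0, but the 95% interval still contains zero.
print(classify([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))        # UNCHANGED
# Tight runs with the same mean difference: the change is clear.
print(classify([1.00, 1.01, 0.99], [2.00, 2.01, 1.99]))  # CHANGED
```

Both example pairs have the same difference of means; only the spread of the samples decides whether the difference is reported, which is the kind of distinction the simple threshold rule does not make.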
>
>Also, as Anton suggested, it may make sense to rerun your experiments 
>with a larger number of samples. As the machine is currently not loaded 
>and we do not track individual commits, 10 samples should probably be 
>good enough.
OK, I will rerun all tests with 10 samples tonight :-).
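To see why more samples help, consider the half-width h(n) of a 95% t-interval for the mean of n samples with sample standard deviation s, assuming the samples are independent and s stays roughly the same between runs. Going from 3 to 10 samples shrinks the interval by a factor of about 3.5:

```latex
h(n) = t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}, \qquad
h(3) = \frac{4.303}{\sqrt{3}}\,s \approx 2.48\,s, \qquad
h(10) = \frac{2.262}{\sqrt{10}}\,s \approx 0.72\,s
```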
>
>Cheers,
>Tobias

Bests,
Star Tan.