[LLVMdev] [LNT] Question about results reliability in LNT infrastructure

Daniel Dunbar daniel.dunbar at gmail.com
Thu Jun 27 09:25:27 PDT 2013


We don't really have a great answer yet. For now, the best we can do is
keep our testing machines as quiet as possible and then look mostly at
the daily trend rather than at individual reports. (A rough sketch of
the normalization idea raised below follows the quoted thread.)

 - Daniel


On Jun 27, 2013, at 9:05, Tobias Grosser <tobias at grosser.es> wrote:

> On 06/23/2013 11:12 PM, Star Tan wrote:
>> Hi all,
>>
>>
>> When we compare two test runs, each consisting of three samples, how does LNT indicate whether the comparison is reliable?
>>
>>
>> I have seen that the function get_value_status in reporting/analysis.py uses a very simple algorithm to infer the data status. For example, if abs(self.delta) <= (self.stddev * confidence_interval), the data status is set to UNCHANGED. However, this is clearly not sufficient. Suppose both self.delta (e.g. 60%) and self.stddev (e.g. 50%) are huge, with self.delta only slightly larger than self.stddev: LNT will then report a huge performance improvement without accounting for the huge stddev. One way to address this would be to normalize the performance improvement by the stddev, but I am not sure whether that has been implemented in LNT.
>>
>>
>> Could anyone suggest how I can determine whether the test results in LNT are reliable? Specifically, how can I get a normalized performance improvement/regression that takes the standard error into account?
>
> Hi Daniel, Michael, Paul,
>
> do you happen to have some insights on this? Basically, the stddev shown
> when a run is compared to a previous run does not seem to be useful for
> judging the reliability of the results shown. We are looking for a good
> way/value to show the reliability of individual results in the UI. Do you
> have any experience with what makes a good measure of test-result
> reliability?
>
> Thanks,
> Tobias
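
As an illustration of the normalization Star Tan describes, here is a
minimal sketch that expresses the delta in units of the pooled standard
deviation instead of only checking whether it exceeds
stddev * confidence_interval. The function name and the sample lists are
hypothetical, and this is not LNT's actual implementation:

    # A minimal sketch (not LNT's actual code) of the normalization idea:
    # report the delta in units of the pooled standard deviation, so a 60%
    # delta with 50% noise shows up as ~1 stddev rather than as a big win.
    # `old_samples` and `new_samples` are hypothetical lists of timing samples.
    import math
    import statistics

    def normalized_delta(old_samples, new_samples):
        """Return (delta, delta_in_stddevs) for two lists of samples."""
        delta = statistics.mean(new_samples) - statistics.mean(old_samples)
        # Pool the per-run standard deviations to estimate run-to-run noise
        # (assumes equal sample counts, as with LNT's three samples per run).
        pooled = math.sqrt((statistics.stdev(old_samples) ** 2 +
                            statistics.stdev(new_samples) ** 2) / 2)
        if pooled == 0:
            return delta, math.inf if delta else 0.0
        return delta, delta / pooled

    # Example: the delta is barely larger than the noise, so the apparent
    # regression should not be trusted.
    old = [1.00, 1.50, 0.95]
    new = [1.60, 1.10, 2.05]
    d, nd = normalized_delta(old, new)
    print(f"delta={d:.2f}, delta/stddev={nd:.2f}")  # delta=0.43, delta/stddev=1.09

With only three samples per run, this ratio is a rough guide at best; a
proper significance test (e.g. a t-test) would need more samples to be
meaningful.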
