[LLVMdev] [LNT] Question about results reliability in LNT infrastructure
tobias at grosser.es
Mon Jul 1 08:53:00 PDT 2013
On 06/23/2013 11:12 PM, Star Tan wrote:
> Hi all,
> When we compare two test runs, each of which has three samples, how does LNT show whether the comparison is reliable?
> I have seen that the function get_value_status in reporting/analysis.py uses a very simple algorithm to infer the data status. For example, if abs(self.delta) <= (self.stddev * confidence_interval), then the data status is set to UNCHANGED. However, this is clearly not enough. For example, if both self.delta (e.g. 60%) and self.stddev (e.g. 50%) are huge, but self.delta is slightly larger than self.stddev, LNT will report to readers that the performance improvement is huge without considering the huge stddev. I think one way is to normalize the performance improvements by the stddev, but I am not sure whether this has been implemented in LNT.
> Could anyone suggest how I can find out whether the test results are reliable in LNT? Specifically, how can I get the normalized performance improvement/regression by considering the stderr?
Hi Star Tan,
I have attached some hacks I tried over the weekend. The attached
patch prints the confidence intervals in LNT. If you like, you can take
it as inspiration (rather than copying it directly) to print those values
in your LNT server. (The patch requires scipy and numpy to be installed in
your Python sandbox. This should be OK for our experiments, but we probably
do not want to reimplement those functions before upstreaming.)
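To illustrate the idea, here is a minimal sketch of a confidence interval for the mean of a small set of samples. It is not the attached patch: the names mean_confidence_interval, intervals_overlap and T_CRIT_95 are hypothetical, and it uses only the Python standard library with a hard-coded table of two-sided 95% Student's t critical values instead of scipy, so it only handles 2 to 10 samples.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
# Hard-coded here (an assumption of this sketch) to avoid a scipy dependency.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262}


def mean_confidence_interval(samples):
    """Return (mean, half_width) of the 95% confidence interval of the mean."""
    n = len(samples)
    if n < 2 or (n - 1) not in T_CRIT_95:
        raise ValueError("this sketch supports 2 to 10 samples")
    mean = statistics.mean(samples)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(n)
    return mean, T_CRIT_95[n - 1] * sem


def intervals_overlap(before, after):
    """True if the two 95% intervals overlap, i.e. the change may be noise."""
    mean_b, half_b = mean_confidence_interval(before)
    mean_a, half_a = mean_confidence_interval(after)
    return abs(mean_a - mean_b) <= half_b + half_a
```

With three samples the critical value is 4.303, against 2.262 for ten samples, so the interval around the same data narrows considerably as samples are added. Treating overlapping intervals as "no change" is a conservative choice, not a formal significance test.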
Also, as Anton suggested, it may make sense to rerun your experiments
with a larger number of samples. As the machine is currently not loaded
and we do not track individual commits, 10 samples should probably be
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 8799 bytes
Desc: not available