[LLVMdev] Use perf tool for more accurate time measuring on Linux

Yi Kong kongy.dev at gmail.com
Tue May 20 14:55:29 PDT 2014


On 20 May 2014 22:21, Tobias Grosser <tobias at grosser.es> wrote:
> On 20/05/2014 22:00, Yi Kong wrote:
>>
>> On 20 May 2014 17:55, Tobias Grosser <tobias at grosser.es> wrote:
>>>
>>> On 20/05/2014 18:20, Yi Kong wrote:
>>>>
>>>>
>>>> On 20 May 2014 16:40, Tobias Grosser <tobias at grosser.es> wrote:
>>>>>
>>>>>
>>>>> On 20/05/2014 16:01, Yi Kong wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> I've set up a public LNT server to show the result of perf stat. There
>>>>>> is a huge improvement compared with the timeit tool.
>>>>>> http://parkas16.inria.fr:8000/
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Hi Yi Kong,
>>>>>
>>>>> thanks for testing these changes.
>>>>>
>>>>>
>>>>>> The patch is updated to pin the process to a single core; the readings
>>>>>> are even more accurate. It's hard-coded to run everything on core 0, so
>>>>>> don't run parallel testing with it for now. The tool now depends on
>>>>>> Linux perf and schedtool.
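(As an aside on the mechanics: the wrapper boils down to roughly the sketch
below. The affinity mask, the perf options and the output handling here are
only illustrative, not the actual patch.)

    # Rough sketch of the pinned measurement, not the actual timeit patch;
    # the affinity mask and perf options are placeholders.
    import subprocess

    def run_pinned(cmd, core_mask='0x1'):
        # schedtool -a <mask> -e ... pins the child process to one CPU;
        # perf stat -x, writes machine-readable counters to a file.
        wrapper = ['schedtool', '-a', core_mask, '-e',
                   'perf', 'stat', '-x', ',', '-o', 'perf.out', '--']
        return subprocess.call(wrapper + cmd)

    # Example: run_pinned(['./some-test-binary', '--some-arg'])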
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> I think this sounds like a very good direction.
>>>>>
>>>>> How did you evaluate the improvements exactly? The following run shows,
>>>>> e.g., two execution time changes:
>>>>
>>>>
>>>>
>>>> I sent a screenshot of the original results in the previous mail. We used
>>>> to have lots of noisy readings, both small background noise from the
>>>> machine and large noise from the timing tool. Now the noise from the
>>>> timing tool is eliminated and only a little machine background noise
>>>> remains. This makes manual investigation possible.
>>>
>>>
>>>
>>> I think we need to get this down to zero, even at the cost of missing
>>> regressions. We have many commits and runs per day; having one or two
>>> noisy results per run means people will still not look at performance
>>> changes.
>>>
>>>
>>>>> http://parkas16.inria.fr:8000/db_default/v4/nts/9
>>>>>
>>>>> Are they expected? If I change e.g. the aggregation function to median
>>>>> they disappear. Similarly the graph for one of them does not suggest an
>>>>> actual performance change:
>>>>
>>>>
>>>>
>>>> Yes, some false positives due to machine noise are expected. The median is
>>>> more tolerant of machine noise, so they disappear.
>>>
>>>
>>>
>>> Right.
>>>
>>> What I find interesting is that this change filters several results that
>>> seem to not be filtered out by our statistical test. Is this right?
>>
>>
>> Yes. The MWU test is nonparametric: it examines the order of the samples
>> rather than their actual values. However, eliminating with the median uses
>> the actual values (if the medians of two samples are close enough, we treat
>> them as equal).
>
>
> I see. So some of the useful eliminations come from the fact that we
> actually run a parametric test? So we _do_ in this case make some
> assumptions about the distribution of the values, right?

Yes. You can check get_value_status() in
lnt/server/reporting/analysis.py to see how we determine significance.
I don't think making such an assumption is a good idea, as some tests have
very different distributions from others.
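Roughly, the logic there combines a value-based check on the aggregated
samples with the MWU ordering test. The sketch below is a simplification,
not the exact LNT code; the thresholds and the scipy call are just for
illustration:

    # Simplified sketch of the kind of check get_value_status() performs
    # (not the exact LNT code; thresholds and helpers are placeholders).
    from statistics import median
    from scipy.stats import mannwhitneyu

    def is_significant(prev_samples, curr_samples,
                       delta_threshold=0.05, confidence=0.90):
        prev, curr = median(prev_samples), median(curr_samples)
        # Value-based filter: runs whose medians are very close count as equal.
        if prev == 0 or abs(curr - prev) / prev < delta_threshold:
            return False
        # Order-based filter: nonparametric Mann-Whitney U test on the samples.
        _, p_value = mannwhitneyu(prev_samples, curr_samples,
                                  alternative='two-sided')
        return p_value < (1.0 - confidence)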

>>> In the optimal case, we should be able to set the confidence level we
>>> require high enough to filter out these results as well. Is this right?
>>
>>
>> Yes. The lowest confidence we can set is still quite high (90%). We could
>> certainly add a lower-confidence option, but I can't find any MWU table
>> going lower than that on the Internet.
>
>
> Why the lowest confidence? I would be interested in maximal confidence to
> reduce noise.

Ah... I got it the wrong way around. I agree with you.

> I found this table:
>
> http://www.stat.purdue.edu/~bogdanm/wwwSTAT503_fall/Tables/Wilcoxon.pdf
>
> I am not sure if those are the right values. Inside it says
> Wilcoxon-Mann-Whitney U, but the filename suggests that the tables may be
> for the Wilcoxon signed-rank test.

That's indeed for the Wilcoxon signed-rank test.
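The two tests are easy to mix up. As a quick sanity check (just an
illustration; LNT uses its own table rather than scipy), scipy exposes them
separately:

    # Mann-Whitney U vs. Wilcoxon signed-rank: not the same test.
    from scipy.stats import mannwhitneyu, wilcoxon

    before = [10.1, 10.3, 10.2, 10.4, 10.2]
    after  = [10.8, 10.9, 11.0, 10.7, 10.9]

    # Mann-Whitney U: two independent samples -- what we need for two runs.
    print(mannwhitneyu(before, after, alternative='two-sided'))

    # Wilcoxon signed-rank: paired samples of equal length; it has a
    # different critical-value table, which is what that PDF contains.
    print(wilcoxon(before, after))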

>
>
>> Also, we should modify the value analysis (based on how close the
>> medians/minimums are) to vary according to the confidence level as
>> well. However, this analysis is parametric; we need to know how the data
>> is actually distributed for every test. I don't think there is a
>> non-parametric test which does the same thing.
>
>
> What kind of problem could we get in case we assume a normal distribution
> and the values are in fact not normally distributed?

If the distribution is in fact skewed, we will get lots of false negatives.
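One way to see the effect is a toy simulation like the one below (the
distributions and the 3% shift are invented purely for illustration): it
counts how often each test notices a genuine regression when the timing
noise is heavily right-skewed.

    # Toy simulation (all numbers invented): how often does each test detect
    # a real 3% slowdown when the timing noise is heavily right-skewed?
    import numpy as np
    from scipy.stats import ttest_ind, mannwhitneyu

    rng = np.random.default_rng(0)

    def run(true_time, n=5):
        # true time plus small exponential jitter plus occasional large spikes
        spikes = rng.binomial(1, 0.1, n) * 0.5 * true_time
        return true_time + rng.exponential(0.02 * true_time, n) + spikes

    trials, detected_t, detected_u = 1000, 0, 0
    for _ in range(trials):
        before, after = run(1.00), run(1.03)          # genuine 3% regression
        detected_t += ttest_ind(before, after)[1] < 0.10
        detected_u += mannwhitneyu(before, after,
                                   alternative='two-sided')[1] < 0.10
    print("t-test detections:", detected_t, "/", trials)
    print("MWU detections:   ", detected_u, "/", trials)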

> Would we just fail to find a significant change? Or would we possibly let
> non-significant changes through?
>
> Under the assumption that there is a non-zero percentage of test cases
> where the performance results are normally distributed, it may be OK for a
> special low-noise configuration to only get results from these test cases,
> but possibly ignore performance changes from the non-normally-distributed
> cases.

It's hard to test whether execution time is normally distributed. The samples
are definitely not normally distributed, because each measurement is a
guaranteed upper bound on the true time: interference can only add time, so
the distribution has a hard lower bound and a right tail.
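A minimal illustration of that point (the numbers are made up; it is only
meant to show the shape of the distribution):

    # Every sample is the true cost plus non-negative interference, so the
    # distribution is bounded below by the true time and skewed to the right.
    import numpy as np

    true_time = 1.0
    noise = np.random.exponential(scale=0.05, size=10000)  # >= 0 by construction
    samples = true_time + noise

    print("minimum:", samples.min())              # never drops below true_time
    print("mean - median:", samples.mean() - np.median(samples))  # > 0: right skew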

>
> Cheers,
> Tobias


