[PATCH] D15408: [AArch64/LoopUnrollRuntime] Don't avoid high-cost trip count computation on the AArch64

Mon Dec 14 12:04:15 PST 2015

kristof.beyls added a comment.

Hi Junmo,

I don't think the geomean can be computed correctly without taking into account the execution times of all programs that are part of the suite.
I've done the computation of the effect of this patch on geomeans for the test-suite, spec2000 and spec2006, on Cortex-A53 and Cortex-A57, taking into account the execution time of all the programs in the suite.
I used the multi-sampling infrastructure in LNT to run each program multiple times, and took the median execution time across all runs of each program.
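
For illustration, the kind of computation described above can be sketched in Python as follows. This is only a sketch under assumptions, not the script actually used: the function names (geomean, suite_speedup), the dictionary layout and the sample timings are all hypothetical, and the score is assumed to be the geomean of per-program baseline/patched ratios, so that values above 1.0 correspond to a speedup.

```python
import math
from statistics import median


def geomean(values):
    """Geometric mean of a non-empty sequence of positive numbers."""
    values = list(values)
    if not values:
        raise ValueError("geomean of an empty sequence")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def suite_speedup(baseline, patched):
    """Geomean of per-program speedups over the whole suite.

    baseline, patched: dict mapping program name -> list of execution-time
    samples (hypothetical layout). Each program is represented by the
    median of its samples; the ratio is baseline / patched, so a result
    above 1.0 means the patch made the suite faster overall.
    """
    if baseline.keys() != patched.keys():
        raise ValueError("both runs must cover the same set of programs")
    ratios = [median(baseline[p]) / median(patched[p]) for p in sorted(baseline)]
    return geomean(ratios)


# Hypothetical timings in seconds: only program "a" changes with the patch.
baseline = {
    "a": [10.0, 10.2, 9.9],
    "b": [5.0, 5.1, 5.0],
    "c": [2.0, 2.0, 2.1],
    "d": [8.0, 8.0, 8.0],
}
patched = {
    "a": [8.0, 8.1, 7.9],
    "b": [5.0, 5.1, 5.0],
    "c": [2.0, 2.0, 2.1],
    "d": [8.0, 8.0, 8.0],
}

# Geomean over every program in the suite.
whole_suite = suite_speedup(baseline, patched)

# Geomean over only the programs that changed, which overstates the effect.
changed_only = suite_speedup({"a": baseline["a"]}, {"a": patched["a"]})

print(f"whole suite:  {whole_suite:.1%}")
print(f"changed only: {changed_only:.1%}")
```

With this made-up data, the score computed over only the changed program is noticeably larger than the score over the whole suite, which is why the unchanged programs have to be included.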

This gives me the following results, where numbers larger than 100% mean the patch gives a speedup, and numbers lower than 100% mean the patch gives a slowdown:

On Cortex-A57:
lnt/test-suite: 99.9%
spec2000: 100.3%
spec2006: 100.3%

On Cortex-A53:
lnt/test-suite: 99.8%
spec2000: 99.8%
spec2006: 99.9%

Furthermore, on a number of other commercial benchmark suites I also saw regressions of 0.2% to 0.6% in the overall benchmark score.
I think these measurements show that, overall, the patch results in a slight performance regression.
Therefore, as is, I don't think the patch should be committed.
I wonder whether there is any scope to make the AllowExpensiveTripCount heuristic smarter or more selective, e.g. based on code characteristics.
That said, the earlier comments from James and Chad indicate that this may require the compiler to guess very well how many iterations inner loops execute, which is probably very hard without PGO information.

Thanks,

Kristof

http://reviews.llvm.org/D15408