[PATCH] D15792: [AArch64] Turn off PredictableSelectIsExpensive on the Cortex-A57

James Molloy via llvm-commits llvm-commits at lists.llvm.org
Wed Jan 6 06:01:35 PST 2016


jmolloy requested changes to this revision.
jmolloy added a comment.
This revision now requires changes to proceed.

Hi,

Yes, SPEC has three workload types for every benchmark: "test", "train" and "ref". "ref" is the longest-running workload and is the only one that is useful in performance analysis. "test" and "train" often execute different codepaths within the benchmark too, so results are not directly comparable.

That said, I have some concerns about the data as given:

1. One repeat for SPEC is just not enough. The Samsung S6 numbers look substantially less variable than the Juno numbers, and it's difficult to see what is causing that when the repeat counts of each are different.

2. Soplex shows a large improvement (25%), which I happen to know is related to imperfect branch prediction. I think it should be discounted from the results for the purposes of comparison (though there is an argument that increasing the number of selects reduces pressure on the branch predictor, making branch-prediction oddities less likely).

3. Libquantum, which is a really good test of predication versus branches, has regressed by 12% on Juno and yet is utterly unchanged on S6. I'm unconvinced by this.

4. h264, one of the most representative real-world workloads in SPEC for the mobile space, has regressed by 8% on Juno.

5. The S6 results look almost like complete noise - there are no significant changes anywhere, which I find difficult to believe. The largest swing is a 4% regression in 179.art, whereas Juno shows a 12% improvement there. These do not correlate.

6. In fact, the Pearson correlation coefficient between the two datasets is -0.09, which is close enough to zero to assume no correlation (even a slight negative correlation!), so I distrust the numbers.
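For reference, the cross-platform sanity check in point 6 can be sketched as below: compute Pearson's r between the per-benchmark deltas measured on the two machines. The benchmark deltas here are illustrative placeholders, not the actual Juno/S6 numbers from this review.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-benchmark % speedups on two platforms (placeholder data).
juno = [2.0, -12.0, -8.0, 12.0, 25.0]
s6   = [0.5, 0.1, -0.3, -4.0, 0.2]

r = pearson_r(juno, s6)
# |r| near 0 means the two platforms' results do not corroborate each
# other, suggesting at least one dataset is dominated by noise.
print(round(r, 2))
```

If the same codegen change were driving both sets of results, one would expect r to be clearly positive; an r near zero (or slightly negative, as reported above) is what motivates distrusting the measurements.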

Sorry,

James


http://reviews.llvm.org/D15792




