Reordering two functions can slow down lld by 1.06 times

Fri Oct 21 08:39:18 PDT 2016

Thanks!

It doesn't look like the DSB issue, given that the main difference is
in the number of branches being mispredicted. It seems to be the other
case in the bug:

-------------------------------
Regarding the other details reported in this issue, I realize that the
slow vs. fast cases both had 0 mod 32 byte alignment. It’s hard to do
the analysis on what the issue there was, without having the exact
code and the exact (old) architecture on which it was run. If I had to
guess, I would say that it was a case of unfortunate aliasing in the
branch prediction buffer, causing differences in the prediction of one
of the many branches, particularly the indirect branch, which is known
to have prediction issues on some older architectures.
-----------------------------------

Zia, if you want to take a look I now have an easy to reproduce case :-)

Cheers,
Rafael

On 21 October 2016 at 11:14, Mehdi Amini <mehdi.amini at apple.com> wrote:
> The attachment to this PR: https://llvm.org/bugs/show_bug.cgi?id=5615
> Explains why this could happen.
>
> (This could well be a different case here, but it may be related, or hint
> toward a similar type of problem).
>
> OTH.
>
> —
> Mehdi
>
> On Oct 21, 2016, at 7:30 AM, Rafael Espíndola via llvm-commits
> <llvm-commits at lists.llvm.org> wrote:
>
> This is sufficiently crazy that I decided to create an easy-to-reproduce test case.
>
> I uploaded it to
> https://drive.google.com/open?id=0B7iRtublysV6WmZPZzh5LUpSZUU
>
> I also tested it on an i7-3840QM, where the problem reproduces exactly,
> and on an AMD Opteron(tm) Processor 6380, where the two binaries have
> exactly the same performance.
>
> Craig, all that I was able to find about branch prediction aliasing
> problems was a suggestion in the Intel optimization manual to align
> branch targets, but it looks like that is not the problem here. Any idea
> if there is anything that can be done to avoid this problem?
>
> Thanks,
> Rafael
>
>
> On 20 October 2016 at 17:09, Rafael Espíndola
> <rafael.espindola at gmail.com> wrote:
>
> I spent most of the day reducing an oddity I noticed while
> benchmarking a small patch.
>
> It turns out that just reordering two adjacent functions can have a
> massive impact on performance. The two binaries are in
>
> https://drive.google.com/open?id=0B7iRtublysV6VW5VVW1na2N1RGM
>
> https://drive.google.com/open?id=0B7iRtublysV6MUJoeGVCRHpXVUU
>
> And the total diff of the objdump is attached.
>
> When linking xul with one of the binaries I get
>
> 98,298,725      branch-misses             #    2.24% of all branches
> 7.206486289 seconds time elapsed
>
> With the other I get
>
> 139,849,372      branch-misses             #    3.18% of all branches
> 7.645573494 seconds time elapsed
>
> Adding enough padding before the function gets the performance back,
> which suggests an aliasing problem in the branch predictor.
>
> The CPU is an E5-2697 (Ivy Bridge). Is anyone familiar with its branch
> predictor and how to avoid hitting these problems?
>
> Cheers,
> Rafael
>
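As a quick cross-check of the perf numbers quoted in the original report, the
following Python sketch recomputes the slowdown from the subject line and
compares the branch-miss counts. The variable and function names are made up
for illustration; only the figures themselves come from the quoted perf
output. The total branch counts are back-calculated from the rounded
percentages, so they are only approximate.

```python
# Figures copied from the quoted perf output; all names here are illustrative.
fast = {"branch_misses": 98_298_725, "miss_rate": 0.0224, "seconds": 7.206486289}
slow = {"branch_misses": 139_849_372, "miss_rate": 0.0318, "seconds": 7.645573494}


def implied_branches(run):
    """Approximate total branches, derived from the rounded miss percentage."""
    return run["branch_misses"] / run["miss_rate"]


# Wall-clock slowdown of the slow binary relative to the fast one.
slowdown = slow["seconds"] / fast["seconds"]

# How many more branches the slow binary mispredicts.
miss_ratio = slow["branch_misses"] / fast["branch_misses"]

# If both runs execute about the same number of branches, this stays near 1,
# which means only the prediction accuracy changed, not the work done.
branch_count_ratio = implied_branches(slow) / implied_branches(fast)

print(f"slowdown:           {slowdown:.3f}x")
print(f"branch-miss ratio:  {miss_ratio:.3f}x")
print(f"branch count ratio: {branch_count_ratio:.3f}x")
```

The slowdown comes out at about 1.06 and the branch-miss ratio at about 1.42,
while the implied total branch count is nearly the same for both binaries. That
is consistent with the two binaries doing the same work and differing only in
how well the branches are predicted.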