Reordering two functions can slow down lld by about 6%

Fri Oct 21 07:30:15 PDT 2016

This is sufficiently crazy that I decided to create an easily
reproducible test case.

I uploaded it to https://drive.google.com/open?id=0B7iRtublysV6WmZPZzh5LUpSZUU

I also tested it on an i7-3840QM, where the problem reproduces exactly,
and on an AMD Opteron(tm) Processor 6380, where the two binaries have
exactly the same performance.

Craig, all that I was able to find about branch prediction aliasing
problems was a suggestion in the Intel optimization manual to align
branch targets, but it looks like that is not the problem here. Any
idea if there is anything that can be done to avoid this problem?

Thanks,
Rafael

On 20 October 2016 at 17:09, Rafael Espíndola
<rafael.espindola at gmail.com> wrote:
> I spent most of the day reducing an oddity I noticed while
> benchmarking a small patch.
>
> It turns out that just reordering two adjacent functions can have a
> massive impact on performance. The two binaries are in
>
> https://drive.google.com/open?id=0B7iRtublysV6VW5VVW1na2N1RGM
>
> https://drive.google.com/open?id=0B7iRtublysV6MUJoeGVCRHpXVUU
>
> The complete diff of the objdump output is attached.
>
> When linking xul with one of the binaries I get
>
> 98,298,725      branch-misses             #    2.24% of all branches
> 7.206486289 seconds time elapsed
>
> With the other I get
>
> 139,849,372      branch-misses             #    3.18% of all branches
> 7.645573494 seconds time elapsed
>
> Adding enough padding before the function gets the performance back,
> which suggests an aliasing problem in the branch predictor.
>
> The CPU is an E5-2697 (Ivy Bridge). Is anyone familiar with its branch
> predictor and how to avoid hitting these problems?
>
> Cheers,
> Rafael
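
For reference, the ratios implied by the two perf runs quoted above can
be checked with a few lines of Python. This is only a sketch of the
arithmetic: the variable names are made up for illustration, and the
numbers are copied from the perf output above.

```python
# Figures copied from the two perf runs quoted above.
fast_seconds = 7.206486289
slow_seconds = 7.645573494
fast_branch_misses = 98_298_725
slow_branch_misses = 139_849_372

# Ratio of elapsed times (slow binary relative to fast binary).
slowdown = slow_seconds / fast_seconds

# Ratio of branch-miss counts between the two binaries.
miss_ratio = slow_branch_misses / fast_branch_misses

print(f"elapsed time ratio: {slowdown:.2f}")
print(f"branch-miss ratio:  {miss_ratio:.2f}")
```

So the slower binary takes roughly 1.06 times as long while incurring
roughly 1.42 times as many branch misses.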