Reordering two functions can slow down lld by 1.06 times

Fri Oct 21 09:11:18 PDT 2016

One more thing, any thoughts on doing setPrefFunctionAlignment(5) on
x86_64? That way a change in one function cannot impact how another
function behaves with respect to the DSB.
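Below is a minimal sketch of what the 32-byte proposal would buy. It rests on two assumptions. First, as in LLVM at the time, the argument to setPrefFunctionAlignment is a log2 value, so 5 means 2^5 = 32 bytes. Second, the DSB caches decoded instructions per 32-byte window. The helpers, base address and function sizes are hypothetical. The code places function A and then function B, and reports where B starts within a 32-byte window as A's size changes. It compares 16-byte alignment, which is assumed here as the baseline, against 32-byte alignment.

```python
# Hypothetical illustration: functions are laid out back to back, and each
# start address is rounded up to 2**log2_align bytes (log2_align=5 -> 32 bytes).

def align_up(addr: int, log2_align: int) -> int:
    """Round addr up to the next multiple of 2**log2_align."""
    align = 1 << log2_align
    return (addr + align - 1) & ~(align - 1)

def layout(sizes, log2_align, base=0x401000):
    """Return the start address of each function, placed in the given order."""
    starts = []
    addr = base
    for size in sizes:
        addr = align_up(addr, log2_align)
        starts.append(addr)
        addr += size
    return starts

def offset_in_window(addr: int, window: int = 32) -> int:
    """Offset of addr inside its (assumed) 32-byte DSB window."""
    return addr % window

# Function B (64 bytes) follows function A; only A's size differs between builds.
for log2_align in (4, 5):
    offsets = set()
    for size_a in (100, 110, 120, 130):
        _, start_b = layout([size_a, 64], log2_align)
        offsets.add(offset_in_window(start_b))
    print(f"align={1 << log2_align} bytes: B's offsets mod 32 = {sorted(offsets)}")
```

With 16-byte alignment, B's position inside its window depends on A's size, and the script prints [0, 16]. With 32-byte alignment it prints [0] for every size of A. That is the isolation between functions the suggestion asks for, and the cost is extra padding between functions.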

Cheers,
Rafael

On 21 October 2016 at 11:39, Rafael Espíndola
<rafael.espindola at gmail.com> wrote:
> Thanks!
>
> It doesn't look like the DSB issue, given that the main difference is
> in the number of branches being mispredicted. It seems to be the other
> case in the bug:
>
> -------------------------------
> Regarding the other details reported in this issue, I realize that the
> slow vs. fast cases both had 0 mod 32 byte alignment. It’s hard to do
> the analysis on what the issue there was, without having the exact
> code and the exact (old) architecture on which it was run. If I had to
> guess, I would say that it was a case of unfortunate aliasing in the
> branch prediction buffer, causing differences in the prediction of one
> of the many branches, particularly the indirect branch, which is known
> to have prediction issues on some older architectures.
> -----------------------------------
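The "unfortunate aliasing in the branch prediction buffer" guess can be shown with a toy model. This is a hypothetical sketch and not Ivy Bridge's actual predictor, whose indexing is not publicly documented. It uses a textbook table of 2-bit saturating counters indexed by bits 4..13 of the branch address, and it models conditional branches only, not the indirect branch mentioned above. The addresses are invented. In the second layout, branch B sits 16 bytes later, which stands in for the padding that restored performance.

```python
# Toy bimodal predictor (hypothetical): 2-bit saturating counters indexed by
# low address bits. Two branches that map to the same counter share state.

def index(addr: int, table_bits: int = 10, shift: int = 4) -> int:
    """Hypothetical index function: drop `shift` low bits, keep `table_bits` bits."""
    return (addr >> shift) & ((1 << table_bits) - 1)

def simulate(branches, iterations=1000):
    """branches: (address, taken) pairs executed in order on every iteration.
    Returns the number of mispredictions."""
    table = {}  # index -> counter in 0..3; a value >= 2 predicts "taken"
    misses = 0
    for _ in range(iterations):
        for addr, taken in branches:
            i = index(addr)
            counter = table.get(i, 2)  # unseen entries start weakly taken
            if (counter >= 2) != taken:
                misses += 1
            table[i] = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses

A = 0x401010                                # always-taken branch
aliased = [(A, True), (0x405010, False)]    # never-taken branch, same index as A
padded = [(A, True), (0x405020, False)]     # same branch moved 16 bytes later
print(simulate(aliased), simulate(padded))  # prints: 1000 1
```

When the two branches share a counter, each one keeps pushing it toward its own outcome, so the never-taken branch is mispredicted on every iteration. Once the counters are separate, both branches are learned almost immediately.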
>
> Zia, if you want to take a look I now have an easy to reproduce case :-)
>
> Cheers,
> Rafael
>
>
>
> On 21 October 2016 at 11:14, Mehdi Amini <mehdi.amini at apple.com> wrote:
>> The attachment to this PR: https://llvm.org/bugs/show_bug.cgi?id=5615
>> Explains why this could happen.
>>
>> (This could well be a different case here, but it may be related, or hint
>> toward a similar type of problem).
>>
>> HTH.
>>
>> —
>> Mehdi
>>
>> On Oct 21, 2016, at 7:30 AM, Rafael Espíndola via llvm-commits
>> <llvm-commits at lists.llvm.org> wrote:
>>
>> This is sufficiently crazy that I decided to create an easily reproducible case.
>>
>> I uploaded it to
>> https://drive.google.com/open?id=0B7iRtublysV6WmZPZzh5LUpSZUU
>>
>> I also tested it on an i7-3840QM where the problem reproduces exactly
>> and on an AMD Opteron(tm) Processor 6380 where the two binaries have
>> exactly the same performance.
>>
>> Craig, all that I was able to find about branch prediction aliasing
>> problems was a suggestion in the Intel optimization manual to align
>> branch targets, but it looks like that is not the problem here. Any idea
>> if there is anything that can be done to avoid this problem?
>>
>> Thanks,
>> Rafael
>>
>>
>> On 20 October 2016 at 17:09, Rafael Espíndola
>> <rafael.espindola at gmail.com> wrote:
>>
>> I spent most of the day reducing an oddity I noticed while
>> benchmarking a small patch.
>>
>> It turns out that just reordering two adjacent functions can have a
>> massive impact on performance. The two binaries are in
>>
>> https://drive.google.com/open?id=0B7iRtublysV6VW5VVW1na2N1RGM
>>
>> https://drive.google.com/open?id=0B7iRtublysV6MUJoeGVCRHpXVUU
>>
>> And the total diff of the objdump is attached.
>>
>> When linking xul with one of the binaries I get
>>
>> 98,298,725      branch-misses             #    2.24% of all branches
>> 7.206486289 seconds time elapsed
>>
>> With the other I get
>>
>> 139,849,372      branch-misses             #    3.18% of all branches
>> 7.645573494 seconds time elapsed
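The two perf stat runs above can be compared with a little arithmetic. The sketch below copies the reported figures verbatim. Because the miss percentages are rounded, the total branch counts it backs out are only approximate. The per-miss figure attributes all of the extra time to the extra misses, which is a simplifying assumption and not something the measurements establish.

```python
# Figures copied from the two perf stat runs quoted above.
FAST = {"misses": 98_298_725, "miss_rate": 0.0224, "seconds": 7.206486289}
SLOW = {"misses": 139_849_372, "miss_rate": 0.0318, "seconds": 7.645573494}

def total_branches(run):
    """Estimate total branches from misses and the (rounded) miss rate."""
    return run["misses"] / run["miss_rate"]

def compare(fast, slow):
    """Ratios between the runs, plus time per extra miss in nanoseconds."""
    extra_misses = slow["misses"] - fast["misses"]
    extra_seconds = slow["seconds"] - fast["seconds"]
    return {
        "miss_ratio": slow["misses"] / fast["misses"],
        "time_ratio": slow["seconds"] / fast["seconds"],
        # Simplification: assumes *all* of the extra time comes from extra misses.
        "ns_per_extra_miss": extra_seconds / extra_misses * 1e9,
    }

for name, run in (("fast", FAST), ("slow", SLOW)):
    print(f"{name}: ~{total_branches(run):.3e} branches")
for key, value in compare(FAST, SLOW).items():
    print(f"{key}: {value:.3f}")
```

Both runs execute roughly 4.39 billion branches, so the branch count is essentially unchanged. The slow layout has about 1.42x as many misses and takes about 1.06x as long, which matches the figure in the subject line. Under the simplification above, that works out to roughly 10.6 ns per extra miss.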
>>
>> Adding enough padding before the function gets the performance back,
>> which suggests an aliasing problem in the branch predictor.
>>
>> The CPU is an E5-2697 (Ivy Bridge). Is anyone familiar with its branch
>> predictor and how to avoid hitting these problems?
>>
>> Cheers,
>> Rafael