Reordering two functions can slow down lld by 1.06 times

Ansari, Zia via llvm-commits llvm-commits at lists.llvm.org
Fri Oct 21 09:24:56 PDT 2016


OK.. I'm grabbing your reproducers and I'll take a look.

Thanks,
Zia.

|-----Original Message-----
|From: Rafael Espíndola [mailto:rafael.espindola at gmail.com]
|Sent: Friday, October 21, 2016 8:39 AM
|To: Mehdi Amini <mehdi.amini at apple.com>
|Cc: llvm-commits <llvm-commits at lists.llvm.org>; Rui Ueyama
|<ruiu at google.com>; Davide Italiano <dccitaliano at gmail.com>; Ansari, Zia
|<zia.ansari at intel.com>
|Subject: Re: Reordering two functions can slow down lld by 1.06 times
|
|Thanks!
|
|It doesn't look like the DSB issue, given that the main difference is in the
|number of branches being mispredicted. It seems to be the other case in the
|bug:
|
|-------------------------------
|Regarding the other details reported in this issue, I realize that the slow vs. fast
|cases both had 0 mod 32 byte alignment. It’s hard to do the analysis on what
|the issue there was, without having the exact code and the exact (old)
|architecture on which it was run. If I had to guess, I would say that it was a case
|of unfortunate aliasing in the branch prediction buffer, causing differences in
|the prediction of one of the many branches, particularly the indirect branch,
|which is known to have prediction issues on some older architectures.
|-----------------------------------
|
|Zia, if you want to take a look I now have an easy to reproduce case :-)
|
|Cheers,
|Rafael
|
|
|
|On 21 October 2016 at 11:14, Mehdi Amini <mehdi.amini at apple.com> wrote:
|> The attachment to this PR: https://llvm.org/bugs/show_bug.cgi?id=5615
|> Explains why this could happen.
|>
|> (This could well be a different case here, but it may be related, or
|> hint toward a similar type of problem).
|>
|> HTH.
|>
|> Mehdi
|>
|> On Oct 21, 2016, at 7:30 AM, Rafael Espíndola via llvm-commits
|> <llvm-commits at lists.llvm.org> wrote:
|>
|> This is sufficiently crazy that I decided to create an easy reproducer.
|>
|> I uploaded it to
|> https://drive.google.com/open?id=0B7iRtublysV6WmZPZzh5LUpSZUU
|>
|> I also tested it on an i7-3840QM where the problem reproduces exactly
|> and on an AMD Opteron(tm) Processor 6380 where the two binaries have
|> exactly the same performance.
|>
|> Craig, all that I was able to find about branch prediction aliasing
|> problems was a suggestion in the Intel optimization manual to align
|> branch targets, but it looks like that is not the problem here. Any idea
|> if there is anything that can be done to avoid this problem?
|>
|> Thanks,
|> Rafael
|>
|>
|> On 20 October 2016 at 17:09, Rafael Espíndola
|> <rafael.espindola at gmail.com> wrote:
|>
|> I spent most of the day reducing an oddity I noticed while
|> benchmarking a small patch.
|>
|> It turns out that just reordering two adjacent functions can have a
|> massive impact on performance. The two binaries are in
|>
|> https://drive.google.com/open?id=0B7iRtublysV6VW5VVW1na2N1RGM
|>
|> https://drive.google.com/open?id=0B7iRtublysV6MUJoeGVCRHpXVUU
|>
|> And the total diff of the objdump is attached.
|>
|> When linking xul with one of the binaries I get
|>
|> 98,298,725      branch-misses             #    2.24% of all branches
|> 7.206486289 seconds time elapsed
|>
|> With the other I get
|>
|> 139,849,372      branch-misses             #    3.18% of all branches
|> 7.645573494 seconds time elapsed
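
For reference, the 1.06x figure in the subject line can be re-derived from
the two perf runs quoted above. This is only a sketch of the arithmetic: the
variable names are mine, and charging the whole time difference to the extra
branch misses is an assumption, not something perf reports.

```python
# Figures copied from the two perf runs quoted above.
fast_misses, fast_seconds = 98_298_725, 7.206486289
slow_misses, slow_seconds = 139_849_372, 7.645573494

slowdown = slow_seconds / fast_seconds      # wall-clock ratio, slow vs. fast
miss_ratio = slow_misses / fast_misses      # branch-miss ratio, slow vs. fast

extra_misses = slow_misses - fast_misses
extra_seconds = slow_seconds - fast_seconds

# Assumption: all of the extra time is caused by the extra mispredictions.
ns_per_extra_miss = extra_seconds / extra_misses * 1e9

print(f"slowdown:            {slowdown:.3f}x")
print(f"branch-miss ratio:   {miss_ratio:.3f}x")
print(f"time per extra miss: {ns_per_extra_miss:.1f} ns")
```

The wall-clock ratio rounds to 1.06, while the number of mispredicted
branches grows by roughly 42%.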
|>
|> Adding enough padding before the function gets the performance back,
|> which suggests an aliasing problem in the branch predictor.
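
A toy model may help show why padding could matter if aliasing is the cause.
Everything in it is hypothetical: the table size, the index function, and the
addresses are made up for illustration, and the real indexing scheme of the
Ivy Bridge predictor is not something this thread establishes.

```python
# Hypothetical predictor table size; not taken from any Intel documentation.
TABLE_ENTRIES = 4096


def slot(branch_address: int) -> int:
    """Hypothetical index function: the low bits of the branch address."""
    return branch_address % TABLE_ENTRIES


def aliases(a: int, b: int) -> bool:
    """True if two branches would share one predictor entry in this model."""
    return slot(a) == slot(b)


# Made-up branch addresses that are a multiple of the table size apart.
branch_a = 0x401000
branch_b = branch_a + 3 * TABLE_ENTRIES

print(aliases(branch_a, branch_b))            # the two branches collide

# Padding inserted before the second function shifts its branch address.
padding = 64
print(aliases(branch_a, branch_b + padding))  # the collision goes away
```

In this model, two branches that share an entry overwrite each other's
history, and moving either one by a few bytes separates them again. That
would match a slowdown that follows code placement rather than the code
itself.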
|>
|> The CPU is an E5-2697 (Ivy Bridge). Is anyone familiar with its branch
|> predictor and how to avoid hitting these problems?
|>
|> Cheers,
|> Rafael
|>

