Reordering two functions can slow down lld by 1.06 times
Ansari, Zia via llvm-commits
llvm-commits at lists.llvm.org
Fri Oct 21 09:24:56 PDT 2016
OK. I'm grabbing your reproducers and I'll take a look.
|From: Rafael Espíndola [mailto:rafael.espindola at gmail.com]
|Sent: Friday, October 21, 2016 8:39 AM
|To: Mehdi Amini <mehdi.amini at apple.com>
|Cc: llvm-commits <llvm-commits at lists.llvm.org>; Rui Ueyama
|<ruiu at google.com>; Davide Italiano <dccitaliano at gmail.com>; Ansari, Zia
|<zia.ansari at intel.com>
|Subject: Re: Reordering two functions can slow down lld by 1.06 times
|It doesn't look like the DSB issue, given that the main difference is in the
|number of branches being mispredicted. It seems to be the other case in the
|Regarding the other details reported in this issue, I realize that the slow and fast
|cases both had 0 mod 32 byte alignment. It’s hard to analyze what the issue
|was there without having the exact code and the exact (old)
|architecture on which it was run. If I had to guess, I would say that it was a case
|of unfortunate aliasing in the branch prediction buffer, causing differences in
|the prediction of one of the many branches, particularly the indirect branch,
|which is known to have prediction issues on some older architectures.
|Zia, if you want to take a look, I now have an easy-to-reproduce case :-)
|On 21 October 2016 at 11:14, Mehdi Amini <mehdi.amini at apple.com> wrote:
|> The attachment to this PR, https://llvm.org/bugs/show_bug.cgi?id=5615,
|> explains why this could happen.
|> (This could well be a different case here, but it may be related, or
|> hint toward a similar type of problem).
|> On Oct 21, 2016, at 7:30 AM, Rafael Espíndola via llvm-commits
|> <llvm-commits at lists.llvm.org> wrote:
|> This is sufficiently crazy that I decided to create an easy reproducer.
|> I uploaded it to
|> I also tested it on an i7-3840QM, where the problem reproduces exactly,
|> and on an AMD Opteron(tm) Processor 6380, where the two binaries have
|> exactly the same performance.
|> Craig, all I was able to find about branch prediction aliasing
|> problems was a suggestion in the Intel optimization manual to align
|> branch targets, but it looks like that is not the problem here. Any idea
|> whether anything can be done to avoid this problem?
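Aligning a branch target to a boundary amounts to padding its address up to the next multiple of that boundary. A minimal Python sketch, assuming a power-of-two boundary (32 bytes here, matching the "0 mod 32" alignment mentioned earlier in the thread); the addresses are hypothetical, not taken from the lld binaries:

```python
def padding_to_align(addr: int, boundary: int = 32) -> int:
    """Bytes of padding needed so that addr becomes a multiple of boundary.

    Assumes boundary is a power of two, so the low bits of -addr give the
    distance to the next boundary (0 if addr is already aligned).
    """
    if boundary <= 0 or boundary & (boundary - 1):
        raise ValueError("boundary must be a positive power of two")
    return (-addr) & (boundary - 1)


def align_up(addr: int, boundary: int = 32) -> int:
    """Round addr up to the next multiple of boundary."""
    return addr + padding_to_align(addr, boundary)


if __name__ == "__main__":
    # Hypothetical branch-target addresses.
    for target in (0x401000, 0x401013, 0x40101F):
        print(hex(target), "mod 32 =", target % 32,
              "pad =", padding_to_align(target),
              "aligned =", hex(align_up(target)))
```

Alignment alone cannot separate the two lld variants, since, as noted earlier in the thread, the slow and fast cases were both already 0 mod 32.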
|> On 20 October 2016 at 17:09, Rafael Espíndola
|> <rafael.espindola at gmail.com> wrote:
|> I spent most of the day reducing an oddity I noticed while
|> benchmarking a small patch.
|> It turns out that just reordering two adjacent functions can have a
|> massive impact on performance. The two binaries are in
|> And the total diff of the objdump is attached.
|> When linking xul with one of the binaries I get
|> 98,298,725 branch-misses # 2.24% of all branches
|> 7.206486289 seconds time elapsed
|> With the other I get
|> 139,849,372 branch-misses # 3.18% of all branches
|> 7.645573494 seconds time elapsed
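The two `perf stat` runs can be compared directly. The Python below reproduces the roughly 1.06x slowdown in the subject line, backs the total branch count out of the reported percentages, and gives a rough cost per extra mispredict. That last figure rests on an assumption: all of the time difference is attributed to the extra branch misses.

```python
# Figures copied from the two `perf stat` runs quoted above.
fast = {"branch_misses": 98_298_725, "miss_pct": 2.24, "seconds": 7.206486289}
slow = {"branch_misses": 139_849_372, "miss_pct": 3.18, "seconds": 7.645573494}

slowdown = slow["seconds"] / fast["seconds"]
extra_misses = slow["branch_misses"] - fast["branch_misses"]
extra_seconds = slow["seconds"] - fast["seconds"]

# perf reports misses as a percentage of all branches, so the total branch
# count can be recovered (only approximately, as the percentage is rounded).
fast_branches = fast["branch_misses"] / (fast["miss_pct"] / 100)
slow_branches = slow["branch_misses"] / (slow["miss_pct"] / 100)

# ASSUMPTION: the whole time difference comes from the extra mispredicts.
ns_per_extra_miss = extra_seconds / extra_misses * 1e9

print(f"slowdown:            {slowdown:.3f}x")
print(f"extra branch misses: {extra_misses:,}")
print(f"branches (fast/slow): {fast_branches:.3e} / {slow_branches:.3e}")
print(f"cost per extra miss: {ns_per_extra_miss:.1f} ns")
```

Both runs execute about the same number of branches (roughly 4.4 billion). Only the fraction that is mispredicted changes, which fits the aliasing explanation rather than a difference in the work done.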
|> Adding enough padding before the function gets the performance back,
|> which suggests an aliasing problem in the branch predictor.
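A toy model shows how aliasing and padding can interact. It is NOT Ivy Bridge's real predictor, whose internals are not publicly documented in detail. It assumes a direct-mapped table of 2-bit saturating counters, indexed by the low 12 bits of the branch address with no tag check, so two branches that map to the same entry share one counter. The table size, the indexing, and the addresses are all hypothetical:

```python
"""Toy branch-predictor aliasing model (hypothetical, not real hardware)."""

TABLE_BITS = 12  # hypothetical: 4096 entries indexed by address bits [0, 12)


def index(addr: int) -> int:
    """Predictor table slot for a branch at addr (low bits only, no tag)."""
    return addr & ((1 << TABLE_BITS) - 1)


def count_mispredicts(trace) -> int:
    """trace: iterable of (branch_address, taken) pairs."""
    table = {}  # slot -> 2-bit counter; 0-1 predict not-taken, 2-3 taken
    misses = 0
    for addr, taken in trace:
        slot = index(addr)
        ctr = table.get(slot, 1)  # unseen slots start weakly not-taken
        if (ctr >= 2) != taken:
            misses += 1
        table[slot] = min(ctr + 1, 3) if taken else max(ctr - 1, 0)
    return misses


def make_trace(addr_a: int, addr_b: int, iterations: int):
    """Branch A is always taken, branch B never; they execute alternately."""
    trace = []
    for _ in range(iterations):
        trace.append((addr_a, True))
        trace.append((addr_b, False))
    return trace


if __name__ == "__main__":
    a = 0x401100
    b = a + 0x1000  # 4 KiB apart: same low 12 bits, so same table slot
    n = 1000
    print("aliased:", count_mispredicts(make_trace(a, b, n)))       # 2000
    print("padded: ", count_mispredicts(make_trace(a, b + 16, n)))  # 1
```

When the two branches share a slot, each one pushes the counter toward the opposite prediction, so every execution mispredicts. Shifting B by 16 bytes, as padding before its function would, gives each branch its own counter, and only the very first execution of A mispredicts.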
|> The CPU is an E5-2697 (Ivy Bridge). Is anyone familiar with its branch
|> predictor and how to avoid hitting these problems?