[PATCH] Avoid generating SHLD/SHRD for architectures that are known to have poor latency for these instructions.
Robinson, Paul
Paul_Robinson at playstation.sony.com
Fri Nov 15 15:46:04 PST 2013
It would still benefit -Oz/-Os...
> -----Original Message-----
> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-
> bounces at cs.uiuc.edu] On Behalf Of Nadav Rotem
> Sent: Friday, November 15, 2013 3:39 PM
> To: Eric Christopher
> Cc: llvm-commits at cs.uiuc.edu;
> reviews+D2177+public+2550a59af40981ae at llvm-reviews.chandlerc.com;
> Romanova, Katya
> Subject: Re: [PATCH] Avoid generating SHLD/SHRD for architectures that
> are known to have poor latency for these instructions.
>
> It sounds like modern processors don't implement SHLD/SHRD efficiently.
> Do we know of any x86 processors that DO benefit from this peephole? We
> may be able to delete this code altogether.
>
>
> On Nov 15, 2013, at 3:34 PM, Eric Christopher <echristo at gmail.com>
> wrote:
>
> > Awesome. This should probably be turned on at least for Ivy Bridge
> > where we have numbers.
> >
> > Nadav: ?
> >
> > -eric
> >
> > On Fri, Nov 15, 2013 at 3:08 PM, Katya Romanova
> > <Katya_Romanova at playstation.sony.com> wrote:
> >>
> >> Hi Eric,
> >>
> >> I didn't have access to a machine with one of Intel's latest
> processors, so I asked a friend, Dmitry Babokin, who works on the ISPC
> compiler in Moscow, to run this performance test on one of Intel's
> latest architectures. I generated two assembly files with the LLVM
> compiler (with and without SHLD) for the following test:
> >>
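> >> #include <stdint.h>
> >>
> >> uint64_t s128(uint64_t a, uint64_t b, int shift)
> >> {
> >>     /* Defined only for 1 <= shift <= 63: shift == 0 would shift b by 64. */
> >>     return (a << shift) | (b >> (64 - shift));
> >> }
> >>
> >> uint64_t s128i(uint64_t a, uint64_t b)
> >> {
> >>     return s128(a, b, 7);
> >> }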
> >>
> >>
> >> Dmitry called the s128i function 100 million times. The version with
> the shld instruction took 2.18 sec to finish; the version using the
> alternative instruction sequence took 1.89 sec, i.e. 13.3% less time.
> All the experiments were done on Ivy Bridge.
> >>
> >> Dmitry also confirmed that Intel's compiler 13.0 generates code
> *without* shld instructions for Ivy Bridge.
> >>
> >> It would be nice to get a full list of Intel architectures where the
> shld instruction has very high latency.
> >>
> >> http://llvm-reviews.chandlerc.com/D2177