[PATCH] Avoid generating SHLD/SHRD for architectures that are known to have poor latency for these instructions.

Robinson, Paul Paul_Robinson at playstation.sony.com
Fri Nov 15 15:46:04 PST 2013


It would still benefit -Oz/-Os...

> -----Original Message-----
> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-
> bounces at cs.uiuc.edu] On Behalf Of Nadav Rotem
> Sent: Friday, November 15, 2013 3:39 PM
> To: Eric Christopher
> Cc: llvm-commits at cs.uiuc.edu;
> reviews+D2177+public+2550a59af40981ae at llvm-reviews.chandlerc.com;
> Romanova, Katya
> Subject: Re: [PATCH] Avoid generating SHLD/SHRD for architectures that
> are known to have poor latency for these instructions.
> 
> It sounds like modern processors don't implement SHLD/SHRD efficiently.
> Do we know of any x86 processors that DO benefit from this peephole?  We
> may be able to delete this code altogether.
> 
> 
> On Nov 15, 2013, at 3:34 PM, Eric Christopher <echristo at gmail.com>
> wrote:
> 
> > Awesome. This should probably be turned on at least for Ivy Bridge
> > where we have numbers.
> >
> > Nadav: ?
> >
> > -eric
> >
> > On Fri, Nov 15, 2013 at 3:08 PM, Katya Romanova
> > <Katya_Romanova at playstation.sony.com> wrote:
> >>
> >>  Hi Eric,
> >>
> >>  I didn't have access to a machine with one of Intel's latest
> >>  processors, so I asked a friend, Dmitry Babokin, who works on the
> >>  ISPC compiler in Moscow, to run this performance test on one of
> >>  Intel's latest architectures. I generated two assembly files with
> >>  the LLVM compiler (with and without SHLD) for the following test:
> >>
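> >>  #include <stdint.h>
> >>
> >>  /* Assumes 0 < shift < 64; shift == 0 would make b >> 64 undefined. */
> >>  uint64_t s128(uint64_t a, uint64_t b, int shift)
> >>  {
> >>      return (a << shift) | (b >> (64-shift));
> >>  }
> >>  uint64_t s128i(uint64_t a, uint64_t b)
> >>  {
> >>      return s128(a, b, 7);
> >>  }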
> >>
> >>
> >>  Dmitry called the s128i function 100 million times. The test with
> >>  the shld instruction took 2.18 sec to finish. The test using the
> >>  alternative sequence of instructions took 1.89 sec, which is 13.3%
> >>  faster. All the experiments were done on the Ivy Bridge architecture.
> >>
> >>  Dmitry also confirmed that on Ivy Bridge, Intel's compiler 13.0
> >>  generates code *without* shld instructions.
> >>
> >>  It would be nice to get a full list of Intel architectures where
> >>  the shld instruction has very high latency.
> >>
> >> http://llvm-reviews.chandlerc.com/D2177
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

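For reference, here is a minimal sketch of the operation the quoted test exercises: a 64-bit "funnel" shift that returns the high half of the 128-bit value a:b shifted left, which is what a single SHLD computes and what the shift/shift/or sequence computes instead. The name funnel_shl64 and the explicit shift == 0 guard are assumptions added for illustration; they are not part of the patch or of the original test.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): returns the high 64 bits of
 * the 128-bit value a:b shifted left by `shift` bits. */
uint64_t funnel_shl64(uint64_t a, uint64_t b, unsigned shift)
{
    assert(shift < 64);      /* only shifts of 0..63 are meaningful here */
    if (shift == 0)
        return a;            /* avoids b >> 64, which is undefined in C */
    return (a << shift) | (b >> (64 - shift));
}

/* Constant-shift wrapper mirroring s128i from the quoted test. */
uint64_t s128i(uint64_t a, uint64_t b)
{
    return funnel_shl64(a, b, 7);
}
```

With a constant shift of 7, as in s128i, the guard folds away at compile time, so the remaining expression is the one the peephole either matches to SHLD or leaves as separate shifts and an or.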