[PATCH] Avoid generating SHLD/SHRD for architectures that are known to have poor latency for these instructions.

Fri Nov 15 15:55:05 PST 2013

*nod* Agreed, right now she's got the correct bit in for that.

-eric

On Fri, Nov 15, 2013 at 3:46 PM, Robinson, Paul
<Paul_Robinson at playstation.sony.com> wrote:
> It would still benefit -Oz/-Os...
>
>> -----Original Message-----
>> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-
>> bounces at cs.uiuc.edu] On Behalf Of Nadav Rotem
>> Sent: Friday, November 15, 2013 3:39 PM
>> To: Eric Christopher
>> Cc: llvm-commits at cs.uiuc.edu;
>> reviews+D2177+public+2550a59af40981ae at llvm-reviews.chandlerc.com;
>> Romanova, Katya
>> Subject: Re: [PATCH] Avoid generating SHLD/SHRD for architectures that
>> are known to have poor latency for these instructions.
>>
>> It sounds like modern processors don't implement SHLD/SHRD efficiently.
>> Do we know of any x86 processors that DO benefit from this peephole?  We
>> may be able to delete this code altogether.
>>
>>
>> On Nov 15, 2013, at 3:34 PM, Eric Christopher <echristo at gmail.com>
>> wrote:
>>
>> > Awesome. This should probably be turned on at least for Ivy Bridge
>> > where we have numbers.
>> >
>> > Nadav: ?
>> >
>> > -eric
>> >
>> > On Fri, Nov 15, 2013 at 3:08 PM, Katya Romanova
>> > <Katya_Romanova at playstation.sony.com> wrote:
>> >>
>> >>  Hi Eric,
>> >>
>> >>  I didn't have access to a machine with one of Intel's latest
>> >>  processors, so I asked a friend, Dmitry Babokin, who works on the
>> >>  ISPC compiler in Moscow, to run this performance test on one of
>> >>  Intel's latest architectures. I generated two assembly files with
>> >>  the LLVM compiler (with and without SHLD) for the following test:
>> >>
>> >>  #include <stdint.h>
>> >>
>> >>  /* Defined only for 0 < shift < 64. */
>> >>  uint64_t s128(uint64_t a, uint64_t b, int shift)
>> >>  {
>> >>      return (a << shift) | (b >> (64-shift));
>> >>  }
>> >>  uint64_t s128i(uint64_t a, uint64_t b)
>> >>  {
>> >>      return s128(a, b, 7);
>> >>  }
>> >>
>> >>
>> >>  Dmitry called the s128i function 100 million times. The test with
>> >>  the shld instruction took 2.18 sec to finish. The test using the
>> >>  alternative sequence of instructions took 1.89 sec, which is 13.3%
>> >>  less time. All the experiments were done on the Ivy Bridge
>> >>  architecture.
>> >>
>> >>  Dmitry also confirmed that on Ivy Bridge, Intel's compiler 13.0
>> >>  generates code *without* shld instructions.
>> >>
>> >>  It would be nice to get a full list of Intel architectures where
>> >>  the shld instruction has very high latency.
>> >>
>> >> http://llvm-reviews.chandlerc.com/D2177
>
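
For readers who want to experiment with the pattern discussed in the quoted test, here is a minimal sketch in C of the shift/or expression that the SHLD peephole matches. The function names double_shift_left and double_shift_left_any are hypothetical and do not come from the patch; the second variant, including its shift == 0 handling, is an assumption added for illustration and is not something the thread describes.

```c
#include <assert.h>
#include <stdint.h>

/* Same expression as s128 in the quoted test. It is only defined for
   0 < shift < 64, because shifting a 64-bit value by 64 is undefined in C. */
static uint64_t double_shift_left(uint64_t a, uint64_t b, unsigned shift)
{
    assert(shift > 0 && shift < 64);
    return (a << shift) | (b >> (64 - shift));
}

/* Hypothetical variant that accepts any count. The count is reduced modulo 64,
   mirroring how a 64-bit SHLD masks its count to 6 bits, and a count of 0
   returns a unchanged instead of evaluating b >> 64. */
static uint64_t double_shift_left_any(uint64_t a, uint64_t b, unsigned shift)
{
    shift &= 63;
    if (shift == 0)
        return a;
    return (a << shift) | (b >> (64 - shift));
}
```

With a constant count such as the 7 used in s128i, a compiler is free to emit either a single SHLD or a separate shift-left, shift-right, and or sequence for this expression; which one is faster is the question the measurements above address.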