[PATCH] Avoid generating SHLD/SHRD for architectures that are known to have poor latency for these instructions.

Fri Nov 15 15:08:27 PST 2013

  Hi Eric,

  I didn't have access to a machine with one of Intel's latest processors, so I asked a friend, Dmitry Babokin, who works on the ISPC compiler in Moscow, to run this performance test on one of Intel's latest architectures. I generated two assembly files with LLVM (with and without SHLD) for the following test:

  uint64_t s128(uint64_t a, uint64_t b, int shift)
  {
      return (a << shift) | (b >> (64-shift));
  }
  uint64_t s128i(uint64_t a, uint64_t b)
  {
      return s128(a, b, 7);
  }
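
  For reference, the expression in s128 is the high 64 bits of the 128-bit value a:b shifted left by shift, which is what a single SHLD computes. A minimal sketch of that equivalence, assuming GCC or Clang on x86-64 (unsigned __int128 is a compiler extension); the names shld_expr and shld_ref are mine and are not part of the patch:

```c
#include <assert.h>
#include <stdint.h>

/* The expression from the test above: the pattern that can be
   matched to a single SHLD. */
static uint64_t shld_expr(uint64_t a, uint64_t b, int shift)
{
    /* shift == 0 would shift b by 64, which is undefined in C. */
    assert(shift > 0 && shift < 64);
    return (a << shift) | (b >> (64 - shift));
}

/* Reference: build the 128-bit value a:b, shift it left, keep the
   high half. unsigned __int128 is a GCC/Clang extension. */
static uint64_t shld_ref(uint64_t a, uint64_t b, int shift)
{
    unsigned __int128 wide = ((unsigned __int128)a << 64) | b;
    return (uint64_t)((wide << shift) >> 64);
}
```

  The two functions should agree for every shift from 1 to 63; shift == 0 is excluded because the C expression is undefined there.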

  Dmitry called the s128i function 100 million times. The version using the shld instruction took 2.18 sec to finish. The version using the alternative sequence of instructions took 1.89 sec, about 13.3% less time. All experiments were run on the Ivy Bridge architecture.
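
  I don't have the exact driver Dmitry used, so the following is only a sketch of what such a timing loop could look like. The names run_s128i and time_s128i are hypothetical. In the real experiment s128i came from the two separately generated assembly files; here it is defined inline just to keep the example self-contained, so a compiler is free to optimize it differently.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Same test functions as in the email, repeated so this compiles alone. */
static uint64_t s128(uint64_t a, uint64_t b, int shift)
{
    assert(shift > 0 && shift < 64);  /* shift == 0 is undefined in C */
    return (a << shift) | (b >> (64 - shift));
}

static uint64_t s128i(uint64_t a, uint64_t b)
{
    return s128(a, b, 7);
}

/* Hypothetical driver: calls s128i `iterations` times. Each result is
   fed into the next call so the loop cannot be dropped and the
   measurement reflects the latency of the shift sequence. */
uint64_t run_s128i(uint64_t iterations)
{
    uint64_t acc = 1;
    for (uint64_t i = 0; i < iterations; ++i)
        acc = s128i(acc, i) ^ i;
    return acc;
}

/* Hypothetical timing wrapper: returns CPU seconds spent in run_s128i
   and stores the final accumulator in *result. */
double time_s128i(uint64_t iterations, uint64_t *result)
{
    clock_t start = clock();
    *result = run_s128i(iterations);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

  Calling time_s128i(100000000, &r) once per build (with and without SHLD) would give numbers comparable in spirit to the ones above, though not necessarily the same values.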

  Dmitry also confirmed that on Ivy Bridge, Intel's compiler 13.0 generates code *without* shld instructions.

  It would be nice to get a full list of Intel architectures on which the shld instruction has very high latency.

http://llvm-reviews.chandlerc.com/D2177