[PATCH] Avoid generating SHLD/SHRD for architectures that are known to have poor latency for these instructions.
Katya_Romanova at playstation.sony.com
Fri Nov 15 15:08:27 PST 2013
I didn't have access to a machine with one of Intel's latest processors, so I asked a friend, Dmitry Babokin, who works on the ISPC compiler in Moscow, to run this performance test on one of Intel's latest architectures. I generated two assembly files with the LLVM compiler (with and without SHLD) for the following test:
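#include <stdint.h>
/* shift must be in 1..63; shift == 0 would evaluate b >> 64, which is undefined */
int64_t s128(uint64_t a, uint64_t b, int shift) {
  return (a << shift) | (b >> (64-shift));
}
uint64_t s128i(uint64_t a, uint64_t b) {
  return s128(a, b, 7);
}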
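For reference, the expression in s128 is a double-precision (funnel) shift: it yields the upper 64 bits of the 128-bit value a:b shifted left by shift, which is the operation SHLD performs in a single instruction. The following is a minimal sketch, not part of the patch, that checks the expression against a 128-bit computation. It assumes a compiler with the unsigned __int128 extension (GCC or Clang); the names funnel_shl_ref, funnel_shl_wide and funnel_shl_agrees are hypothetical, and shift is restricted to 1..63.

```c
#include <stdint.h>

/* Same expression as s128 in the test above (hypothetical name).
   Valid only for 1 <= shift <= 63: shift == 0 would evaluate b >> 64,
   which is undefined behavior in C. */
static uint64_t funnel_shl_ref(uint64_t a, uint64_t b, int shift)
{
    return (a << shift) | (b >> (64 - shift));
}

/* Reference: build the 128-bit value a:b, shift it left, keep the high half.
   Assumption: relies on the unsigned __int128 extension of GCC and Clang. */
static uint64_t funnel_shl_wide(uint64_t a, uint64_t b, int shift)
{
    unsigned __int128 wide = ((unsigned __int128)a << 64) | b;
    return (uint64_t)((wide << shift) >> 64);
}

/* Returns 1 if both versions agree for every shift in 1..63, else 0. */
static int funnel_shl_agrees(uint64_t a, uint64_t b)
{
    for (int shift = 1; shift < 64; ++shift)
        if (funnel_shl_ref(a, b, shift) != funnel_shl_wide(a, b, shift))
            return 0;
    return 1;
}
```

Calling funnel_shl_agrees with a few arbitrary bit patterns is enough to see that the shift/shift/or sequence and the 128-bit formulation compute the same value.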
Dmitry called the s128i function 100 million times. The test with the shld instruction took 2.18 sec to finish. The test using the alternative sequence of instructions took 1.89 sec, i.e. about 13.3% less time. All the experiments were done on the Ivy Bridge architecture.
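The exact harness used for these measurements is not shown here. A timing loop of this kind might look like the sketch below, under several assumptions: it uses clock() from <time.h>, the names s128i_sketch and time_calls are hypothetical, and the function under test is defined in C so that the example is self-contained, whereas in the experiment s128i came from one of the two generated assembly files. Each result is fed back into the next call, so the loop forms a dependency chain and is sensitive to instruction latency.

```c
#include <stdint.h>
#include <time.h>

/* Stand-in for s128i (hypothetical name). In the real experiment this
   function would be linked in from the generated assembly file. */
static uint64_t s128i_sketch(uint64_t a, uint64_t b)
{
    return (a << 7) | (b >> (64 - 7));
}

/* Calls s128i_sketch `iterations` times, feeding each result into the
   next call so the calls cannot be dropped or reordered freely.
   Stores the final value in *out and returns elapsed CPU time in seconds. */
static double time_calls(uint64_t iterations, uint64_t *out)
{
    uint64_t a = 1;
    uint64_t b = 0x9e3779b97f4a7c15ULL;  /* arbitrary bit pattern */
    clock_t start = clock();
    for (uint64_t i = 0; i < iterations; ++i) {
        a = s128i_sketch(a, b);
        b += a;
    }
    clock_t end = clock();
    *out = a;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling time_calls(100000000, &result) would mirror the 100 million iterations mentioned above. Absolute timings depend on the machine and compiler flags, so only the relative difference between the two assembly versions is meaningful.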
Dmitry also confirmed that on Ivy Bridge, Intel's compiler 13.0 generates code *without* shld instructions.
It would be nice to get a full list of the Intel architectures on which the shld instruction has very high latency.