[PATCH] D14971: X86: Emit smaller code for moving 8-bit immediates

Tue Dec 8 07:40:25 PST 2015

DavidKreitzer added a comment.

Hi Hans,

I'd have responded sooner, but I was travelling last week and am still catching up.

Bottom Line: Using xor/inc & xor/dec for constant loads of 1 & -1 is preferred over using push/pop when optimizing for size. The fact that ICC fails to special-case 1 & -1 is a bit of an oversight. On modern Intel Core processors, the xor/inc & xor/dec idioms will be faster than push/pop. inc/dec are slower than add/sub on some older mainstream processors (e.g. Pentium 4) and also on modern smaller core processors like Silvermont due to the partial flag update. But the xor/inc, xor/dec idioms should still perform no worse than the push/pop sequence.

I also echo Sean's advice to avoid "or -1" in most cases. The one exception (which could be a TODO) is that "or -1" is the best option for machines that are not affected by the false dependence, i.e. strictly in-order machines such as older Atoms. If we know for sure that the code will only be run on such a machine, we could take advantage of this fact.

-Dave

http://reviews.llvm.org/D14971