[PATCH] D14971: X86: Emit smaller code for moving 8-bit immediates

Tue Dec 8 16:43:29 PST 2015

hans added a comment.

In http://reviews.llvm.org/D14971#304892, @DavidKreitzer wrote:

> Bottom Line: Using xor/inc & xor/dec for constant loads of 1 & -1 is preferred over using push/pop when optimizing for size. The fact that ICC fails to special-case 1 & -1 is a bit of an oversight. On modern Intel Core processors, the xor/inc & xor/dec idioms will be faster than push/pop. inc/dec are slower than add/sub on some older mainstream processors (e.g. Pentium 4) and also on modern smaller core processors like Silvermont due to the partial flag update. But the xor/inc, xor/dec idioms should still perform no worse than the push/pop sequence.
>
> I also echo Sean's advice to avoid "or -1" in most cases. The one exception (which could be a TODO) is that "or -1" is the best option for machines that are not affected by the false dependence, i.e. strictly in-order machines such as older Atoms. If we know for sure that the code will only be run on such a machine, we could take advantage of this fact.

Thanks for the information! Sounds like we're on the same page here. I'll upload a new patch that uses xor/inc and xor/dec for loading 1 and -1.

I still think the push/pop trick is pretty neat, but it might be more suitable for minsize, as you say.
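
For reference, here is a rough sketch of the byte counts being compared. The encodings are hand-assembled for 32-bit mode with EAX as the destination, which is my assumption for illustration; they are not taken from the patch. The table and the `sizes` helper are hypothetical names.

```python
# Hand-assembled x86 encodings, 32-bit mode, destination EAX.
# Illustration only; the register and mode are assumptions.
ENCODINGS = {
    1: {
        "mov eax, 1":            bytes([0xB8, 0x01, 0x00, 0x00, 0x00]),
        "xor eax, eax; inc eax": bytes([0x31, 0xC0, 0x40]),
        "push 1; pop eax":       bytes([0x6A, 0x01, 0x58]),
    },
    -1: {
        "mov eax, -1":           bytes([0xB8, 0xFF, 0xFF, 0xFF, 0xFF]),
        "xor eax, eax; dec eax": bytes([0x31, 0xC0, 0x48]),
        "push -1; pop eax":      bytes([0x6A, 0xFF, 0x58]),
        "or eax, -1":            bytes([0x83, 0xC8, 0xFF]),
    },
}


def sizes(value):
    """Map each candidate sequence for `value` to its length in bytes."""
    return {asm: len(code) for asm, code in ENCODINGS[value].items()}


if __name__ == "__main__":
    for value in (1, -1):
        print(f"loading {value}:")
        for asm, size in sizes(value).items():
            print(f"  {size} bytes  {asm}")
```

In 32-bit mode all of the short sequences come out at 3 bytes against 5 for the plain mov, so there the choice is about speed, not size. In 64-bit mode the one-byte inc/dec opcodes (0x40 and 0x48) are reused as REX prefixes, so inc eax and dec eax need the two-byte FF C0 and FF C8 forms. That makes xor/inc and xor/dec 4 bytes while push/pop stays at 3, which is where push/pop could still pay off for minsize.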

http://reviews.llvm.org/D14971