[PATCH] D14971: X86: Emit smaller code for moving 8-bit immediates

Mon Nov 30 19:30:51 PST 2015

hans added a comment.

In http://reviews.llvm.org/D14971#299111, @silvas wrote:

> > A problem with the OR approach is that there's a dependency on the previous value in %eax. The DEC approach avoids that, but maybe DEC is slow on some micro-architectures?
>
> DEC is what we use in the backend for counted loops, so I wouldn't worry. That is, we lower
>
>   for (int i = 0; i < n; i++)
>     bar();
>
> into a loop like
>
>   1:
>   ...
>   decl %ebx
>   jnz 1b
>
> For every x86 microarchitecture I'm familiar with, xor+dec looks preferable to push+pop on paper. I would avoid push+pop unless we can get some insight into what ICC is aiming for or exploiting here. For example, on AMD Jaguar, push+pop creates micro-ops on the load unit, which is a bottleneck. I microbenchmarked the 64-bit analog to confirm:
>  http://reviews.llvm.org/F1123937
>  http://reviews.llvm.org/F1123938

Thanks for looking into this! How does the "or -1" approach compare in your benchmark?
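
For reference, here is a rough sketch of the size side of this trade-off. It assumes the value being materialized is -1 in %eax and uses the 32-bit-mode encodings. The struct, table, and function names are hypothetical and exist only for this illustration. The reads_old_eax flag records the dependency concern raised in this thread; it is not a measurement.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* One candidate sequence for putting -1 into %eax (32-bit mode). */
struct candidate {
    const char *name;            /* short label used in this thread */
    const char *asm_text;        /* AT&T syntax, for display only */
    const unsigned char *bytes;  /* machine-code encoding */
    size_t size;                 /* encoded length in bytes */
    int reads_old_eax;           /* 1 if the result depends on the old %eax */
};

static const unsigned char MOV_BYTES[]      = {0xB8, 0xFF, 0xFF, 0xFF, 0xFF};
static const unsigned char OR_BYTES[]       = {0x83, 0xC8, 0xFF};
static const unsigned char XOR_DEC_BYTES[]  = {0x31, 0xC0, 0x48};
static const unsigned char PUSH_POP_BYTES[] = {0x6A, 0xFF, 0x58};

static const struct candidate CANDIDATES[] = {
    {"mov",      "movl $-1, %eax",                MOV_BYTES,      sizeof MOV_BYTES,      0},
    {"or",       "orl $-1, %eax",                 OR_BYTES,       sizeof OR_BYTES,       1},
    {"xor+dec",  "xorl %eax, %eax; decl %eax",    XOR_DEC_BYTES,  sizeof XOR_DEC_BYTES,  0},
    {"push+pop", "pushl $-1; popl %eax",          PUSH_POP_BYTES, sizeof PUSH_POP_BYTES, 0},
};

/* Number of entries in the table above. */
size_t candidate_count(void) {
    return sizeof CANDIDATES / sizeof CANDIDATES[0];
}

/* Look up a candidate by its short label; returns NULL if it is unknown. */
const struct candidate *find_candidate(const char *name) {
    for (size_t i = 0; i < candidate_count(); i++) {
        if (strcmp(CANDIDATES[i].name, name) == 0)
            return &CANDIDATES[i];
    }
    return NULL;
}

/* Print each candidate with its encoded size and dependency flag. */
void print_candidates(void) {
    for (size_t i = 0; i < candidate_count(); i++) {
        const struct candidate *c = &CANDIDATES[i];
        printf("%-9s %-30s %zu bytes, depends on old %%eax: %s\n",
               c->name, c->asm_text, c->size,
               c->reads_old_eax ? "yes" : "no");
    }
}
```

Calling print_candidates() from a main function lists the four sequences. In 32-bit mode all three alternatives come to 3 bytes against 5 for the plain mov, so the choice between them rests on the dependency and micro-op behavior discussed above rather than on size. In 64-bit mode the one-byte 0x48 encoding of dec is not available (that byte is a REX prefix), so decl %eax takes two bytes (FF C8) and xor+dec grows to 4 bytes.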

http://reviews.llvm.org/D14971