[PATCH] D14971: X86: Emit smaller code for moving 8-bit immediates
Hans Wennborg via llvm-commits
llvm-commits at lists.llvm.org
Mon Nov 30 19:30:51 PST 2015
hans added a comment.
In http://reviews.llvm.org/D14971#299111, @silvas wrote:
> > A problem with the OR approach is that there's a dependency on the previous value in %eax. The DEC approach avoids that, but maybe DEC is slow on some micro-architectures?
>
>
> DEC is what we use in the backend for counted loops, so I wouldn't worry.
> (i.e. we lower
>
>   for (int i = 0; i < n; i++)
>     bar();
>
> into a loop like
>
>   1:
>     ...
>     decl %ebx
>     jnz 1b
>
> )
>
> For every x86 microarchitecture I'm familiar with, on paper xor+dec seems preferable to push+pop. I would avoid doing push+pop unless we can get some insight into what ICC is shooting for / exploiting here. E.g. push+pop on AMD Jaguar creates micro-ops on the load unit, which is a bottleneck point
> (I microbenchmarked the 64-bit analog to confirm:
> http://reviews.llvm.org/F1123937
> http://reviews.llvm.org/F1123938
> )
Thanks for looking into this! How does the "or -1" approach compare in your benchmark?
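
For reference, the candidate sequences for materializing -1 in %eax can be compared by encoded size. The following is a minimal sketch, not part of the patch: the byte values are hand-assembled IA-32 encodings for 32-bit mode, and the names ENCODINGS and size_table are made up for this illustration. It compares code size only and says nothing about the dependency or load-unit concerns raised in the quoted comment.

```python
# Hand-assembled 32-bit (IA-32) encodings for setting %eax to -1.
# ENCODINGS and size_table are hypothetical names used only for this sketch.
ENCODINGS = {
    # B8 id: mov imm32 into %eax (the baseline the patch wants to shrink)
    "movl $-1, %eax": bytes([0xB8, 0xFF, 0xFF, 0xFF, 0xFF]),
    # 31 C0: xor %eax,%eax; 48: one-byte dec %eax (32-bit mode only)
    "xorl %eax, %eax; decl %eax": bytes([0x31, 0xC0, 0x48]),
    # 83 /1 ib: or with sign-extended imm8; reads the old value of %eax
    "orl $-1, %eax": bytes([0x83, 0xC8, 0xFF]),
    # 6A ib: push sign-extended imm8; 58: pop %eax
    "pushl $-1; popl %eax": bytes([0x6A, 0xFF, 0x58]),
}


def size_table(encodings):
    """Map each assembly sequence to its encoded length in bytes."""
    return {asm: len(code) for asm, code in encodings.items()}


if __name__ == "__main__":
    for asm, size in size_table(ENCODINGS).items():
        print(f"{size} bytes  {asm}")
```

All three alternatives come out two bytes shorter than the plain movl, so in 32-bit mode the choice between them rests on performance rather than size. In 64-bit mode the byte 0x48 is a REX prefix, so decl needs its two-byte form (FF C8) and the xor+dec sequence grows by one byte.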
http://reviews.llvm.org/D14971