[PATCH] D14971: X86: Emit smaller code for moving 8-bit immediates

Sean Silva via llvm-commits llvm-commits at lists.llvm.org
Mon Nov 30 18:51:42 PST 2015


silvas added a subscriber: silvas.
silvas added a comment.

> A problem with the OR approach is that there's a dependency on the previous value in %eax. The DEC approach avoids that, but maybe DEC is slow on some micro-architectures?


DEC is what we use in the backend for counted loops, so I wouldn't worry.
(i.e. we lower

  for (int i = 0; i < n; i++)
    bar();

into a loop like

  1:
  ...
  decl %ebx
  jnz 1b

)
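
To illustrate why a count-down form is legal here, the following is a minimal C sketch, not the backend's actual transformation. The loop body never reads i, so only the trip count matters. The call counter and the names count_up and count_down are hypothetical, introduced only for illustration.

```c
/* Hypothetical stand-in for bar(): records how many times it was called. */
static unsigned calls;
static void bar(void) { calls++; }

/* The loop as written above: counts up from 0 to n. */
static void count_up(int n) {
    for (int i = 0; i < n; i++)
        bar();
}

/* Count-down form with the same trip count. The loop's back edge is a
 * decrement followed by a "not zero" test, which corresponds to the
 * decl + jnz pair shown above. */
static void count_down(int n) {
    if (n <= 0)
        return;                     /* zero-trip guard */
    int remaining = n;
    do {
        bar();
    } while (--remaining != 0);
}
```

Both functions call bar() the same number of times for any n, including n <= 0.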

For every x86 microarchitecture I'm familiar with, on paper xor+dec seems preferable to push+pop. I would avoid doing push+pop unless we can get some insight into what ICC is shooting for / exploiting here. E.g. push+pop on AMD Jaguar creates micro-ops on the load unit, which is a bottleneck.
(I microbenchmarked the 64-bit analog to confirm:
http://reviews.llvm.org/F1123937
http://reviews.llvm.org/F1123938
)
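
For reference, here is a rough sketch of the code-size side of this trade-off. It assumes the value being materialized is -1 in %eax, which is what the OR and DEC sequences suggest, and it assumes 32-bit mode. The byte arrays hold the standard encodings; the array names and the bytes_saved helper are hypothetical.

```c
#include <stddef.h>

/* 32-bit mode encodings for putting -1 into %eax (AT&T syntax in comments). */
static const unsigned char mov_imm32[] = {0xB8, 0xFF, 0xFF, 0xFF, 0xFF}; /* movl $-1, %eax */
static const unsigned char or_imm8[]   = {0x83, 0xC8, 0xFF};             /* orl $-1, %eax (reads old %eax) */
static const unsigned char xor_dec[]   = {0x31, 0xC0, 0x48};             /* xorl %eax, %eax ; decl %eax */
static const unsigned char push_pop[]  = {0x6A, 0xFF, 0x58};             /* pushl $-1 ; popl %eax */

/* Bytes saved by a candidate sequence relative to the plain 5-byte mov. */
static size_t bytes_saved(size_t candidate_len) {
    return sizeof mov_imm32 - candidate_len;
}
```

All three alternatives are 3 bytes here, so they differ only in dependencies and micro-op cost, not in size. In 64-bit mode the one-byte 0x48 form of dec is not available (that byte is a REX prefix), so the xor+dec sequence grows by a byte.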


http://reviews.llvm.org/D14971




