[PATCH] D80466: [X86] Improve i8 + 'slow' i16 funnel shift codegen

Tue May 26 14:45:01 PDT 2020

efriedma added inline comments.

================
Comment at: llvm/test/CodeGen/X86/fshl.ll:22-23
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
+; X86-NEXT:    shll $8, %eax
+; X86-NEXT:    orl %edx, %eax
+; X86-NEXT:    andb $7, %cl
----------------
craig.topper wrote:
> RKSimon wrote:
> > foad wrote:
> > > Would it be worth trying to generate just `movb %al, %dh` instead of zext+shll+orl?
> > Yes, that might be useful, but it should probably be done generally. I don't know much about the hi-byte move logic; @craig.topper might be able to advise?
> I think you'd have to jump through some hoops to get the register allocator to do it. You'd need an INSERT_SUBREG to force the join. Possibly even a pseudo instruction on 64-bit to force NOREX on the other register to avoid an encoding issue.
> 
> I'm not sure it makes sense to write an h register on modern Intel CPUs. It guarantees a merge uop needs to be generated when bits 15:8 and 7:0 are both read by the consuming instruction.
On processors that don't have special rename machinery for 8-bit registers, it should simply save an instruction, if it's legal. On big Intel cores, even if it doesn't save a uop, it should still be smaller.

That said, even if it's profitable in this exact case, the register allocation constraints to make it work are really tight; it's probably only worthwhile if the values are already in ABCD registers.
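
To make the comparison concrete, here is a rough C sketch of the two ways of forming the 16-bit value discussed above. It is only an illustration: the function names are hypothetical, and `concat_insert_high` merely models the effect of writing a high-byte register such as `%dh` (bits 15:8 replaced, bits 7:0 untouched); it says nothing about what the register allocator can actually produce.

```c
#include <stdint.h>

/* Models zext + shll $8 + orl: widen hi, shift it into bits 15:8,
 * then OR in the low byte. (Hypothetical helper name.) */
static uint16_t concat_shift_or(uint8_t hi, uint8_t lo) {
    return (uint16_t)(((uint32_t)hi << 8) | lo);
}

/* Models a single high-byte write such as `movb %al, %dh`:
 * lo is assumed to already sit in bits 7:0 of the register, and only
 * bits 15:8 are replaced. (Hypothetical helper name.) */
static uint16_t concat_insert_high(uint8_t hi, uint8_t lo) {
    uint16_t reg = lo;                       /* low byte already in place */
    reg = (uint16_t)((reg & 0x00FFu) | ((uint32_t)hi << 8));
    return reg;
}
```

Both helpers return the same 16-bit value for every input pair, so the question in this thread is purely about cost and register constraints, not about correctness.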

================
Comment at: llvm/test/CodeGen/X86/fshl.ll:26
+; X86-NEXT:    shll %cl, %eax
+; X86-NEXT:    movb %ah, %al
 ; X86-NEXT:    retl
----------------
We should probably prefer `shrl $8, %eax` over `movb %ah, %al` here.
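
For reference, a minimal C sketch of why a right shift by 8 gives the same i8 result as reading the high byte. It assumes the lowering is what the CHECK lines suggest (concatenate the two bytes, shift left by the amount masked to 0..7, take bits 15:8); `fshl8_ref` and `fshl8_wide` are hypothetical names, not anything in the patch.

```c
#include <stdint.h>

/* Reference i8 funnel shift left: the top 8 bits of the 16-bit value
 * (x:y) shifted left by z modulo 8. */
static uint8_t fshl8_ref(uint8_t x, uint8_t y, uint8_t z) {
    unsigned s = z & 7u;
    if (s == 0)
        return x;                            /* avoid shifting y by 8 */
    return (uint8_t)((x << s) | (y >> (8 - s)));
}

/* Widened form: concatenate, shift by the masked amount, then extract
 * bits 15:8 with a plain right shift (the `shrl $8` style) instead of
 * a high-byte register read. */
static uint8_t fshl8_wide(uint8_t x, uint8_t y, uint8_t z) {
    uint32_t v = ((uint32_t)x << 8) | y;     /* x in bits 15:8, y in 7:0 */
    v <<= (z & 7u);                          /* andb $7, %cl; shll %cl */
    return (uint8_t)(v >> 8);                /* only the low 8 bits are kept */
}
```

Since only the low byte of the return register is meaningful for an i8 result, the bits that the shift leaves above bit 7 do not matter.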

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D80466/new/

https://reviews.llvm.org/D80466