[PATCH] D80466: [X86] Improve i8 + 'slow' i16 funnel shift codegen

Tue May 26 13:05:48 PDT 2020

craig.topper added inline comments.

================
Comment at: llvm/test/CodeGen/X86/fshl.ll:22-23
+; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
+; X86-NEXT:    shll $8, %eax
+; X86-NEXT:    orl %edx, %eax
+; X86-NEXT:    andb $7, %cl
----------------
RKSimon wrote:
> foad wrote:
> > Would it be worth trying to generate just `movb %al, %dh` instead of zext+shll+orl?
> Yes, that might be useful, but it should probably be done generally. I don't know much about the hi-byte move logic; @craig.topper might be able to advise?
I think you'd have to jump through some hoops to get the register allocator to do it. You'd need an INSERT_SUBREG to force the join, and possibly even a pseudo instruction on 64-bit to force NOREX on the other register, since AH/BH/CH/DH can't be encoded in an instruction that carries a REX prefix.

I'm not sure it makes sense to write an h register on modern Intel CPUs. Doing so guarantees that a merge uop is generated whenever the consuming instruction reads both bits 15:8 and 7:0.
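The CHECK lines in the excerpt appear to lower an i8 funnel shift by widening: the two bytes are concatenated into one register (`shll $8` + `orl`) and the shift amount is masked to 3 bits (`andb $7`), presumably followed by a shift and a read of bits 15:8, which the excerpt does not show. The Python sketch below is not taken from the patch; it only models that idea against the `llvm.fshl.i8` semantics (shift the concatenation hi:lo left by the amount modulo 8, keep the upper byte), and models foad's `movb %al, %dh` alternative as a write into bits 15:8 of a register already holding the zero-extended low byte. All function names are hypothetical, and the sketch checks values only; it says nothing about uop counts or merge penalties.

```python
def fshl8_ref(hi: int, lo: int, amt: int) -> int:
    """Reference fshl.i8: shift hi:lo left by amt mod 8, return the upper byte."""
    s = amt % 8
    # For s == 0, lo >> 8 is 0 (lo < 256), so the result is just hi.
    return ((hi << s) | (lo >> (8 - s))) & 0xFF


def fshl8_widen(hi: int, lo: int, amt: int) -> int:
    """Hypothetical model of the widened lowering: concat, shift, take bits 15:8."""
    concat = (hi << 8) | lo          # zext + shll $8 + orl
    s = amt & 7                      # andb $7, %cl
    return ((concat << s) >> 8) & 0xFF


def insert_high_byte(reg: int, byte: int) -> int:
    """Hypothetical model of `movb %al, %dh`: overwrite bits 15:8, keep the rest."""
    return (reg & ~0xFF00) | (byte << 8)


if __name__ == "__main__":
    for hi in range(256):
        for lo in range(256):
            # With lo already zero-extended in the register, the h-register
            # write produces the same 16-bit concatenation as shl + or.
            assert insert_high_byte(lo, hi) == (hi << 8) | lo
            for amt in range(16):
                assert fshl8_widen(hi, lo, amt) == fshl8_ref(hi, lo, amt)
    print("widened lowering matches fshl.i8 semantics")
```

Both ways of forming the concatenation produce the same value, so the choice discussed in the thread is about cost rather than correctness: the h-register write saves instructions but, per the comment above, forces a merge uop when a later instruction reads bits 15:8 and 7:0 together.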

Repository:
  rG LLVM Github Monorepo


https://reviews.llvm.org/D80466