[PATCH] D98587: [X86] Optimize vXi8 MULHS on targets where we can't sign_extend to the next register size.

Craig Topper via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Sat Mar 13 11:45:01 PST 2021


craig.topper created this revision.
craig.topper added reviewers: RKSimon, spatel.
Herald added subscribers: pengfei, hiraditya.
craig.topper requested review of this revision.
Herald added a project: LLVM.

For these cases we need to extract the upper or lower elements,
multiply them using 16-bit multiplies and repack them.

Previously we used punpcklbw/punpckhbw+psraw or pmovsxbw+pshufd to
extract and sign extend so we could use pmullw to compute the 16-bit
product and then shift down the high bits.

We can avoid the need to sign extend if we unpack the bytes into
the high byte of each word and fill the lower byte with 0 using
pxor. This puts the sign bit of each byte into the sign bit of
each word. Since the LHS and RHS have 8 trailing zeros, the full
32-bit product of those 16-bit values will have 16 trailing zeros.
This means the 16-bit product of the original bytes is in the upper
16 bits which we can calculate using pmulhw.
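
The arithmetic above can be checked with a scalar model. The sketch below
is not the lowering code from the patch: the function names are
hypothetical, each function models a single vector lane, and the final
shift by 8 stands in for whatever shift-and-repack sequence the real
lowering emits. It only illustrates that placing each byte in the high
half of a word and taking the high 16 bits of the product yields the
16-bit product of the original bytes.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of one pmulhw lane: the high 16 bits of the signed
// 32-bit product of two 16-bit values. Assumes >> on a negative
// int32_t is an arithmetic shift (guaranteed since C++20).
int16_t pmulhwLane(int16_t x, int16_t y) {
  int32_t full = int32_t(x) * int32_t(y);
  return int16_t(full >> 16);
}

// Reference MULHS on bytes: sign extend both operands to 16 bits,
// multiply, and keep the high byte of the 16-bit product.
int8_t mulhsReference(int8_t a, int8_t b) {
  int16_t prod = int16_t(int16_t(a) * int16_t(b));
  return int8_t(prod >> 8);
}

// The trick from the description: put each byte in the high byte of a
// word with a zero low byte (a * 256), so no sign extension is needed.
// (a * 256) * (b * 256) == (a * b) << 16, so pmulhw returns a * b.
int8_t mulhsHighByteTrick(int8_t a, int8_t b) {
  int16_t x = int16_t(a * 256);
  int16_t y = int16_t(b * 256);
  int16_t prod = pmulhwLane(x, y); // 16-bit product of the original bytes
  return int8_t(prod >> 8);        // high byte is the MULHS result
}

// Exhaustively compare both formulations over every pair of bytes.
bool highByteTrickMatchesReference() {
  for (int a = -128; a <= 127; ++a) {
    for (int b = -128; b <= 127; ++b) {
      if (pmulhwLane(int16_t(a * 256), int16_t(b * 256)) != a * b)
        return false;
      if (mulhsHighByteTrick(int8_t(a), int8_t(b)) !=
          mulhsReference(int8_t(a), int8_t(b)))
        return false;
    }
  }
  return true;
}
```

Calling highByteTrickMatchesReference() covers all 65536 byte pairs,
including the -128 * -128 case, where the 32-bit product of the shifted
values is 2^30 and still fits in a signed 32-bit lane.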


Repository:
  rG LLVM Github Monorepo

https://reviews.llvm.org/D98587

Files:
  llvm/lib/Target/X86/X86ISelLowering.cpp
  llvm/test/CodeGen/X86/combine-sdiv.ll
  llvm/test/CodeGen/X86/vec_smulo.ll
  llvm/test/CodeGen/X86/vector-idiv-sdiv-128.ll
  llvm/test/CodeGen/X86/vector-idiv-sdiv-256.ll
  llvm/test/CodeGen/X86/vector-idiv-sdiv-512.ll

-------------- next part --------------
A non-text attachment was scrubbed...
Name: D98587.330462.patch
Type: text/x-patch
Size: 245242 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20210313/502e1e77/attachment-0001.bin>

