[all-commits] [llvm/llvm-project] 69bdf3: [X86] Optimize vXi8 MULHS on targets where we can'...

Sun Mar 28 11:50:37 PDT 2021

  Branch: refs/heads/main
  Home:   https://github.com/llvm/llvm-project
  Commit: 69bdf35dc70c6c1efd9e622d1e49041ca2a10f0c
      https://github.com/llvm/llvm-project/commit/69bdf35dc70c6c1efd9e622d1e49041ca2a10f0c
  Author: Craig Topper <craig.topper at sifive.com>
  Date:   2021-03-28 (Sun, 28 Mar 2021)

  Changed paths:
    M llvm/lib/Target/X86/X86ISelLowering.cpp
    M llvm/test/CodeGen/X86/combine-sdiv.ll
    M llvm/test/CodeGen/X86/vec_smulo.ll
    M llvm/test/CodeGen/X86/vector-idiv-sdiv-128.ll
    M llvm/test/CodeGen/X86/vector-idiv-sdiv-256.ll
    M llvm/test/CodeGen/X86/vector-idiv-sdiv-512.ll

  Log Message:
  -----------
  [X86] Optimize vXi8 MULHS on targets where we can't sign_extend to the next register size.

For these cases we need to extract the upper or lower elements,
multiply them using 16-bit multiplies and repack them.

Previously we used punpcklbw/punpckhbw+psraw or pmovsxbw+pshudfd to
extract and sign extend so we could use pmullw to compute the 16-bit
product and then shift down the high bits.

We can avoid the need to sign extend if we unpack the bytes into
the high byte of each word and fill the lower byte with 0 using
pxor. This puts the sign bit of each byte into the sign bit of
each word. Since the LHS and RHS have 8 trailing zeros, the full
32-bit product of those 16-bit values will have 16 trailing zeros.
This means the 16-bit product of the original bytes is in the upper
16 bits which we can calculate using pmulhw.

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D98587