[all-commits] [llvm/llvm-project] 69bdf3: [X86] Optimize vXi8 MULHS on targets where we can'...
Craig Topper via All-commits
all-commits at lists.llvm.org
Sun Mar 28 11:50:37 PDT 2021
Branch: refs/heads/main
Home: https://github.com/llvm/llvm-project
Commit: 69bdf35dc70c6c1efd9e622d1e49041ca2a10f0c
https://github.com/llvm/llvm-project/commit/69bdf35dc70c6c1efd9e622d1e49041ca2a10f0c
Author: Craig Topper <craig.topper at sifive.com>
Date: 2021-03-28 (Sun, 28 Mar 2021)
Changed paths:
M llvm/lib/Target/X86/X86ISelLowering.cpp
M llvm/test/CodeGen/X86/combine-sdiv.ll
M llvm/test/CodeGen/X86/vec_smulo.ll
M llvm/test/CodeGen/X86/vector-idiv-sdiv-128.ll
M llvm/test/CodeGen/X86/vector-idiv-sdiv-256.ll
M llvm/test/CodeGen/X86/vector-idiv-sdiv-512.ll
Log Message:
-----------
[X86] Optimize vXi8 MULHS on targets where we can't sign_extend to the next register size.
For these cases we need to extract the upper or lower elements,
multiply them using 16-bit multiplies and repack them.
Previously we used punpcklbw/punpckhbw+psraw or pmovsxbw+pshudfd to
extract and sign extend so we could use pmullw to compute the 16-bit
product and then shift down the high bits.
We can avoid the need to sign extend if we unpack the bytes into
the high byte of each word and fill the lower byte with 0 using
pxor. This puts the sign bit of each byte into the sign bit of
each word. Since the LHS and RHS have 8 trailing zeros, the full
32-bit product of those 16-bit values will have 16 trailing zeros.
This means the 16-bit product of the original bytes is in the upper
16 bits which we can calculate using pmulhw.
Reviewed By: RKSimon
Differential Revision: https://reviews.llvm.org/D98587
More information about the All-commits
mailing list