[PATCH] D98587: [X86] Optimize vXi8 MULHS on targets where we can't sign_extend to the next register size.
Craig Topper via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Wed Mar 17 09:44:31 PDT 2021
craig.topper added inline comments.
================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:27495
+ // For signed we'll use punpcklbw/punpckbw to extend the bytes to words and
+ // shift them left into the upper byte of each word. This allows us to use
+ // pmulhw to calculate the full 16-bit product. This trick means we don't
----------------
RKSimon wrote:
> Isn't the shift just for RHS constants?
We only use a shift for constants, though its should get constant folded away. That was me trying to explain the layout of the data. We use unpck in a way that extends the data and shifts it in the upper bits at the same time.
================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:27515
SDValue BLo, BHi;
if (ISD::isBuildVectorOfConstantSDNodes(B.getNode())) {
// If the RHS is a constant, manually unpackl/unpackh and extend.
----------------
RKSimon wrote:
> Do we need this constant path any more? It seemed to be to avoid problems with the SSE41 path that's been removed.
Maybe not. I'll check.
================
Comment at: llvm/test/CodeGen/X86/vec_smulo.ll:1803
; SSE41-NEXT: pslld $31, %xmm8
; SSE41-NEXT: psrad $31, %xmm8
+; SSE41-NEXT: pshufd {{.*#+}} xmm2 = xmm4[2,3,2,3]
----------------
RKSimon wrote:
> Not related - but we should be able to remove this sext-in-reg code by replace the pmovzxbd with pmovzxsd (is that a hidden any_extend?)
I'll check. Any idea why these tests use large result types like 16xi32 for a 16xi8 multiply?
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D98587/new/
https://reviews.llvm.org/D98587
More information about the llvm-commits
mailing list