[PATCH] D120193: [X86][SSE] Attempt to lower vec_reduce_add patterns with PSADBW for zero-extended vXi8 sources
Phoebe Wang via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Sun Feb 27 04:39:11 PST 2022
pengfei accepted this revision.
pengfei added a comment.
This revision is now accepted and ready to land.
LGTM.
================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:43062-43064
+ Rdx = DAG.getNode(ISD::TRUNCATE, DL, ByteVT, Rdx);
+ if (ByteVT.getSizeInBits() < 128)
+ Rdx = WidenToV16I8(Rdx, true);
----------------
RKSimon wrote:
> pengfei wrote:
> > I don't fully understand this code; a few questions:
> > 1. If the source elements are known to be <= 255, why do we need to truncate them? Wouldn't a direct bitcast be better?
> > 2. If ByteVT is narrower than 128 bits, why don't we widen it with undef and return lane 0 of the PSADBW result?
> 1 - I'm not sure I follow - PSADBW could be used with purely bitcasted data but we'd lose some of the benefits of avoiding several stages of reduction. But I guess it could be useful for i16/i32 data to avoid a truncation and still get a horizontal-sum over fewer elements - is that what you meant?
>
> 2 - WidenToV16I8 does leave the upper 64-bit element undef; the v4i8 ISD::INSERT_VECTOR_ELT codepath is just to avoid some issues with combines failing to make use of the implicit upper-bit zeroing of MOVD.
1 - Yes.
2 - I see. I thought `ZeroExtend` would zero the upper 64 bits.
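
For readers following point 2, here is a rough scalar model of why the upper 64-bit element can stay undef. It is a sketch, not the code in the patch: `psadbw_model` and `reduce_add_v4i8` are hypothetical names, and `upper_junk` merely stands in for undef lanes. It assumes the usual PSADBW semantics, where each 64-bit half independently sums the absolute differences of its eight byte pairs.

```cpp
#include <array>
#include <cstdint>
#include <cstdlib>

// Scalar model of PSADBW on a 128-bit register: each 64-bit half sums the
// absolute differences of its eight byte pairs, zero-extended to 64 bits.
std::array<uint64_t, 2> psadbw_model(const std::array<uint8_t, 16> &a,
                                     const std::array<uint8_t, 16> &b) {
  std::array<uint64_t, 2> lanes{0, 0};
  for (int i = 0; i < 16; ++i)
    lanes[i / 8] += static_cast<uint64_t>(std::abs(int(a[i]) - int(b[i])));
  return lanes;
}

// Hypothetical helper: add-reduce four bytes. Bytes 4..7 must be zero (as
// MOVD would leave them); bytes 8..15 may hold anything.
uint64_t reduce_add_v4i8(const std::array<uint8_t, 4> &src,
                         uint8_t upper_junk) {
  std::array<uint8_t, 16> wide{};  // value-initialised: all bytes zero
  for (int i = 0; i < 4; ++i)
    wide[i] = src[i];
  for (int i = 8; i < 16; ++i)
    wide[i] = upper_junk;          // stands in for the undef upper element
  std::array<uint8_t, 16> zero{};  // PSADBW against zero gives a plain sum
  return psadbw_model(wide, zero)[0];  // only lane 0 is read
}
```

Because only lane 0 is extracted, whatever sits in bytes 8..15 cannot affect the result; only bytes 4..7 of the low half need to be known zero.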
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D120193/new/
https://reviews.llvm.org/D120193