[PATCH] D120193: [X86][SSE] Attempt to lower vec_reduce_add patterns with PSADBW for zero-extended vXi8 sources

Sun Feb 27 04:39:11 PST 2022

pengfei accepted this revision.
pengfei added a comment.
This revision is now accepted and ready to land.

LGTM.

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:43062-43064
+    Rdx = DAG.getNode(ISD::TRUNCATE, DL, ByteVT, Rdx);
+    if (ByteVT.getSizeInBits() < 128)
+      Rdx = WidenToV16I8(Rdx, true);
----------------
RKSimon wrote:
> pengfei wrote:
> > I don't fully understand this code; a few questions:
> > 1. If the source is known to be <= 255, why do we need to truncate it? Wouldn't it be better to bitcast directly?
> > 2. If ByteVT is < 128 bits, why don't we widen it with undef and return the value of lane 0 after PSADBW?
> 1 - I'm not sure I follow - PSADBW could be used with purely bitcasted data but we'd lose some of the benefits of avoiding several stages of reduction. But I guess it could be useful for i16/i32 data to avoid a truncation and still get a horizontal-sum over fewer elements - is that what you meant?
> 
> 2 - WidenToV16I8 does leave the upper 64-bit element undef; the v4i8 ISD::INSERT_VECTOR_ELT codepath is just to avoid some issues with combines failing to make use of the implicit upper-bit zeroing of MOVD.
1 - Yes.
2 - I see. I thought `ZeroExtend` would zero the upper 64 bits.
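
For reference, point 2 can be illustrated with a small scalar model. This is only a sketch, not the code in X86ISelLowering.cpp: `psadbwModel`, `widenModel`, and `reduceAddNarrow` are hypothetical names, and the `upperFill` parameter is an assumption used to mimic "undef" content in bytes 8..15. It shows that for sources of at most 8 bytes, lane 0 of PSADBW against zero is the byte sum as long as the unused bytes of the lower 64 bits are zero, regardless of what the upper 64 bits hold.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

using V16I8 = std::array<uint8_t, 16>;
using V2I64 = std::array<uint64_t, 2>;

// Scalar model of PSADBW: each 64-bit lane receives the sum of absolute
// differences of its own 8 byte pairs.
V2I64 psadbwModel(const V16I8 &a, const V16I8 &b) {
  V2I64 r{0, 0};
  for (std::size_t i = 0; i < 16; ++i)
    r[i / 8] += static_cast<uint64_t>(std::abs(int(a[i]) - int(b[i])));
  return r;
}

// Hypothetical stand-in for widening a narrow byte vector to v16i8:
// copies n source bytes, zeroes the rest of the lower 64 bits, and fills
// bytes 8..15 with upperFill to mimic arbitrary (undef) content.
V16I8 widenModel(const uint8_t *src, std::size_t n, uint8_t upperFill) {
  assert(n <= 8 && "model only covers sources that fit in the lower lane");
  V16I8 v;
  v.fill(0);
  for (std::size_t i = 0; i < n; ++i)
    v[i] = src[i];
  for (std::size_t i = 8; i < 16; ++i)
    v[i] = upperFill;
  return v;
}

// Sum of n bytes (n <= 8): PSADBW against zero, then read lane 0 only.
uint64_t reduceAddNarrow(const uint8_t *src, std::size_t n, uint8_t upperFill) {
  V16I8 zero{};
  return psadbwModel(widenModel(src, n, upperFill), zero)[0];
}
```

In this model, changing `upperFill` only changes lane 1, which is never read, so the result for a v4i8 or v8i8 source is unaffected.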

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120193/new/

https://reviews.llvm.org/D120193