[PATCH] D120193: [X86][SSE] Attempt to lower vec_reduce_add patterns with PSADBW for zero-extended vXi8 sources

Sun Feb 27 03:54:23 PST 2022

RKSimon added inline comments.

================
Comment at: llvm/lib/Target/X86/X86ISelLowering.cpp:43062-43064
+    Rdx = DAG.getNode(ISD::TRUNCATE, DL, ByteVT, Rdx);
+    if (ByteVT.getSizeInBits() < 128)
+      Rdx = WidenToV16I8(Rdx, true);
----------------
pengfei wrote:
> I don't understand the code quite well, some doubts:
> 1. If the source is known to be <= 255, why do we need to truncate it? Wouldn't it be better to bitcast directly?
> 2. If ByteVT is < 128 bits, why don't we widen it with undef and return the value of lane 0 after PSADBW?
1 - I'm not sure I follow - PSADBW could be used with purely bitcasted data but we'd lose some of the benefits of avoiding several stages of reduction. But I guess it could be useful for i16/i32 data to avoid a truncation and still get a horizontal-sum over fewer elements - is that what you meant?

2 - WidenToV16I8 does leave the upper 64-bit element undef; the v4i8 ISD::INSERT_VECTOR_ELT codepath is just there to avoid some issues with combines failing to make use of the implicit upper-bit zeroing of MOVD.
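
To make the trade-off in point 1 concrete, here is a rough scalar model of the truncate-then-PSADBW idea. It is only a sketch, not the code in the patch: `psadbwModel` and `reduceAddViaPsadbw` are hypothetical names, and the model zero-fills the unused byte lanes for simplicity, whereas the real lowering can leave the upper 64-bit element undef and read only lane 0. PSADBW against an all-zero operand sums each group of 8 bytes into a 64-bit lane, so truncating to bytes first lets a single PSADBW cover up to 16 source elements.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Scalar model of 128-bit SSE2 PSADBW: for each 64-bit half, sum the
// absolute differences of the 8 unsigned byte pairs into that half.
std::array<uint64_t, 2> psadbwModel(const std::array<uint8_t, 16> &a,
                                    const std::array<uint8_t, 16> &b) {
  std::array<uint64_t, 2> out{0, 0};
  for (int i = 0; i < 16; ++i) {
    int diff = int(a[i]) - int(b[i]);
    out[i / 8] += uint64_t(diff < 0 ? -diff : diff);
  }
  return out;
}

// Hypothetical helper: reduce-add of up to 16 elements whose values are
// known to fit in a byte (i.e. zero-extended vXi8 sources).
uint64_t reduceAddViaPsadbw(const uint32_t *src, int n) {
  assert(n >= 0 && n <= 16);
  std::array<uint8_t, 16> bytes{}; // unused lanes zero-filled (assumption)
  for (int i = 0; i < n; ++i) {
    assert(src[i] <= 255);         // precondition: upper bits known zero
    bytes[i] = uint8_t(src[i]);    // models the ISD::TRUNCATE to ByteVT
  }
  std::array<uint8_t, 16> zero{};
  // |x - 0| == x for unsigned bytes, so each lane is a horizontal byte sum.
  std::array<uint64_t, 2> sad = psadbwModel(bytes, zero);
  // For n <= 8 only lane 0 holds data; for wider inputs add both lanes.
  return sad[0] + sad[1];
}
```

With purely bitcasted i32 data the same PSADBW would cover only 4 source elements per 128-bit vector (the upper bytes of each element being zero), which is where the extra reduction stages would come from.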

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D120193/new/

https://reviews.llvm.org/D120193