[PATCH] D153696: [LV] Only generate 1st part outside of vector region for VPInstruction.

Tue Jul 4 07:09:48 PDT 2023

Ayal added inline comments.

================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/scalable-strict-fadd.ll:403
+; CHECK-ORDERED-TF-NEXT:    [[TMP8:%.*]] = icmp ugt i64 [[N]], [[TMP6]]
+; CHECK-ORDERED-TF-NEXT:    [[TMP9:%.*]] = select i1 [[TMP8]], i64 [[TMP7]], i64 0
 ; CHECK-ORDERED-TF-NEXT:    [[ACTIVE_LANE_MASK_ENTRY:%.*]] = call <vscale x 8 x i1> @llvm.get.active.lane.mask.nxv8i1.i64(i64 0, i64 [[N]])
----------------
Ayal wrote:
> It suffices to have one instance of this {call-vscale;mul-by-**32**;sub;icmp;select} sequence defined in the preheader rather than four uniform replicas. But what's the reason for eliminating the {call-vscale;mul-by-**8/16/24**;add} **non-identical** sequences defined in the preheader, which are not uniform? It would be good to clarify whether they all stem from the same recipe, and what the UF is.
> 
> Note that, in general, a recipe placed in the preheader could prepare a distinct value for each part, e.g., to initialize an add reduction with a distinct starting value in the first lane/part, rather than adding it to the sum at the end. But it suffices to generate a uniform-across-UF value for a single part and reuse it across all parts.
Is this cleanup the expected behavior, presumably associated with (one? two?) VPInstruction::ActiveLaneMask placed in the preheader? It may be helpful to dump the VPlan, although its (uniform or non-uniform) recipes are compressed for all parts.
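
To make the uniform vs. per-part distinction concrete, here is a rough Python model of the two kinds of preheader values. It is only a sketch: the function names are hypothetical, the saturating-subtract reading of {sub;icmp;select} is inferred from the CHECK lines, and UF=4 is an assumption based on 32 = 8 * 4, not something confirmed in this review.

```python
def preheader_values(n, vscale, lanes_per_vscale=8, uf=4):
    """Model the preheader sequences; uf=4 is an assumed unroll factor."""
    # call-vscale; mul-by-32 (lanes_per_vscale * uf with the assumed values)
    step = vscale * lanes_per_vscale * uf
    # sub; icmp ugt; select: saturating n - step, the same for every part,
    # so a single copy can be reused across all parts
    uniform = n - step if n > step else 0
    # call-vscale; mul-by-8/16/24; add: a different offset for each part,
    # so these are NOT interchangeable between parts
    per_part = [vscale * lanes_per_vscale * part for part in range(uf)]
    return uniform, per_part


def reduction_init(start, lanes, uf):
    """Per-part start vectors for an add reduction: only lane 0 of part 0
    carries the start value; every other lane is the neutral element 0."""
    return [[start if (part == 0 and lane == 0) else 0
             for lane in range(lanes)]
            for part in range(uf)]
```

With `n = 100` and `vscale = 2`, the uniform value is computed once, while the per-part offsets differ for each part; `reduction_init` shows a case where the preheader value is legitimately distinct in part 0 only.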

The rest seems fine.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153696/new/

https://reviews.llvm.org/D153696