[PATCH] D148841: [LV] Use SCEV for uniformity analysis across VF

Mon May 29 06:19:35 PDT 2023

Ayal added inline comments.

================
Comment at: llvm/lib/Analysis/LoopAccessAnalysis.cpp:2590-2592
+    if (CannotAnalyze)
+      return S;
+    if (SE.isLoopInvariant(S, TheLoop))
----------------

================
Comment at: llvm/lib/Analysis/LoopAccessAnalysis.cpp:2647
+  // first lane.
+  return all_of(reverse(seq<unsigned>(1, FixedVF)), [&](unsigned I) {
+    SCEVAddRecForUniformityRewriter IthLaneRewriter(*SE, FixedVF, I, TheLoop);
----------------
nit: "1 .. FixedVF-1"
nit: "first lane" >> "lane 0"
nit: why "reverse"?

================
Comment at: llvm/lib/Analysis/LoopAccessAnalysis.cpp:2610
+
+  bool canAnalyze() const { return FoundUDiv && !CannotAnalyze; }
+};
----------------
fhahn wrote:
> Ayal wrote:
> > nit: can return SCEVCouldNotCompute (or null) at the end if not FoundUDiv.
> Unfortunately that doesn't work without additional work, as the returned value may be used to construct the parent SCEV but the rewriter.
> 
ok, this is fine.

I see SCEVInitRewriter, SCEVPostIncRewriter, SCEVShiftRewriter indeed also record a similar SeenLoopVariantSCEVUnknown/Valid which their `rewrite()` queries after visit() to return SCEVCouldNotCompute. Worth doing the same, wrapping constructor/visit()/canAnalyze()?

================
Comment at: llvm/test/Transforms/LoopVectorize/X86/uniform_mem_op.ll:427
+; CHECK-NEXT:    [[TMP39]] = add <4 x i32> [[VEC_PHI6]], [[TMP35]]
+; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], 16
+; CHECK-NEXT:    [[VEC_IND_NEXT]] = add <4 x i64> [[STEP_ADD2]], <i64 4, i64 4, i64 4, i64 4>
----------------
fhahn wrote:
> Ayal wrote:
> > note: VF=4 is mandated, previous UF=2 decision with 8 loads is now UF=4 with 4 (uniform) loads & broadcasts.
> > 
> > The load from test_base[i/8] could further fold two 'parts' of VF=4 together, as it is uniform across VF=8 / VF=4,UF=2.
> > Worth leaving behind some **assume**, if not folding directly? I.e., record the largest VF for which uniformity was detected, even if a smaller VF is selected.
> > Worth optimizing the analysis across VF's, i.e., if a value is known not to be uniform for VF avoid checking for 2*VF? OTOH, if a value is known to be uniform for VF, check only two SCEVs for 2*VF?
> > The load from test_base[i/8] could further fold two 'parts' of VF=4 together, as it is uniform across VF=8 / VF=4,UF=2.
> 
> I think that would be good as follow-up.
> 
> > Worth optimizing the analysis across VF's, i.e., if a value is known not to be uniform for VF avoid checking for 2*VF
> 
> I was thinking about evaluating something like that as follow-up optimization. WDYT?
Sure, TODOs can be added to record potential follow-ups.

Note also that uniformity could be improved beyond comparing equal SCEV expressions, by using Divergence Analysis which propagates uniformity also through uniform branches.

================
Comment at: llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction1.ll:1
-; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
 ; RUN: opt -passes=loop-vectorize -force-vector-interleave=1 -force-vector-width=2 %s -S | FileCheck %s
----------------
line dropped intentionally?

================
Comment at: llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction2.ll:9
-; CHECK-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) {
-; ; VF2-LABEL: define void @ld_div1_step1_start0_ind2
 ; VF2-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) {
----------------
lines changed intentionally?

================
Comment at: llvm/test/Transforms/LoopVectorize/uniform_across_vf_induction2.ll:153
-; CHECK-LABEL: define void @ld_div2_step1_start0_ind2
-; CHECK-SAME: (ptr noalias [[A:%.*]], ptr noalias [[B:%.*]]) {
 ; VF2-LABEL: define void @ld_div2_step1_start0_ind2
----------------
lines dropped intentionally?

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D148841/new/

https://reviews.llvm.org/D148841