[PATCH] D114373: [LV] Fix incorrectly marking a pointer indvar as 'scalar'.
Sander de Smalen via Phabricator via llvm-commits
llvm-commits at lists.llvm.org
Thu Nov 25 06:37:34 PST 2021
sdesmalen added a comment.
Thanks for the comments @david-arm!
================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:4858
bool IsUniform = Cost->isUniformAfterVectorization(P, State.VF);
unsigned Lanes = IsUniform ? 1 : State.VF.getKnownMinValue();
----------------
david-arm wrote:
> If we don't expect to widen here anymore (i.e. you removed the `IsUniform && VF.isScalable()` stuff), can we also just assert(IsUniform) and only generate the first lane? If not, I think it's worth having an assert that VF is not scalable here because we're only caching the first known minimum number of lanes here.
> If we don't expect to widen here anymore (i.e. you removed the IsUniform && VF.isScalable() stuff), can we also just assert(IsUniform) and only generate the first lane?
I don't think so, because for fixed-width vectors this is still a valid code-path: when the cost model has chosen to scalarize the pointer-induction value used in a store (set to CM_Scalarize), the value is ScalarAfterVectorization but not Uniform.
> If not, I think it's worth having an assert that VF is not scalable here because we're only caching the first known minimum number of lanes here.
For scalable vectors this code-path is still sensible, but only if the value is Uniform, so I can add `assert(IsUniform || !VF.isScalable())`.
================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-gep.ll:64
+; CHECK-NEXT: store <vscale x 2 x i8*> [[TMP14]], <vscale x 2 x i8*>* [[TMP16]], align 8
+; CHECK-NEXT: [[TMP17:%.*]] = extractelement <vscale x 2 x i8*> [[TMP13]], i32 0
+; CHECK-NEXT: [[TMP18:%.*]] = getelementptr i8, i8* [[TMP17]], i32 0
----------------
david-arm wrote:
> I think this may potentially lead to poorer code quality here with the lane extract, although that may be a price worth paying for overall correctness. I'm not certain if the extract will get folded with the vector GEP and transformed into a scalar GEP here. Maybe you could run `-instcombine` on this code to double check the output?
It doesn't get folded away by InstCombine because `[[TMP13]]` has more than one use (see D101900). I'm not sure this is really a problem in practice though, since it only concerns lane 0, which is cheap to extract. I found that the resulting assembly when removing the `hasOneUse` restriction was the same.
================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-gep.ll:169
+; CHECK: middle.block:
+; CHECK-NEXT: [[CMP_N:%.*]] = icmp eq i64 [[TMP0]], [[N_VEC]]
+; CHECK-NEXT: br i1 [[CMP_N]], label [[END:%.*]], label [[SCALAR_PH]]
----------------
david-arm wrote:
> Do we need all the check lines after this point, since we only really care about the vector loop?
Possibly not, but because this file had:
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
at the top, I figured I'd just update the check lines.
================
Comment at: llvm/test/Transforms/LoopVectorize/AArch64/sve-widen-phi.ll:137
+; CHECK-NEXT: [[BC:%.*]] = bitcast <vscale x 2 x i32*> [[TMP7]] to <vscale x 2 x <vscale x 2 x i32>*>
+; CHECK-NEXT: [[TMP8:%.*]] = extractelement <vscale x 2 x <vscale x 2 x i32>*> [[BC]], i32 0
+; CHECK-NEXT: [[WIDE_LOAD:%.*]] = load <vscale x 2 x i32>, <vscale x 2 x i32>* [[TMP8]], align 8
----------------
david-arm wrote:
> Again, it's worth checking whether or not these extracts get folded away?
The reason the extract is not folded away is because InstCombine::visitExtractElementInst doesn't look through bitcasts, because bitcasts may change the number of elements. Perhaps we can change that if the number of elements is unchanged, but then it will run into the `hasOneUse` restriction.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D114373/new/
https://reviews.llvm.org/D114373