[PATCH] D98512: [LoopVectorize] Simplify scalar cost calculation in getInstructionCost

Mon Mar 15 03:25:40 PDT 2021

fhahn added a comment.

In D98512#2624996 <https://reviews.llvm.org/D98512#2624996>, @dmgreen wrote:

> I agree that the IV update in the test should have a cost of 1. (It seems strange it doesn't already, I suspect it isn't matched the same as an increment 1).
>
> Would it always be true for the addresses of scalarized loads/stores? They do need to calculate N addresses.

Hmm, but if there are uses of the IV in this case, don't we need to generate a scalar value for each element in the VF? E.g consider the example below

  define void @test(half* %b, i32 %n) {
  entry:
    br label %for.body

  for.body:
    %i = phi i32 [ 0, %entry ], [ %i.next, %for.body ]
    %tmp0 = getelementptr half, half* %b, i32 %i
    %lv = load half, half* %tmp0
    %add = fadd half %lv, 40.0
    store half %add, half* %tmp0, align 1
    %i.next = add nuw nsw i32 %i, 3
    %cond = icmp eq i32 %i.next, %n
    br i1 %cond, label %for.end, label %for.body

  for.end:
    ret void
  }

Generates the vector body below with ` -S -force-vector-width=4 -mtriple=arm64-apple-iphoneos`. Note `  %5 = add i32 %offset.idx, 0`, `%6 = add i32 %offset.idx, 3`,  `%7 = add i32 %offset.idx, 6` & ` %8 = add i32 %offset.idx, 9`. With this change, the cost of the induction increment is 1, but scalarization requires us to create 4 scalar increments. I don't think the cost of that is accounted elsewhere?

  vector.body:                                      ; preds = %vector.body, %vector.ph
    %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ]
    %offset.idx = mul i32 %index, 3
    %5 = add i32 %offset.idx, 0
    %6 = add i32 %offset.idx, 3
    %7 = add i32 %offset.idx, 6
    %8 = add i32 %offset.idx, 9
    %9 = getelementptr half, half* %b, i32 %5
    %10 = getelementptr half, half* %b, i32 %6
    %11 = getelementptr half, half* %b, i32 %7
    %12 = getelementptr half, half* %b, i32 %8
    %13 = getelementptr half, half* %9, i32 0
    %14 = bitcast half* %13 to <12 x half>*
    %wide.vec = load <12 x half>, <12 x half>* %14, align 2
    %strided.vec = shufflevector <12 x half> %wide.vec, <12 x half> poison, <4 x i32> <i32 0, i32 3, i32 6, i32 9>
    %15 = fadd <4 x half> %strided.vec, <half 0xH5100, half 0xH5100, half 0xH5100, half 0xH5100>
    %16 = extractelement <4 x half> %15, i32 0
    store half %16, half* %9, align 1
    %17 = extractelement <4 x half> %15, i32 1
    store half %17, half* %10, align 1
    %18 = extractelement <4 x half> %15, i32 2
    store half %18, half* %11, align 1
    %19 = extractelement <4 x half> %15, i32 3
    store half %19, half* %12, align 1
    %index.next = add i32 %index, 4
    %20 = icmp eq i32 %index.next, %n.vec
    br i1 %20, label %middle.block, label %vector.body, !llvm.loop !0

================
Comment at: llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7377
     SmallVector<const Value *, 4> Operands(I->operand_values());
-    unsigned N = isScalarAfterVectorization(I, VF) ? VF.getKnownMinValue() : 1;
-    return N * TTI.getArithmeticInstrCost(
-                   I->getOpcode(), VectorTy, CostKind,
-                   TargetTransformInfo::OK_AnyValue,
-                   Op2VK, TargetTransformInfo::OP_None, Op2VP, Operands, I);
+    return TTI.getArithmeticInstrCost(
+        I->getOpcode(), VectorTy, CostKind, TargetTransformInfo::OK_AnyValue,
----------------
I think conceptually there's nothing preventing us from reaching this code with an instruction that needs to be scalarized if the VF > 1. It might not occur in practice for some/most backends at the moment, because most of the opcodes above can be widened on most backends and we do not really properly track which instructions will be scalarized at this point, except for pointer & induction variables.

If we make the code less general, I think we should have at least an assertion to make sure we never reach this code when VF > 1 && scalarization is requested, so we at least notice once it really matters and we need to be more general again.

Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D98512/new/

https://reviews.llvm.org/D98512