[PATCH] D115261: [LV] Disable runtime unrolling for vectorized loops.

Roman Lebedev via Phabricator via llvm-commits llvm-commits at lists.llvm.org
Tue Jan 3 06:49:45 PST 2023


lebedev.ri accepted this revision.
lebedev.ri added a comment.

Hmm.

  LV: IC is 4
  LV: VF is 8
  LV: Interleaving to saturate store or load ports.
  LV: Minimum required TC for runtime checks to be profitable:0
  LV: Found a vectorizable loop (8) in <stdin>
  LV: Interleave Count is 4
  LEV: Epilogue vectorization is not profitable for this loop
  Executing best plan with VF=8, UF=4
  LV: vectorizing VPBB:vector.ph in BB:vector.ph
  LV: filled BB:
  vector.ph:                                        ; preds = %.lr.ph.preheader
    %n.mod.vf = urem i64 %3, 32
    %n.vec = sub i64 %3, %n.mod.vf
    br label %middle.block
  LV: VPBlock in RPO vector.body
  LV: created vector.body
  LV: draw edge fromvector.ph
  LV: vectorizing VPBB:vector.body in BB:vector.body
  LV: filled BB:
  vector.body:                                      ; preds = %vector.body, %vector.ph
    %index = phi i64 [ 0, %vector.ph ]
    %4 = add i64 %index, 0
    %5 = add i64 %index, 8
    %6 = add i64 %index, 16
    %7 = add i64 %index, 24
    %8 = getelementptr inbounds i32, ptr %0, i64 %4
    %9 = getelementptr inbounds i32, ptr %0, i64 %5
    %10 = getelementptr inbounds i32, ptr %0, i64 %6
    %11 = getelementptr inbounds i32, ptr %0, i64 %7
    %12 = getelementptr inbounds i32, ptr %8, i32 0
    %wide.load = load <8 x i32>, ptr %12, align 4, !tbaa !5
    %13 = getelementptr inbounds i32, ptr %8, i32 8
    %wide.load7 = load <8 x i32>, ptr %13, align 4, !tbaa !5
    %14 = getelementptr inbounds i32, ptr %8, i32 16
    %wide.load8 = load <8 x i32>, ptr %14, align 4, !tbaa !5
    %15 = getelementptr inbounds i32, ptr %8, i32 24
    %wide.load9 = load <8 x i32>, ptr %15, align 4, !tbaa !5
    %16 = mul nsw <8 x i32> %wide.load, <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>
    %17 = mul nsw <8 x i32> %wide.load7, <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>
    %18 = mul nsw <8 x i32> %wide.load8, <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>
    %19 = mul nsw <8 x i32> %wide.load9, <i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42, i32 42>
    %20 = getelementptr inbounds i32, ptr %8, i32 0
    store <8 x i32> %16, ptr %20, align 4, !tbaa !5
    %21 = getelementptr inbounds i32, ptr %8, i32 8
    store <8 x i32> %17, ptr %21, align 4, !tbaa !5
    %22 = getelementptr inbounds i32, ptr %8, i32 16
    store <8 x i32> %18, ptr %22, align 4, !tbaa !5
    %23 = getelementptr inbounds i32, ptr %8, i32 24
    store <8 x i32> %19, ptr %23, align 4, !tbaa !5
    %index.next = add nuw i64 %index, 32
    %24 = icmp eq i64 %index.next, %n.vec
    br i1 %24, <null operand!>, label %vector.body
  LV: vectorizing VPBB:middle.block in BB:middle.block

So I *was* thinking of something else.
It's possible that LV's unroll heuristic may need further tuning,
but in general, please proceed with this.
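
For context, the dump above comes from a simple in-place scale-by-42 loop;
a C source along these lines would plausibly produce that vector body
(a reconstruction for illustration only, not the actual test case from the
patch; the function and parameter names are invented):

  /* Hypothetical scalar source: every i32 element is multiplied by 42
     in place, matching the <8 x i32> load/mul/store groups above. */
  void scale_by_42(int *p, long n) {
    for (long i = 0; i < n; ++i)
      p[i] *= 42;
  }

With VF=8 and UF=4, each vector iteration processes 32 elements, which is
why the preheader computes %n.mod.vf = urem i64 %3, 32 and the latch steps
the induction variable by 32.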


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D115261/new/

https://reviews.llvm.org/D115261


