[llvm-bugs] [Bug 44662] New: LoopAccessAnalysis's RuntimeMemoryCheckThreshold is just a bit too pessimistic?

Sat Jan 25 10:41:42 PST 2020

https://bugs.llvm.org/show_bug.cgi?id=44662

            Bug ID: 44662
           Summary: LoopAccessAnalysis's RuntimeMemoryCheckThreshold is
                    just a bit too pessimistic?
           Product: libraries
           Version: trunk
          Hardware: PC
                OS: Linux
            Status: NEW
          Severity: enhancement
          Priority: P
         Component: Global Analyses
          Assignee: unassignedbugs at nondot.org
          Reporter: lebedev.ri at gmail.com
                CC: llvm-bugs at lists.llvm.org

The runtime-memory-check-threshold option currently defaults to 8
(and has for as long as it has existed:
https://github.com/llvm/llvm-project/commit/1d862af7641eb88b4facce51fcf1b76c48c09d24)
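
To make the "how many checks" arithmetic concrete, here is a rough sketch of
how the number of runtime overlap checks grows with the number of pointer
groups. It is a simplified model, not LLVM's actual code: it assumes every
pair of pointer groups needs one check unless both groups are only read, and
that the loop is rejected when the count is strictly greater than the
threshold. The names numChecks and withinThreshold are hypothetical.

```cpp
// Simplified model (an assumption, not LoopAccessAnalysis itself):
// a pair of pointer groups needs a runtime overlap check unless both
// groups are read-only. Read/write pairs and write/write pairs count.
constexpr unsigned numChecks(unsigned readGroups, unsigned writeGroups) {
  return writeGroups * readGroups + writeGroups * (writeGroups - 1) / 2;
}

// Hypothetical helper: assumes the vectorizer gives up only when the
// number of checks is strictly greater than the threshold (default 8).
constexpr bool withinThreshold(unsigned checks, unsigned threshold = 8) {
  return checks <= threshold;
}

// Two read groups and one written group need only two checks.
static_assert(numChecks(2, 1) == 2, "2 reads x 1 write");

// Four read groups and two written groups need nine checks, which is
// one more than the default threshold allows under this model.
static_assert(numChecks(4, 2) == 9, "4x2 read/write pairs + 1 write/write pair");
static_assert(withinThreshold(8) && !withinThreshold(9), "8 passes, 9 does not");
```

The log below shows four loads and two stores in the loop body. If each of
them ended up in its own pointer group (an assumption, the log does not say),
this model would give exactly 9 checks.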

Unfortunately, that is just one check short of allowing vectorization of the
following wavelet recomposition loop, which has two inputs and one output:

LV: Checking a loop in
"_ZNK8rawspeed15VC5Decompressor7Wavelet15reconstructPassENS_10Array2DRefIsEENS2_IKsEES5_"
from src/librawspeed/decompressors/VC5Decompressor.cpp:208:7
LV: Loop hints: force=? width=0 unroll=0
LV: Found a loop: for.inc24
LV: Found an induction variable.
LV: We can vectorize this loop (with a runtime bound check)!
LV: Found trip count: 0
LV: The Smallest and Widest types: 16 / 16 bits.
LV: The Widest register safe to use is: 256 bits.
LV: Found uniform instruction:   %cmp18 = icmp ult i64 %indvars.iv.next196,
%34, !dbg !172
LV: Found uniform instruction:   %arrayidx.i.i.i.i83 = getelementptr inbounds
i16, i16* %arrayidx.i.i1.i.i.i, i64 %indvars.iv195, !dbg !145
LV: Found uniform instruction:   %arrayidx.i.i11.i.i.i.i = getelementptr
inbounds i16, i16* %arrayidx.i.i.i.i.i.i.i85, i64 %idxprom.i.i.i.i.i.i.i, !dbg
!148
LV: Found uniform instruction:   %arrayidx.i.i11.1.i.i.i.i87 = getelementptr
inbounds i16, i16* %arrayidx.i.i.i.i.i.i.i85, i64 %33, !dbg !148
LV: Found uniform instruction:   %arrayidx.i.i11.2.i.i.i.i90 = getelementptr
inbounds i16, i16* %arrayidx.i.i.i.i.i.i.i85, i64 %idxprom.i.i.i.2.i.i.i.i,
!dbg !148
LV: Found uniform instruction:   %arrayidx.i25.i = getelementptr inbounds i16,
i16* %arrayidx.i.i23.i, i64 %indvars.iv195, !dbg !166
LV: Found uniform instruction:   %arrayidx.i.i100 = getelementptr inbounds i16,
i16* %arrayidx.i.i.i99, i64 %indvars.iv195, !dbg !169
LV: Found uniform instruction:   %arrayidx.i.i.i.i.i.i.i85 = getelementptr
inbounds i16, i16* %process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found uniform instruction:   %arrayidx.i.i.i.i.i.i.i85 = getelementptr
inbounds i16, i16* %process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found uniform instruction:   %arrayidx.i.i.i.i.i.i.i85 = getelementptr
inbounds i16, i16* %process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found uniform instruction:   %indvars.iv195 = phi i64 [ 0, %for.inc24.lr.ph
], [ %indvars.iv.next196, %for.inc24 ]
LV: Found uniform instruction:   %indvars.iv.next196 = add nuw nsw i64
%indvars.iv195, 1, !dbg !171
LV: Found scalar instruction:   %arrayidx.i.i.i.i.i.i.i85 = getelementptr
inbounds i16, i16* %process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found scalar instruction:   %arrayidx.i.i.i.i.i.i.i85 = getelementptr
inbounds i16, i16* %process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found scalar instruction:   %arrayidx.i.i.i.i.i.i.i85 = getelementptr
inbounds i16, i16* %process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found scalar instruction:   %indvars.iv195 = phi i64 [ 0, %for.inc24.lr.ph
], [ %indvars.iv.next196, %for.inc24 ]
LV: Found scalar instruction:   %indvars.iv.next196 = add nuw nsw i64
%indvars.iv195, 1, !dbg !171
LV: Found uniform instruction:   %cmp18 = icmp ult i64 %indvars.iv.next196,
%34, !dbg !172
LV: Found uniform instruction:   %arrayidx.i.i.i.i83 = getelementptr inbounds
i16, i16* %arrayidx.i.i1.i.i.i, i64 %indvars.iv195, !dbg !145
LV: Found uniform instruction:   %arrayidx.i.i11.i.i.i.i = getelementptr
inbounds i16, i16* %arrayidx.i.i.i.i.i.i.i85, i64 %idxprom.i.i.i.i.i.i.i, !dbg
!148
LV: Found uniform instruction:   %arrayidx.i.i11.1.i.i.i.i87 = getelementptr
inbounds i16, i16* %arrayidx.i.i.i.i.i.i.i85, i64 %33, !dbg !148
LV: Found uniform instruction:   %arrayidx.i.i11.2.i.i.i.i90 = getelementptr
inbounds i16, i16* %arrayidx.i.i.i.i.i.i.i85, i64 %idxprom.i.i.i.2.i.i.i.i,
!dbg !148
LV: Found uniform instruction:   %arrayidx.i25.i = getelementptr inbounds i16,
i16* %arrayidx.i.i23.i, i64 %indvars.iv195, !dbg !166
LV: Found uniform instruction:   %arrayidx.i.i100 = getelementptr inbounds i16,
i16* %arrayidx.i.i.i99, i64 %indvars.iv195, !dbg !169
LV: Found uniform instruction:   %arrayidx.i.i.i.i.i.i.i85 = getelementptr
inbounds i16, i16* %process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found uniform instruction:   %arrayidx.i.i.i.i.i.i.i85 = getelementptr
inbounds i16, i16* %process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found uniform instruction:   %arrayidx.i.i.i.i.i.i.i85 = getelementptr
inbounds i16, i16* %process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found uniform instruction:   %indvars.iv195 = phi i64 [ 0, %for.inc24.lr.ph
], [ %indvars.iv.next196, %for.inc24 ]
LV: Found uniform instruction:   %indvars.iv.next196 = add nuw nsw i64
%indvars.iv195, 1, !dbg !171
LV: Found scalar instruction:   %arrayidx.i.i.i.i.i.i.i85 = getelementptr
inbounds i16, i16* %process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found scalar instruction:   %arrayidx.i.i.i.i.i.i.i85 = getelementptr
inbounds i16, i16* %process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found scalar instruction:   %arrayidx.i.i.i.i.i.i.i85 = getelementptr
inbounds i16, i16* %process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found scalar instruction:   %indvars.iv195 = phi i64 [ 0, %for.inc24.lr.ph
], [ %indvars.iv.next196, %for.inc24 ]
LV: Found scalar instruction:   %indvars.iv.next196 = add nuw nsw i64
%indvars.iv195, 1, !dbg !171
LV: Found uniform instruction:   %cmp18 = icmp ult i64 %indvars.iv.next196,
%34, !dbg !172
LV: Found uniform instruction:   %arrayidx.i.i.i.i83 = getelementptr inbounds
i16, i16* %arrayidx.i.i1.i.i.i, i64 %indvars.iv195, !dbg !145
LV: Found uniform instruction:   %arrayidx.i.i11.i.i.i.i = getelementptr
inbounds i16, i16* %arrayidx.i.i.i.i.i.i.i85, i64 %idxprom.i.i.i.i.i.i.i, !dbg
!148
LV: Found uniform instruction:   %arrayidx.i.i11.1.i.i.i.i87 = getelementptr
inbounds i16, i16* %arrayidx.i.i.i.i.i.i.i85, i64 %33, !dbg !148
LV: Found uniform instruction:   %arrayidx.i.i11.2.i.i.i.i90 = getelementptr
inbounds i16, i16* %arrayidx.i.i.i.i.i.i.i85, i64 %idxprom.i.i.i.2.i.i.i.i,
!dbg !148
LV: Found uniform instruction:   %arrayidx.i25.i = getelementptr inbounds i16,
i16* %arrayidx.i.i23.i, i64 %indvars.iv195, !dbg !166
LV: Found uniform instruction:   %arrayidx.i.i100 = getelementptr inbounds i16,
i16* %arrayidx.i.i.i99, i64 %indvars.iv195, !dbg !169
LV: Found uniform instruction:   %arrayidx.i.i.i.i.i.i.i85 = getelementptr
inbounds i16, i16* %process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found uniform instruction:   %arrayidx.i.i.i.i.i.i.i85 = getelementptr
inbounds i16, i16* %process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found uniform instruction:   %arrayidx.i.i.i.i.i.i.i85 = getelementptr
inbounds i16, i16* %process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found uniform instruction:   %indvars.iv195 = phi i64 [ 0, %for.inc24.lr.ph
], [ %indvars.iv.next196, %for.inc24 ]
LV: Found uniform instruction:   %indvars.iv.next196 = add nuw nsw i64
%indvars.iv195, 1, !dbg !171
LV: Found scalar instruction:   %arrayidx.i.i.i.i.i.i.i85 = getelementptr
inbounds i16, i16* %process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found scalar instruction:   %arrayidx.i.i.i.i.i.i.i85 = getelementptr
inbounds i16, i16* %process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found scalar instruction:   %arrayidx.i.i.i.i.i.i.i85 = getelementptr
inbounds i16, i16* %process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found scalar instruction:   %indvars.iv195 = phi i64 [ 0, %for.inc24.lr.ph
], [ %indvars.iv.next196, %for.inc24 ]
LV: Found scalar instruction:   %indvars.iv.next196 = add nuw nsw i64
%indvars.iv195, 1, !dbg !171
LV: Found uniform instruction:   %cmp18 = icmp ult i64 %indvars.iv.next196,
%34, !dbg !172
LV: Found uniform instruction:   %arrayidx.i.i.i.i83 = getelementptr inbounds
i16, i16* %arrayidx.i.i1.i.i.i, i64 %indvars.iv195, !dbg !145
LV: Found uniform instruction:   %arrayidx.i.i11.i.i.i.i = getelementptr
inbounds i16, i16* %arrayidx.i.i.i.i.i.i.i85, i64 %idxprom.i.i.i.i.i.i.i, !dbg
!148
LV: Found uniform instruction:   %arrayidx.i.i11.1.i.i.i.i87 = getelementptr
inbounds i16, i16* %arrayidx.i.i.i.i.i.i.i85, i64 %33, !dbg !148
LV: Found uniform instruction:   %arrayidx.i.i11.2.i.i.i.i90 = getelementptr
inbounds i16, i16* %arrayidx.i.i.i.i.i.i.i85, i64 %idxprom.i.i.i.2.i.i.i.i,
!dbg !148
LV: Found uniform instruction:   %arrayidx.i25.i = getelementptr inbounds i16,
i16* %arrayidx.i.i23.i, i64 %indvars.iv195, !dbg !166
LV: Found uniform instruction:   %arrayidx.i.i100 = getelementptr inbounds i16,
i16* %arrayidx.i.i.i99, i64 %indvars.iv195, !dbg !169
LV: Found uniform instruction:   %arrayidx.i.i.i.i.i.i.i85 = getelementptr
inbounds i16, i16* %process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found uniform instruction:   %arrayidx.i.i.i.i.i.i.i85 = getelementptr
inbounds i16, i16* %process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found uniform instruction:   %arrayidx.i.i.i.i.i.i.i85 = getelementptr
inbounds i16, i16* %process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found uniform instruction:   %indvars.iv195 = phi i64 [ 0, %for.inc24.lr.ph
], [ %indvars.iv.next196, %for.inc24 ]
LV: Found uniform instruction:   %indvars.iv.next196 = add nuw nsw i64
%indvars.iv195, 1, !dbg !171
LV: Found scalar instruction:   %arrayidx.i.i.i.i.i.i.i85 = getelementptr
inbounds i16, i16* %process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found scalar instruction:   %arrayidx.i.i.i.i.i.i.i85 = getelementptr
inbounds i16, i16* %process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found scalar instruction:   %arrayidx.i.i.i.i.i.i.i85 = getelementptr
inbounds i16, i16* %process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found scalar instruction:   %indvars.iv195 = phi i64 [ 0, %for.inc24.lr.ph
], [ %indvars.iv.next196, %for.inc24 ]
LV: Found scalar instruction:   %indvars.iv.next196 = add nuw nsw i64
%indvars.iv195, 1, !dbg !171
LV: Scalarizing:  %arrayidx.i.i.i.i83 = getelementptr inbounds i16, i16*
%arrayidx.i.i1.i.i.i, i64 %indvars.iv195, !dbg !145
LV: Scalarizing:  %35 = load i16, i16* %arrayidx.i.i.i.i83, align 2, !dbg !146,
!tbaa !61
LV: Scalarizing:  %conv.i.i.i84 = sext i16 %35 to i32, !dbg !146
LV: Scalarizing:  %arrayidx.i.i.i.i.i.i.i85 = getelementptr inbounds i16, i16*
%process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Scalarizing:  %arrayidx.i.i11.i.i.i.i = getelementptr inbounds i16, i16*
%arrayidx.i.i.i.i.i.i.i85, i64 %idxprom.i.i.i.i.i.i.i, !dbg !148
LV: Scalarizing:  %36 = load i16, i16* %arrayidx.i.i11.i.i.i.i, align 2, !dbg
!149, !tbaa !61
LV: Scalarizing:  %conv3.i.i.i.i86 = sext i16 %36 to i32, !dbg !150
LV: Scalarizing:  %arrayidx.i.i11.1.i.i.i.i87 = getelementptr inbounds i16,
i16* %arrayidx.i.i.i.i.i.i.i85, i64 %33, !dbg !148
LV: Scalarizing:  %37 = load i16, i16* %arrayidx.i.i11.1.i.i.i.i87, align 2,
!dbg !149, !tbaa !61
LV: Scalarizing:  %conv3.1.i.i.i.i88 = sext i16 %37 to i32, !dbg !150
LV: Scalarizing:  %mul.1.i.i.i.i89 = shl nsw i32 %conv3.1.i.i.i.i88, 3, !dbg
!151
LV: Scalarizing:  %arrayidx.i.i11.2.i.i.i.i90 = getelementptr inbounds i16,
i16* %arrayidx.i.i.i.i.i.i.i85, i64 %idxprom.i.i.i.2.i.i.i.i, !dbg !148
LV: Scalarizing:  %38 = load i16, i16* %arrayidx.i.i11.2.i.i.i.i90, align 2,
!dbg !149, !tbaa !61
LV: Scalarizing:  %conv3.2.i.i.i.i91 = sext i16 %38 to i32, !dbg !150
LV: Scalarizing:  %add4.1.i.i.i.i92 = add nsw i32 %conv3.i.i.i.i86, 4, !dbg
!152
LV: Scalarizing:  %add4.2.i.i.i.i93 = add nsw i32 %add4.1.i.i.i.i92,
%mul.1.i.i.i.i89, !dbg !152
LV: Scalarizing:  %add.i.i.i94 = sub nsw i32 %add4.2.i.i.i.i93,
%conv3.2.i.i.i.i91, !dbg !153
LV: Scalarizing:  %39 = lshr i32 %add.i.i.i94, 3, !dbg !154
LV: Scalarizing:  %add4.i.i.i95 = add nsw i32 %39, %conv.i.i.i84, !dbg !155
LV: Scalarizing:  %40 = lshr i32 %add4.i.i.i95, 1, !dbg !156
LV: Scalarizing:  %add4.1.i.i.i62.i = sub nsw i32 4, %conv3.i.i.i.i86, !dbg
!157
LV: Scalarizing:  %add4.2.i.i.i63.i = add nsw i32 %add4.1.i.i.i62.i,
%mul.1.i.i.i.i89, !dbg !157
LV: Scalarizing:  %add.i.i64.i = add nsw i32 %add4.2.i.i.i63.i,
%conv3.2.i.i.i.i91, !dbg !161
LV: Scalarizing:  %41 = lshr i32 %add.i.i64.i, 3, !dbg !162
LV: Scalarizing:  %add4.i.i66.i = sub nsw i32 %41, %conv.i.i.i84, !dbg !163
LV: Scalarizing:  %42 = lshr i32 %add4.i.i66.i, 1, !dbg !164
LV: Scalarizing:  %conv.i96 = trunc i32 %40 to i16, !dbg !165
LV: Scalarizing:  %arrayidx.i25.i = getelementptr inbounds i16, i16*
%arrayidx.i.i23.i, i64 %indvars.iv195, !dbg !166
LV: Scalarizing:  store i16 %conv.i96, i16* %arrayidx.i25.i, align 2, !dbg
!167, !tbaa !61
LV: Scalarizing:  %conv5.i97 = trunc i32 %42 to i16, !dbg !168
LV: Scalarizing:  %arrayidx.i.i100 = getelementptr inbounds i16, i16*
%arrayidx.i.i.i99, i64 %indvars.iv195, !dbg !169
LV: Scalarizing:  store i16 %conv5.i97, i16* %arrayidx.i.i100, align 2, !dbg
!170, !tbaa !61
LV: Scalarizing:  %arrayidx.i.i.i.i83 = getelementptr inbounds i16, i16*
%arrayidx.i.i1.i.i.i, i64 %indvars.iv195, !dbg !145
LV: Scalarizing:  %arrayidx.i.i.i.i.i.i.i85 = getelementptr inbounds i16, i16*
%process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Scalarizing:  %arrayidx.i.i11.i.i.i.i = getelementptr inbounds i16, i16*
%arrayidx.i.i.i.i.i.i.i85, i64 %idxprom.i.i.i.i.i.i.i, !dbg !148
LV: Scalarizing:  %arrayidx.i.i11.1.i.i.i.i87 = getelementptr inbounds i16,
i16* %arrayidx.i.i.i.i.i.i.i85, i64 %33, !dbg !148
LV: Scalarizing:  %arrayidx.i.i11.2.i.i.i.i90 = getelementptr inbounds i16,
i16* %arrayidx.i.i.i.i.i.i.i85, i64 %idxprom.i.i.i.2.i.i.i.i, !dbg !148
LV: Scalarizing:  %arrayidx.i25.i = getelementptr inbounds i16, i16*
%arrayidx.i.i23.i, i64 %indvars.iv195, !dbg !166
LV: Scalarizing:  %arrayidx.i.i100 = getelementptr inbounds i16, i16*
%arrayidx.i.i.i99, i64 %indvars.iv195, !dbg !169
digraph VPlan {
graph [labelloc=t, fontsize=30; label="Vectorization Plan\nInitial VPlan for
VF=\{1\},UF\>=1"]
node [shape=rect, fontname=Courier, fontsize=30]
edge [fontname=Courier, fontsize=30]
compound=true
  N0 [label =
    "for.inc24:\n" +
      "WIDEN-INDUCTION %indvars.iv195 = phi 0, %indvars.iv.next196\l" +
      "CLONE %arrayidx.i.i.i.i83 = getelementptr %arrayidx.i.i1.i.i.i,
%indvars.iv195\l" +
      "CLONE %35 = load %arrayidx.i.i.i.i83\l" +
      "CLONE %conv.i.i.i84 = sext %35\l" +
      "CLONE %arrayidx.i.i.i.i.i.i.i85 = getelementptr
%process.sroa.6148.0.copyload, %indvars.iv195\l" +
      "CLONE %arrayidx.i.i11.i.i.i.i = getelementptr %arrayidx.i.i.i.i.i.i.i85,
%idxprom.i.i.i.i.i.i.i\l" +
      "CLONE %36 = load %arrayidx.i.i11.i.i.i.i\l" +
      "CLONE %conv3.i.i.i.i86 = sext %36\l" +
      "CLONE %arrayidx.i.i11.1.i.i.i.i87 = getelementptr
%arrayidx.i.i.i.i.i.i.i85, %33\l" +
      "CLONE %37 = load %arrayidx.i.i11.1.i.i.i.i87\l" +
      "CLONE %conv3.1.i.i.i.i88 = sext %37\l" +
      "CLONE %mul.1.i.i.i.i89 = shl %conv3.1.i.i.i.i88, 3\l" +
      "CLONE %arrayidx.i.i11.2.i.i.i.i90 = getelementptr
%arrayidx.i.i.i.i.i.i.i85, %idxprom.i.i.i.2.i.i.i.i\l" +
      "CLONE %38 = load %arrayidx.i.i11.2.i.i.i.i90\l" +
      "CLONE %conv3.2.i.i.i.i91 = sext %38\l" +
      "CLONE %add4.1.i.i.i.i92 = add %conv3.i.i.i.i86, 4\l" +
      "CLONE %add4.2.i.i.i.i93 = add %add4.1.i.i.i.i92, %mul.1.i.i.i.i89\l" +
      "CLONE %add.i.i.i94 = sub %add4.2.i.i.i.i93, %conv3.2.i.i.i.i91\l" +
      "CLONE %39 = lshr %add.i.i.i94, 3\l" +
      "CLONE %add4.i.i.i95 = add %39, %conv.i.i.i84\l" +
      "CLONE %40 = lshr %add4.i.i.i95, 1\l" +
      "CLONE %add4.1.i.i.i62.i = sub 4, %conv3.i.i.i.i86\l" +
      "CLONE %add4.2.i.i.i63.i = add %add4.1.i.i.i62.i, %mul.1.i.i.i.i89\l" +
      "CLONE %add.i.i64.i = add %add4.2.i.i.i63.i, %conv3.2.i.i.i.i91\l" +
      "CLONE %41 = lshr %add.i.i64.i, 3\l" +
      "CLONE %add4.i.i66.i = sub %41, %conv.i.i.i84\l" +
      "CLONE %42 = lshr %add4.i.i66.i, 1\l" +
      "CLONE %conv.i96 = trunc %40\l" +
      "CLONE %arrayidx.i25.i = getelementptr %arrayidx.i.i23.i,
%indvars.iv195\l" +
      "CLONE store %conv.i96, %arrayidx.i25.i\l" +
      "CLONE %conv5.i97 = trunc %42\l" +
      "CLONE %arrayidx.i.i100 = getelementptr %arrayidx.i.i.i99,
%indvars.iv195\l" +
      "CLONE store %conv5.i97, %arrayidx.i.i100\l"
  ]
}
digraph VPlan {
graph [labelloc=t, fontsize=30; label="Vectorization Plan\nInitial VPlan for
VF=\{2,4,8,16\},UF\>=1, where:\n%vp33344 := %arrayidx.i.i11.i.i.i.i\n%vp25520
:= %arrayidx.i.i.i.i83\n%vp39584 := %arrayidx.i.i11.2.i.i.i.i90\n%vp53952 :=
%arrayidx.i25.i\n%vp41440 := %arrayidx.i.i11.1.i.i.i.i87\n%vp52144 :=
%arrayidx.i.i100"]
node [shape=rect, fontname=Courier, fontsize=30]
edge [fontname=Courier, fontsize=30]
compound=true
  N0 [label =
    "for.inc24:\n" +
      "WIDEN-INDUCTION %indvars.iv195 = phi 0, %indvars.iv.next196\l" +
      "CLONE %arrayidx.i.i.i.i83 = getelementptr %arrayidx.i.i1.i.i.i,
%indvars.iv195\l" +
      "WIDEN %35 = load %arrayidx.i.i.i.i83, %vp25520\l" +
      "WIDEN\l" +
      "  %conv.i.i.i84 = sext %35\l" +
      "CLONE %arrayidx.i.i.i.i.i.i.i85 = getelementptr
%process.sroa.6148.0.copyload, %indvars.iv195\l" +
      "CLONE %arrayidx.i.i11.i.i.i.i = getelementptr %arrayidx.i.i.i.i.i.i.i85,
%idxprom.i.i.i.i.i.i.i\l" +
      "WIDEN %36 = load %arrayidx.i.i11.i.i.i.i, %vp33344\l" +
      "WIDEN\l" +
      "  %conv3.i.i.i.i86 = sext %36\l" +
      "CLONE %arrayidx.i.i11.1.i.i.i.i87 = getelementptr
%arrayidx.i.i.i.i.i.i.i85, %33\l" +
      "WIDEN %37 = load %arrayidx.i.i11.1.i.i.i.i87, %vp41440\l" +
      "WIDEN\l" +
      "  %conv3.1.i.i.i.i88 = sext %37\l" +
      "  %mul.1.i.i.i.i89 = shl %conv3.1.i.i.i.i88, 3\l" +
      "CLONE %arrayidx.i.i11.2.i.i.i.i90 = getelementptr
%arrayidx.i.i.i.i.i.i.i85, %idxprom.i.i.i.2.i.i.i.i\l" +
      "WIDEN %38 = load %arrayidx.i.i11.2.i.i.i.i90, %vp39584\l" +
      "WIDEN\l" +
      "  %conv3.2.i.i.i.i91 = sext %38\l" +
      "  %add4.1.i.i.i.i92 = add %conv3.i.i.i.i86, 4\l" +
      "  %add4.2.i.i.i.i93 = add %add4.1.i.i.i.i92, %mul.1.i.i.i.i89\l" +
      "  %add.i.i.i94 = sub %add4.2.i.i.i.i93, %conv3.2.i.i.i.i91\l" +
      "  %39 = lshr %add.i.i.i94, 3\l" +
      "  %add4.i.i.i95 = add %39, %conv.i.i.i84\l" +
      "  %40 = lshr %add4.i.i.i95, 1\l" +
      "  %add4.1.i.i.i62.i = sub 4, %conv3.i.i.i.i86\l" +
      "  %add4.2.i.i.i63.i = add %add4.1.i.i.i62.i, %mul.1.i.i.i.i89\l" +
      "  %add.i.i64.i = add %add4.2.i.i.i63.i, %conv3.2.i.i.i.i91\l" +
      "  %41 = lshr %add.i.i64.i, 3\l" +
      "  %add4.i.i66.i = sub %41, %conv.i.i.i84\l" +
      "  %42 = lshr %add4.i.i66.i, 1\l" +
      "  %conv.i96 = trunc %40\l" +
      "CLONE %arrayidx.i25.i = getelementptr %arrayidx.i.i23.i,
%indvars.iv195\l" +
      "WIDEN store %conv.i96, %arrayidx.i25.i, %vp53952\l" +
      "WIDEN\l" +
      "  %conv5.i97 = trunc %42\l" +
      "CLONE %arrayidx.i.i100 = getelementptr %arrayidx.i.i.i99,
%indvars.iv195\l" +
      "WIDEN store %conv5.i97, %arrayidx.i.i100, %vp52144\l"
  ]
}
LV: Found an estimated cost of 0 for VF 1 For instruction:   %indvars.iv195 =
phi i64 [ 0, %for.inc24.lr.ph ], [ %indvars.iv.next196, %for.inc24 ]
LV: Found an estimated cost of 0 for VF 1 For instruction:  
%arrayidx.i.i.i.i83 = getelementptr inbounds i16, i16* %arrayidx.i.i1.i.i.i,
i64 %indvars.iv195, !dbg !145
LV: Found an estimated cost of 1 for VF 1 For instruction:   %35 = load i16,
i16* %arrayidx.i.i.i.i83, align 2, !dbg !146, !tbaa !61
LV: Found an estimated cost of 0 for VF 1 For instruction:   %conv.i.i.i84 =
sext i16 %35 to i32, !dbg !146
LV: Found an estimated cost of 0 for VF 1 For instruction:  
%arrayidx.i.i.i.i.i.i.i85 = getelementptr inbounds i16, i16*
%process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found an estimated cost of 0 for VF 1 For instruction:  
%arrayidx.i.i11.i.i.i.i = getelementptr inbounds i16, i16*
%arrayidx.i.i.i.i.i.i.i85, i64 %idxprom.i.i.i.i.i.i.i, !dbg !148
LV: Found an estimated cost of 1 for VF 1 For instruction:   %36 = load i16,
i16* %arrayidx.i.i11.i.i.i.i, align 2, !dbg !149, !tbaa !61
LV: Found an estimated cost of 0 for VF 1 For instruction:   %conv3.i.i.i.i86 =
sext i16 %36 to i32, !dbg !150
LV: Found an estimated cost of 0 for VF 1 For instruction:  
%arrayidx.i.i11.1.i.i.i.i87 = getelementptr inbounds i16, i16*
%arrayidx.i.i.i.i.i.i.i85, i64 %33, !dbg !148
LV: Found an estimated cost of 1 for VF 1 For instruction:   %37 = load i16,
i16* %arrayidx.i.i11.1.i.i.i.i87, align 2, !dbg !149, !tbaa !61
LV: Found an estimated cost of 0 for VF 1 For instruction:   %conv3.1.i.i.i.i88
= sext i16 %37 to i32, !dbg !150
LV: Found an estimated cost of 1 for VF 1 For instruction:   %mul.1.i.i.i.i89 =
shl nsw i32 %conv3.1.i.i.i.i88, 3, !dbg !151
LV: Found an estimated cost of 0 for VF 1 For instruction:  
%arrayidx.i.i11.2.i.i.i.i90 = getelementptr inbounds i16, i16*
%arrayidx.i.i.i.i.i.i.i85, i64 %idxprom.i.i.i.2.i.i.i.i, !dbg !148
LV: Found an estimated cost of 1 for VF 1 For instruction:   %38 = load i16,
i16* %arrayidx.i.i11.2.i.i.i.i90, align 2, !dbg !149, !tbaa !61
LV: Found an estimated cost of 0 for VF 1 For instruction:   %conv3.2.i.i.i.i91
= sext i16 %38 to i32, !dbg !150
LV: Found an estimated cost of 1 for VF 1 For instruction:   %add4.1.i.i.i.i92
= add nsw i32 %conv3.i.i.i.i86, 4, !dbg !152
LV: Found an estimated cost of 1 for VF 1 For instruction:   %add4.2.i.i.i.i93
= add nsw i32 %add4.1.i.i.i.i92, %mul.1.i.i.i.i89, !dbg !152
LV: Found an estimated cost of 1 for VF 1 For instruction:   %add.i.i.i94 = sub
nsw i32 %add4.2.i.i.i.i93, %conv3.2.i.i.i.i91, !dbg !153
LV: Found an estimated cost of 1 for VF 1 For instruction:   %39 = lshr i32
%add.i.i.i94, 3, !dbg !154
LV: Found an estimated cost of 1 for VF 1 For instruction:   %add4.i.i.i95 =
add nsw i32 %39, %conv.i.i.i84, !dbg !155
LV: Found an estimated cost of 1 for VF 1 For instruction:   %40 = lshr i32
%add4.i.i.i95, 1, !dbg !156
LV: Found an estimated cost of 1 for VF 1 For instruction:   %add4.1.i.i.i62.i
= sub nsw i32 4, %conv3.i.i.i.i86, !dbg !157
LV: Found an estimated cost of 1 for VF 1 For instruction:   %add4.2.i.i.i63.i
= add nsw i32 %add4.1.i.i.i62.i, %mul.1.i.i.i.i89, !dbg !157
LV: Found an estimated cost of 1 for VF 1 For instruction:   %add.i.i64.i = add
nsw i32 %add4.2.i.i.i63.i, %conv3.2.i.i.i.i91, !dbg !161
LV: Found an estimated cost of 1 for VF 1 For instruction:   %41 = lshr i32
%add.i.i64.i, 3, !dbg !162
LV: Found an estimated cost of 1 for VF 1 For instruction:   %add4.i.i66.i =
sub nsw i32 %41, %conv.i.i.i84, !dbg !163
LV: Found an estimated cost of 1 for VF 1 For instruction:   %42 = lshr i32
%add4.i.i66.i, 1, !dbg !164
LV: Found an estimated cost of 0 for VF 1 For instruction:   %conv.i96 = trunc
i32 %40 to i16, !dbg !165
LV: Found an estimated cost of 0 for VF 1 For instruction:   %arrayidx.i25.i =
getelementptr inbounds i16, i16* %arrayidx.i.i23.i, i64 %indvars.iv195, !dbg
!166
LV: Found an estimated cost of 1 for VF 1 For instruction:   store i16
%conv.i96, i16* %arrayidx.i25.i, align 2, !dbg !167, !tbaa !61
LV: Found an estimated cost of 0 for VF 1 For instruction:   %conv5.i97 = trunc
i32 %42 to i16, !dbg !168
LV: Found an estimated cost of 0 for VF 1 For instruction:   %arrayidx.i.i100 =
getelementptr inbounds i16, i16* %arrayidx.i.i.i99, i64 %indvars.iv195, !dbg
!169
LV: Found an estimated cost of 1 for VF 1 For instruction:   store i16
%conv5.i97, i16* %arrayidx.i.i100, align 2, !dbg !170, !tbaa !61
LV: Found an estimated cost of 1 for VF 1 For instruction:  
%indvars.iv.next196 = add nuw nsw i64 %indvars.iv195, 1, !dbg !171
LV: Found an estimated cost of 1 for VF 1 For instruction:   %cmp18 = icmp ult
i64 %indvars.iv.next196, %34, !dbg !172
LV: Found an estimated cost of 0 for VF 1 For instruction:   br i1 %cmp18,
label %for.inc24, label %omp.inner.for.inc.loopexit207, !dbg !120, !llvm.loop
!173
LV: Scalar loop costs: 21.
LV: Found an estimated cost of 0 for VF 2 For instruction:   %indvars.iv195 =
phi i64 [ 0, %for.inc24.lr.ph ], [ %indvars.iv.next196, %for.inc24 ]
LV: Found an estimated cost of 0 for VF 2 For instruction:  
%arrayidx.i.i.i.i83 = getelementptr inbounds i16, i16* %arrayidx.i.i1.i.i.i,
i64 %indvars.iv195, !dbg !145
LV: Found an estimated cost of 1 for VF 2 For instruction:   %35 = load i16,
i16* %arrayidx.i.i.i.i83, align 2, !dbg !146, !tbaa !61
LV: Found an estimated cost of 2 for VF 2 For instruction:   %conv.i.i.i84 =
sext i16 %35 to i32, !dbg !146
LV: Found an estimated cost of 0 for VF 2 For instruction:  
%arrayidx.i.i.i.i.i.i.i85 = getelementptr inbounds i16, i16*
%process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found an estimated cost of 0 for VF 2 For instruction:  
%arrayidx.i.i11.i.i.i.i = getelementptr inbounds i16, i16*
%arrayidx.i.i.i.i.i.i.i85, i64 %idxprom.i.i.i.i.i.i.i, !dbg !148
LV: Found an estimated cost of 1 for VF 2 For instruction:   %36 = load i16,
i16* %arrayidx.i.i11.i.i.i.i, align 2, !dbg !149, !tbaa !61
LV: Found an estimated cost of 2 for VF 2 For instruction:   %conv3.i.i.i.i86 =
sext i16 %36 to i32, !dbg !150
LV: Found an estimated cost of 0 for VF 2 For instruction:  
%arrayidx.i.i11.1.i.i.i.i87 = getelementptr inbounds i16, i16*
%arrayidx.i.i.i.i.i.i.i85, i64 %33, !dbg !148
LV: Found an estimated cost of 1 for VF 2 For instruction:   %37 = load i16,
i16* %arrayidx.i.i11.1.i.i.i.i87, align 2, !dbg !149, !tbaa !61
LV: Found an estimated cost of 2 for VF 2 For instruction:   %conv3.1.i.i.i.i88
= sext i16 %37 to i32, !dbg !150
LV: Found an estimated cost of 1 for VF 2 For instruction:   %mul.1.i.i.i.i89 =
shl nsw i32 %conv3.1.i.i.i.i88, 3, !dbg !151
LV: Found an estimated cost of 0 for VF 2 For instruction:  
%arrayidx.i.i11.2.i.i.i.i90 = getelementptr inbounds i16, i16*
%arrayidx.i.i.i.i.i.i.i85, i64 %idxprom.i.i.i.2.i.i.i.i, !dbg !148
LV: Found an estimated cost of 1 for VF 2 For instruction:   %38 = load i16,
i16* %arrayidx.i.i11.2.i.i.i.i90, align 2, !dbg !149, !tbaa !61
LV: Found an estimated cost of 2 for VF 2 For instruction:   %conv3.2.i.i.i.i91
= sext i16 %38 to i32, !dbg !150
LV: Found an estimated cost of 1 for VF 2 For instruction:   %add4.1.i.i.i.i92
= add nsw i32 %conv3.i.i.i.i86, 4, !dbg !152
LV: Found an estimated cost of 1 for VF 2 For instruction:   %add4.2.i.i.i.i93
= add nsw i32 %add4.1.i.i.i.i92, %mul.1.i.i.i.i89, !dbg !152
LV: Found an estimated cost of 1 for VF 2 For instruction:   %add.i.i.i94 = sub
nsw i32 %add4.2.i.i.i.i93, %conv3.2.i.i.i.i91, !dbg !153
LV: Found an estimated cost of 1 for VF 2 For instruction:   %39 = lshr i32
%add.i.i.i94, 3, !dbg !154
LV: Found an estimated cost of 1 for VF 2 For instruction:   %add4.i.i.i95 =
add nsw i32 %39, %conv.i.i.i84, !dbg !155
LV: Found an estimated cost of 1 for VF 2 For instruction:   %40 = lshr i32
%add4.i.i.i95, 1, !dbg !156
LV: Found an estimated cost of 1 for VF 2 For instruction:   %add4.1.i.i.i62.i
= sub nsw i32 4, %conv3.i.i.i.i86, !dbg !157
LV: Found an estimated cost of 1 for VF 2 For instruction:   %add4.2.i.i.i63.i
= add nsw i32 %add4.1.i.i.i62.i, %mul.1.i.i.i.i89, !dbg !157
LV: Found an estimated cost of 1 for VF 2 For instruction:   %add.i.i64.i = add
nsw i32 %add4.2.i.i.i63.i, %conv3.2.i.i.i.i91, !dbg !161
LV: Found an estimated cost of 1 for VF 2 For instruction:   %41 = lshr i32
%add.i.i64.i, 3, !dbg !162
LV: Found an estimated cost of 1 for VF 2 For instruction:   %add4.i.i66.i =
sub nsw i32 %41, %conv.i.i.i84, !dbg !163
LV: Found an estimated cost of 1 for VF 2 For instruction:   %42 = lshr i32
%add4.i.i66.i, 1, !dbg !164
LV: Found an estimated cost of 1 for VF 2 For instruction:   %conv.i96 = trunc
i32 %40 to i16, !dbg !165
LV: Found an estimated cost of 0 for VF 2 For instruction:   %arrayidx.i25.i =
getelementptr inbounds i16, i16* %arrayidx.i.i23.i, i64 %indvars.iv195, !dbg
!166
LV: Found an estimated cost of 1 for VF 2 For instruction:   store i16
%conv.i96, i16* %arrayidx.i25.i, align 2, !dbg !167, !tbaa !61
LV: Found an estimated cost of 1 for VF 2 For instruction:   %conv5.i97 = trunc
i32 %42 to i16, !dbg !168
LV: Found an estimated cost of 0 for VF 2 For instruction:   %arrayidx.i.i100 =
getelementptr inbounds i16, i16* %arrayidx.i.i.i99, i64 %indvars.iv195, !dbg
!169
LV: Found an estimated cost of 1 for VF 2 For instruction:   store i16
%conv5.i97, i16* %arrayidx.i.i100, align 2, !dbg !170, !tbaa !61
LV: Found an estimated cost of 1 for VF 2 For instruction:  
%indvars.iv.next196 = add nuw nsw i64 %indvars.iv195, 1, !dbg !171
LV: Found an estimated cost of 1 for VF 2 For instruction:   %cmp18 = icmp ult
i64 %indvars.iv.next196, %34, !dbg !172
LV: Found an estimated cost of 0 for VF 2 For instruction:   br i1 %cmp18,
label %for.inc24, label %omp.inner.for.inc.loopexit207, !dbg !120, !llvm.loop
!173
LV: Vector loop of width 2 costs: 15.
LV: Found an estimated cost of 0 for VF 4 For instruction:   %indvars.iv195 =
phi i64 [ 0, %for.inc24.lr.ph ], [ %indvars.iv.next196, %for.inc24 ]
LV: Found an estimated cost of 0 for VF 4 For instruction:  
%arrayidx.i.i.i.i83 = getelementptr inbounds i16, i16* %arrayidx.i.i1.i.i.i,
i64 %indvars.iv195, !dbg !145
LV: Found an estimated cost of 1 for VF 4 For instruction:   %35 = load i16,
i16* %arrayidx.i.i.i.i83, align 2, !dbg !146, !tbaa !61
LV: Found an estimated cost of 1 for VF 4 For instruction:   %conv.i.i.i84 =
sext i16 %35 to i32, !dbg !146
LV: Found an estimated cost of 0 for VF 4 For instruction:  
%arrayidx.i.i.i.i.i.i.i85 = getelementptr inbounds i16, i16*
%process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found an estimated cost of 0 for VF 4 For instruction:  
%arrayidx.i.i11.i.i.i.i = getelementptr inbounds i16, i16*
%arrayidx.i.i.i.i.i.i.i85, i64 %idxprom.i.i.i.i.i.i.i, !dbg !148
LV: Found an estimated cost of 1 for VF 4 For instruction:   %36 = load i16,
i16* %arrayidx.i.i11.i.i.i.i, align 2, !dbg !149, !tbaa !61
LV: Found an estimated cost of 1 for VF 4 For instruction:   %conv3.i.i.i.i86 =
sext i16 %36 to i32, !dbg !150
LV: Found an estimated cost of 0 for VF 4 For instruction:  
%arrayidx.i.i11.1.i.i.i.i87 = getelementptr inbounds i16, i16*
%arrayidx.i.i.i.i.i.i.i85, i64 %33, !dbg !148
LV: Found an estimated cost of 1 for VF 4 For instruction:   %37 = load i16,
i16* %arrayidx.i.i11.1.i.i.i.i87, align 2, !dbg !149, !tbaa !61
LV: Found an estimated cost of 1 for VF 4 For instruction:   %conv3.1.i.i.i.i88
= sext i16 %37 to i32, !dbg !150
LV: Found an estimated cost of 1 for VF 4 For instruction:   %mul.1.i.i.i.i89 =
shl nsw i32 %conv3.1.i.i.i.i88, 3, !dbg !151
LV: Found an estimated cost of 0 for VF 4 For instruction:  
%arrayidx.i.i11.2.i.i.i.i90 = getelementptr inbounds i16, i16*
%arrayidx.i.i.i.i.i.i.i85, i64 %idxprom.i.i.i.2.i.i.i.i, !dbg !148
LV: Found an estimated cost of 1 for VF 4 For instruction:   %38 = load i16,
i16* %arrayidx.i.i11.2.i.i.i.i90, align 2, !dbg !149, !tbaa !61
LV: Found an estimated cost of 1 for VF 4 For instruction:   %conv3.2.i.i.i.i91
= sext i16 %38 to i32, !dbg !150
LV: Found an estimated cost of 1 for VF 4 For instruction:   %add4.1.i.i.i.i92
= add nsw i32 %conv3.i.i.i.i86, 4, !dbg !152
LV: Found an estimated cost of 1 for VF 4 For instruction:   %add4.2.i.i.i.i93
= add nsw i32 %add4.1.i.i.i.i92, %mul.1.i.i.i.i89, !dbg !152
LV: Found an estimated cost of 1 for VF 4 For instruction:   %add.i.i.i94 = sub
nsw i32 %add4.2.i.i.i.i93, %conv3.2.i.i.i.i91, !dbg !153
LV: Found an estimated cost of 1 for VF 4 For instruction:   %39 = lshr i32
%add.i.i.i94, 3, !dbg !154
LV: Found an estimated cost of 1 for VF 4 For instruction:   %add4.i.i.i95 =
add nsw i32 %39, %conv.i.i.i84, !dbg !155
LV: Found an estimated cost of 1 for VF 4 For instruction:   %40 = lshr i32
%add4.i.i.i95, 1, !dbg !156
LV: Found an estimated cost of 1 for VF 4 For instruction:   %add4.1.i.i.i62.i
= sub nsw i32 4, %conv3.i.i.i.i86, !dbg !157
LV: Found an estimated cost of 1 for VF 4 For instruction:   %add4.2.i.i.i63.i
= add nsw i32 %add4.1.i.i.i62.i, %mul.1.i.i.i.i89, !dbg !157
LV: Found an estimated cost of 1 for VF 4 For instruction:   %add.i.i64.i = add
nsw i32 %add4.2.i.i.i63.i, %conv3.2.i.i.i.i91, !dbg !161
LV: Found an estimated cost of 1 for VF 4 For instruction:   %41 = lshr i32
%add.i.i64.i, 3, !dbg !162
LV: Found an estimated cost of 1 for VF 4 For instruction:   %add4.i.i66.i =
sub nsw i32 %41, %conv.i.i.i84, !dbg !163
LV: Found an estimated cost of 1 for VF 4 For instruction:   %42 = lshr i32
%add4.i.i66.i, 1, !dbg !164
LV: Found an estimated cost of 1 for VF 4 For instruction:   %conv.i96 = trunc
i32 %40 to i16, !dbg !165
LV: Found an estimated cost of 0 for VF 4 For instruction:   %arrayidx.i25.i =
getelementptr inbounds i16, i16* %arrayidx.i.i23.i, i64 %indvars.iv195, !dbg
!166
LV: Found an estimated cost of 1 for VF 4 For instruction:   store i16
%conv.i96, i16* %arrayidx.i25.i, align 2, !dbg !167, !tbaa !61
LV: Found an estimated cost of 1 for VF 4 For instruction:   %conv5.i97 = trunc
i32 %42 to i16, !dbg !168
LV: Found an estimated cost of 0 for VF 4 For instruction:   %arrayidx.i.i100 =
getelementptr inbounds i16, i16* %arrayidx.i.i.i99, i64 %indvars.iv195, !dbg
!169
LV: Found an estimated cost of 1 for VF 4 For instruction:   store i16
%conv5.i97, i16* %arrayidx.i.i100, align 2, !dbg !170, !tbaa !61
LV: Found an estimated cost of 1 for VF 4 For instruction:  
%indvars.iv.next196 = add nuw nsw i64 %indvars.iv195, 1, !dbg !171
LV: Found an estimated cost of 1 for VF 4 For instruction:   %cmp18 = icmp ult
i64 %indvars.iv.next196, %34, !dbg !172
LV: Found an estimated cost of 0 for VF 4 For instruction:   br i1 %cmp18,
label %for.inc24, label %omp.inner.for.inc.loopexit207, !dbg !120, !llvm.loop
!173
LV: Vector loop of width 4 costs: 6.
LV: Found an estimated cost of 0 for VF 8 For instruction:   %indvars.iv195 =
phi i64 [ 0, %for.inc24.lr.ph ], [ %indvars.iv.next196, %for.inc24 ]
LV: Found an estimated cost of 0 for VF 8 For instruction:  
%arrayidx.i.i.i.i83 = getelementptr inbounds i16, i16* %arrayidx.i.i1.i.i.i,
i64 %indvars.iv195, !dbg !145
LV: Found an estimated cost of 1 for VF 8 For instruction:   %35 = load i16,
i16* %arrayidx.i.i.i.i83, align 2, !dbg !146, !tbaa !61
LV: Found an estimated cost of 4 for VF 8 For instruction:   %conv.i.i.i84 =
sext i16 %35 to i32, !dbg !146
LV: Found an estimated cost of 0 for VF 8 For instruction:  
%arrayidx.i.i.i.i.i.i.i85 = getelementptr inbounds i16, i16*
%process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found an estimated cost of 0 for VF 8 For instruction:  
%arrayidx.i.i11.i.i.i.i = getelementptr inbounds i16, i16*
%arrayidx.i.i.i.i.i.i.i85, i64 %idxprom.i.i.i.i.i.i.i, !dbg !148
LV: Found an estimated cost of 1 for VF 8 For instruction:   %36 = load i16,
i16* %arrayidx.i.i11.i.i.i.i, align 2, !dbg !149, !tbaa !61
LV: Found an estimated cost of 4 for VF 8 For instruction:   %conv3.i.i.i.i86 =
sext i16 %36 to i32, !dbg !150
LV: Found an estimated cost of 0 for VF 8 For instruction:  
%arrayidx.i.i11.1.i.i.i.i87 = getelementptr inbounds i16, i16*
%arrayidx.i.i.i.i.i.i.i85, i64 %33, !dbg !148
LV: Found an estimated cost of 1 for VF 8 For instruction:   %37 = load i16,
i16* %arrayidx.i.i11.1.i.i.i.i87, align 2, !dbg !149, !tbaa !61
LV: Found an estimated cost of 4 for VF 8 For instruction:   %conv3.1.i.i.i.i88
= sext i16 %37 to i32, !dbg !150
LV: Found an estimated cost of 4 for VF 8 For instruction:   %mul.1.i.i.i.i89 =
shl nsw i32 %conv3.1.i.i.i.i88, 3, !dbg !151
LV: Found an estimated cost of 0 for VF 8 For instruction:  
%arrayidx.i.i11.2.i.i.i.i90 = getelementptr inbounds i16, i16*
%arrayidx.i.i.i.i.i.i.i85, i64 %idxprom.i.i.i.2.i.i.i.i, !dbg !148
LV: Found an estimated cost of 1 for VF 8 For instruction:   %38 = load i16,
i16* %arrayidx.i.i11.2.i.i.i.i90, align 2, !dbg !149, !tbaa !61
LV: Found an estimated cost of 4 for VF 8 For instruction:   %conv3.2.i.i.i.i91
= sext i16 %38 to i32, !dbg !150
LV: Found an estimated cost of 4 for VF 8 For instruction:   %add4.1.i.i.i.i92
= add nsw i32 %conv3.i.i.i.i86, 4, !dbg !152
LV: Found an estimated cost of 4 for VF 8 For instruction:   %add4.2.i.i.i.i93
= add nsw i32 %add4.1.i.i.i.i92, %mul.1.i.i.i.i89, !dbg !152
LV: Found an estimated cost of 4 for VF 8 For instruction:   %add.i.i.i94 = sub
nsw i32 %add4.2.i.i.i.i93, %conv3.2.i.i.i.i91, !dbg !153
LV: Found an estimated cost of 4 for VF 8 For instruction:   %39 = lshr i32
%add.i.i.i94, 3, !dbg !154
LV: Found an estimated cost of 4 for VF 8 For instruction:   %add4.i.i.i95 =
add nsw i32 %39, %conv.i.i.i84, !dbg !155
LV: Found an estimated cost of 4 for VF 8 For instruction:   %40 = lshr i32
%add4.i.i.i95, 1, !dbg !156
LV: Found an estimated cost of 4 for VF 8 For instruction:   %add4.1.i.i.i62.i
= sub nsw i32 4, %conv3.i.i.i.i86, !dbg !157
LV: Found an estimated cost of 4 for VF 8 For instruction:   %add4.2.i.i.i63.i
= add nsw i32 %add4.1.i.i.i62.i, %mul.1.i.i.i.i89, !dbg !157
LV: Found an estimated cost of 4 for VF 8 For instruction:   %add.i.i64.i = add
nsw i32 %add4.2.i.i.i63.i, %conv3.2.i.i.i.i91, !dbg !161
LV: Found an estimated cost of 4 for VF 8 For instruction:   %41 = lshr i32
%add.i.i64.i, 3, !dbg !162
LV: Found an estimated cost of 4 for VF 8 For instruction:   %add4.i.i66.i =
sub nsw i32 %41, %conv.i.i.i84, !dbg !163
LV: Found an estimated cost of 4 for VF 8 For instruction:   %42 = lshr i32
%add4.i.i66.i, 1, !dbg !164
LV: Found an estimated cost of 5 for VF 8 For instruction:   %conv.i96 = trunc
i32 %40 to i16, !dbg !165
LV: Found an estimated cost of 0 for VF 8 For instruction:   %arrayidx.i25.i =
getelementptr inbounds i16, i16* %arrayidx.i.i23.i, i64 %indvars.iv195, !dbg
!166
LV: Found an estimated cost of 1 for VF 8 For instruction:   store i16
%conv.i96, i16* %arrayidx.i25.i, align 2, !dbg !167, !tbaa !61
LV: Found an estimated cost of 5 for VF 8 For instruction:   %conv5.i97 = trunc
i32 %42 to i16, !dbg !168
LV: Found an estimated cost of 0 for VF 8 For instruction:   %arrayidx.i.i100 =
getelementptr inbounds i16, i16* %arrayidx.i.i.i99, i64 %indvars.iv195, !dbg
!169
LV: Found an estimated cost of 1 for VF 8 For instruction:   store i16
%conv5.i97, i16* %arrayidx.i.i100, align 2, !dbg !170, !tbaa !61
LV: Found an estimated cost of 1 for VF 8 For instruction:  
%indvars.iv.next196 = add nuw nsw i64 %indvars.iv195, 1, !dbg !171
LV: Found an estimated cost of 1 for VF 8 For instruction:   %cmp18 = icmp ult
i64 %indvars.iv.next196, %34, !dbg !172
LV: Found an estimated cost of 0 for VF 8 For instruction:   br i1 %cmp18,
label %for.inc24, label %omp.inner.for.inc.loopexit207, !dbg !120, !llvm.loop
!173
LV: Vector loop of width 8 costs: 10.
LV: Found an estimated cost of 0 for VF 16 For instruction:   %indvars.iv195 =
phi i64 [ 0, %for.inc24.lr.ph ], [ %indvars.iv.next196, %for.inc24 ]
LV: Found an estimated cost of 0 for VF 16 For instruction:  
%arrayidx.i.i.i.i83 = getelementptr inbounds i16, i16* %arrayidx.i.i1.i.i.i,
i64 %indvars.iv195, !dbg !145
LV: Found an estimated cost of 1 for VF 16 For instruction:   %35 = load i16,
i16* %arrayidx.i.i.i.i83, align 2, !dbg !146, !tbaa !61
LV: Found an estimated cost of 4 for VF 16 For instruction:   %conv.i.i.i84 =
sext i16 %35 to i32, !dbg !146
LV: Found an estimated cost of 0 for VF 16 For instruction:  
%arrayidx.i.i.i.i.i.i.i85 = getelementptr inbounds i16, i16*
%process.sroa.6148.0.copyload, i64 %indvars.iv195, !dbg !147
LV: Found an estimated cost of 0 for VF 16 For instruction:  
%arrayidx.i.i11.i.i.i.i = getelementptr inbounds i16, i16*
%arrayidx.i.i.i.i.i.i.i85, i64 %idxprom.i.i.i.i.i.i.i, !dbg !148
LV: Found an estimated cost of 1 for VF 16 For instruction:   %36 = load i16,
i16* %arrayidx.i.i11.i.i.i.i, align 2, !dbg !149, !tbaa !61
LV: Found an estimated cost of 4 for VF 16 For instruction:   %conv3.i.i.i.i86
= sext i16 %36 to i32, !dbg !150
LV: Found an estimated cost of 0 for VF 16 For instruction:  
%arrayidx.i.i11.1.i.i.i.i87 = getelementptr inbounds i16, i16*
%arrayidx.i.i.i.i.i.i.i85, i64 %33, !dbg !148
LV: Found an estimated cost of 1 for VF 16 For instruction:   %37 = load i16,
i16* %arrayidx.i.i11.1.i.i.i.i87, align 2, !dbg !149, !tbaa !61
LV: Found an estimated cost of 4 for VF 16 For instruction:  
%conv3.1.i.i.i.i88 = sext i16 %37 to i32, !dbg !150
LV: Found an estimated cost of 8 for VF 16 For instruction:   %mul.1.i.i.i.i89
= shl nsw i32 %conv3.1.i.i.i.i88, 3, !dbg !151
LV: Found an estimated cost of 0 for VF 16 For instruction:  
%arrayidx.i.i11.2.i.i.i.i90 = getelementptr inbounds i16, i16*
%arrayidx.i.i.i.i.i.i.i85, i64 %idxprom.i.i.i.2.i.i.i.i, !dbg !148
LV: Found an estimated cost of 1 for VF 16 For instruction:   %38 = load i16,
i16* %arrayidx.i.i11.2.i.i.i.i90, align 2, !dbg !149, !tbaa !61
LV: Found an estimated cost of 4 for VF 16 For instruction:  
%conv3.2.i.i.i.i91 = sext i16 %38 to i32, !dbg !150
LV: Found an estimated cost of 8 for VF 16 For instruction:   %add4.1.i.i.i.i92
= add nsw i32 %conv3.i.i.i.i86, 4, !dbg !152
LV: Found an estimated cost of 8 for VF 16 For instruction:   %add4.2.i.i.i.i93
= add nsw i32 %add4.1.i.i.i.i92, %mul.1.i.i.i.i89, !dbg !152
LV: Found an estimated cost of 8 for VF 16 For instruction:   %add.i.i.i94 =
sub nsw i32 %add4.2.i.i.i.i93, %conv3.2.i.i.i.i91, !dbg !153
LV: Found an estimated cost of 8 for VF 16 For instruction:   %39 = lshr i32
%add.i.i.i94, 3, !dbg !154
LV: Found an estimated cost of 8 for VF 16 For instruction:   %add4.i.i.i95 =
add nsw i32 %39, %conv.i.i.i84, !dbg !155
LV: Found an estimated cost of 8 for VF 16 For instruction:   %40 = lshr i32
%add4.i.i.i95, 1, !dbg !156
LV: Found an estimated cost of 8 for VF 16 For instruction:   %add4.1.i.i.i62.i
= sub nsw i32 4, %conv3.i.i.i.i86, !dbg !157
LV: Found an estimated cost of 8 for VF 16 For instruction:   %add4.2.i.i.i63.i
= add nsw i32 %add4.1.i.i.i62.i, %mul.1.i.i.i.i89, !dbg !157
LV: Found an estimated cost of 8 for VF 16 For instruction:   %add.i.i64.i =
add nsw i32 %add4.2.i.i.i63.i, %conv3.2.i.i.i.i91, !dbg !161
LV: Found an estimated cost of 8 for VF 16 For instruction:   %41 = lshr i32
%add.i.i64.i, 3, !dbg !162
LV: Found an estimated cost of 8 for VF 16 For instruction:   %add4.i.i66.i =
sub nsw i32 %41, %conv.i.i.i84, !dbg !163
LV: Found an estimated cost of 8 for VF 16 For instruction:   %42 = lshr i32
%add4.i.i66.i, 1, !dbg !164
LV: Found an estimated cost of 6 for VF 16 For instruction:   %conv.i96 = trunc
i32 %40 to i16, !dbg !165
LV: Found an estimated cost of 0 for VF 16 For instruction:   %arrayidx.i25.i =
getelementptr inbounds i16, i16* %arrayidx.i.i23.i, i64 %indvars.iv195, !dbg
!166
LV: Found an estimated cost of 1 for VF 16 For instruction:   store i16
%conv.i96, i16* %arrayidx.i25.i, align 2, !dbg !167, !tbaa !61
LV: Found an estimated cost of 6 for VF 16 For instruction:   %conv5.i97 =
trunc i32 %42 to i16, !dbg !168
LV: Found an estimated cost of 0 for VF 16 For instruction:   %arrayidx.i.i100
= getelementptr inbounds i16, i16* %arrayidx.i.i.i99, i64 %indvars.iv195, !dbg
!169
LV: Found an estimated cost of 1 for VF 16 For instruction:   store i16
%conv5.i97, i16* %arrayidx.i.i100, align 2, !dbg !170, !tbaa !61
LV: Found an estimated cost of 1 for VF 16 For instruction:  
%indvars.iv.next196 = add nuw nsw i64 %indvars.iv195, 1, !dbg !171
LV: Found an estimated cost of 1 for VF 16 For instruction:   %cmp18 = icmp ult
i64 %indvars.iv.next196, %34, !dbg !172
LV: Found an estimated cost of 0 for VF 16 For instruction:   br i1 %cmp18,
label %for.inc24, label %omp.inner.for.inc.loopexit207, !dbg !120, !llvm.loop
!173
LV: Vector loop of width 16 costs: 8.
LV: Selecting VF: 4.
LV(REG): Calculating max register usage:
LV(REG): At #0 Interval # 0
LV(REG): At #1 Interval # 1
LV(REG): At #2 Interval # 2
LV(REG): At #3 Interval # 2
LV(REG): At #4 Interval # 2
LV(REG): At #5 Interval # 3
LV(REG): At #6 Interval # 4
LV(REG): At #7 Interval # 4
LV(REG): At #8 Interval # 4
LV(REG): At #9 Interval # 5
LV(REG): At #10 Interval # 5
LV(REG): At #11 Interval # 5
LV(REG): At #12 Interval # 5
LV(REG): At #13 Interval # 5
LV(REG): At #14 Interval # 5
LV(REG): At #15 Interval # 5
LV(REG): At #16 Interval # 6
LV(REG): At #17 Interval # 6
LV(REG): At #18 Interval # 6
LV(REG): At #19 Interval # 6
LV(REG): At #20 Interval # 6
LV(REG): At #21 Interval # 6
LV(REG): At #22 Interval # 6
LV(REG): At #23 Interval # 5
LV(REG): At #24 Interval # 4
LV(REG): At #25 Interval # 4
LV(REG): At #26 Interval # 3
LV(REG): At #27 Interval # 3
LV(REG): At #28 Interval # 3
LV(REG): At #30 Interval # 2
LV(REG): At #31 Interval # 2
LV(REG): At #33 Interval # 1
LV(REG): At #34 Interval # 1
LV(REG): VF = 4
LV(REG): Found max usage: 2 item
LV(REG): RegisterClass: Generic::ScalarRC, 3 registers
LV(REG): RegisterClass: Generic::VectorRC, 5 registers
LV(REG): Found invariant usage: 1 item
LV(REG): RegisterClass: Generic::VectorRC, 8 registers
LV: The target has 16 registers of Generic::ScalarRC register class
LV: The target has 16 registers of Generic::VectorRC register class
LV: Loop cost is 27
LV: Not Interleaving.
LV: Too many memory checks needed , NumRuntimePointerChecks=9,
PragmaVectorizeMemoryCheckThreshold=128,
VectorizerParams::RuntimeMemoryCheckThreshold=8, PragmaThresholdReached=0; 
ThresholdReached=1, Hints.allowReordering()=0.
LV: Not vectorizing: loop did not meet vectorization requirements.
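
To make the last few log lines easier to read, here is a minimal sketch of the decision they appear to describe. It is a simplified model written for this report, not code copied from LLVM. The struct and function names (CheckInputs, tooManyMemoryChecks, worstCasePairChecks) are hypothetical; the field names simply mirror the labels printed in the log. The pair-count helper rests on an assumption: every stored-to pointer is checked against every other pointer, with no grouping of pointers. With the four loaded and two stored pointers visible in the log, that assumption gives 9, which is consistent with the reported NumRuntimePointerChecks but is not confirmed by the log itself.

```cpp
// Simplified model of the "Too many memory checks needed" decision.
// All names here are illustrative; only the values come from the log.
struct CheckInputs {
  unsigned NumRuntimePointerChecks;             // 9 in the log
  unsigned RuntimeMemoryCheckThreshold;         // 8 (the default)
  unsigned PragmaVectorizeMemoryCheckThreshold; // 128 in the log
  bool AllowReordering;                         // Hints.allowReordering()=0
};

// Mirrors "PragmaThresholdReached" from the log line.
constexpr bool pragmaThresholdReached(const CheckInputs &In) {
  return In.NumRuntimePointerChecks > In.PragmaVectorizeMemoryCheckThreshold;
}

// Mirrors "ThresholdReached" from the log line.
constexpr bool thresholdReached(const CheckInputs &In) {
  return In.NumRuntimePointerChecks > In.RuntimeMemoryCheckThreshold;
}

// Assumed combination: the regular threshold only blocks vectorization
// when reordering is not allowed; the pragma threshold always does.
constexpr bool tooManyMemoryChecks(const CheckInputs &In) {
  return (thresholdReached(In) && !In.AllowReordering) ||
         pragmaThresholdReached(In);
}

// Assumption: each written pointer is checked against every read pointer
// and against every other written pointer, with no pointer grouping.
constexpr unsigned worstCasePairChecks(unsigned Reads, unsigned Writes) {
  return Writes * Reads + Writes * (Writes - 1) / 2;
}

// Values taken from the log above.
constexpr CheckInputs FromLog{9, 8, 128, false};

static_assert(tooManyMemoryChecks(FromLog),
              "9 checks exceed the default threshold of 8");
static_assert(!tooManyMemoryChecks(CheckInputs{9, 9, 128, false}),
              "a threshold of 9 would no longer reject this loop");
static_assert(worstCasePairChecks(4, 2) == 9,
              "4 loaded + 2 stored pointers, under the no-grouping assumption");
```

Under this model, raising the threshold by one is enough to get past this particular check for this loop; whether the loop then vectorizes profitably is a separate question that the sketch does not address.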

So, is the RuntimeMemoryCheckThreshold default of 8 a magic value known to be
optimal over a large swath of code?
Or is it something that people might be open to bumping a little?
