[llvm] r306336 - Enable vectorizer-maximize-bandwidth by default.

Dehao Chen via llvm-commits llvm-commits at lists.llvm.org
Tue Jun 27 15:06:33 PDT 2017


Thanks a lot for the heads-up. The patch was committed in r306473.

Dehao

On Tue, Jun 27, 2017 at 10:20 AM, Zaks, Ayal <ayal.zaks at intel.com> wrote:

> Agreed, r306331 caused such
>
>   “TBAA is only for loads, stores and calls!
>
>     %reverse = shufflevector <16 x i8> %wide.load84, …”
>
> failures; it was reverted in r306338 and fixed in r306381. Sorry it caught
> r306336.
>
>
>
> *From:* Matthew Simpson [mailto:mssimpso at codeaurora.org]
> *Sent:* Tuesday, June 27, 2017 19:57
> *To:* Dehao Chen <dehao at google.com>
> *Cc:* llvm-commits <llvm-commits at lists.llvm.org>; Zaks, Ayal <ayal.zaks at intel.com>
> *Subject:* Re: [llvm] r306336 - Enable vectorizer-maximize-bandwidth by default.
>
>
>
> Dehao,
>
>
>
> I think that PPC failure may have been due to r306331, and not your patch.
> Ayal fixed a TBAA metadata issue when recommitting his patch in r306381.
>
>
>
> -- Matt
>
>
>
> On Mon, Jun 26, 2017 at 7:01 PM, Dehao Chen via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>
> Reverting this again:
>
> http://lab.llvm.org:8011/builders/clang-ppc64be-linux-multistage/builds/4285/steps/build%20stage%202/logs/stdio
>
>
>
> It looks like this exposes a bug on PPC. I am reverting the patch for now
> and will need a reproducer to investigate.
>
>
>
> Dehao
>
>
>
> On Mon, Jun 26, 2017 at 2:41 PM, Dehao Chen via llvm-commits <llvm-commits at lists.llvm.org> wrote:
>
> Author: dehao
> Date: Mon Jun 26 14:41:09 2017
> New Revision: 306336
>
> URL: http://llvm.org/viewvc/llvm-project?rev=306336&view=rev
> Log:
> Enable vectorizer-maximize-bandwidth by default.
>
> Summary:
> vectorizer-maximize-bandwidth is generally useful in terms of performance.
> I've tested the impact of making it the default on SPEC CPU2006 benchmarks on
> Sandy Bridge machines. The results show a non-negative overall impact:
>
> spec/2006/fp/C++/444.namd                 26.84  -0.31%
> spec/2006/fp/C++/447.dealII               46.19  +0.89%
> spec/2006/fp/C++/450.soplex               42.92  -0.44%
> spec/2006/fp/C++/453.povray               38.57  -2.25%
> spec/2006/fp/C/433.milc                   24.54  -0.76%
> spec/2006/fp/C/470.lbm                    41.08  +0.26%
> spec/2006/fp/C/482.sphinx3                47.58  -0.99%
> spec/2006/int/C++/471.omnetpp             22.06  +1.87%
> spec/2006/int/C++/473.astar               22.65  -0.12%
> spec/2006/int/C++/483.xalancbmk           33.69  +4.97%
> spec/2006/int/C/400.perlbench             33.43  +1.70%
> spec/2006/int/C/401.bzip2                 23.02  -0.19%
> spec/2006/int/C/403.gcc                   32.57  -0.43%
> spec/2006/int/C/429.mcf                   40.35  +0.27%
> spec/2006/int/C/445.gobmk                 26.96  +0.06%
> spec/2006/int/C/456.hmmer                  24.4  +0.19%
> spec/2006/int/C/458.sjeng                 27.91  -0.08%
> spec/2006/int/C/462.libquantum            57.47  -0.20%
> spec/2006/int/C/464.h264ref               46.52  +1.35%
>
> geometric mean                                   +0.29%
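
The reported geometric mean can be sanity-checked from the table. A minimal sketch follows, assuming each percentage is a relative change and that the mean is taken over the ratios (1 + delta). How the figure in the commit message was actually computed is not stated, and the names `geomeanPercentChange` and `kSpecDeltas` are made up for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Geometric mean of per-benchmark changes, given and returned in percent.
// Assumption: each delta d is treated as the ratio (1 + d/100).
double geomeanPercentChange(const std::vector<double>& percentDeltas) {
    assert(!percentDeltas.empty() && "need at least one benchmark");
    double logSum = 0.0;
    for (double pct : percentDeltas) {
        logSum += std::log1p(pct / 100.0);  // log of the ratio (1 + delta)
    }
    // exp(mean of logs) - 1, converted back to percent.
    return std::expm1(logSum / percentDeltas.size()) * 100.0;
}

// The 19 deltas from the table in the commit message, in table order.
const std::vector<double> kSpecDeltas = {
    -0.31, +0.89, -0.44, -2.25, -0.76, +0.26, -0.99,         // fp
    +1.87, -0.12, +4.97,                                     // int, C++
    +1.70, -0.19, -0.43, +0.27, +0.06, +0.19, -0.08, -0.20,  // int, C
    +1.35,
};
```

Under those assumptions, `geomeanPercentChange(kSpecDeltas)` comes out close to the reported +0.29%, slightly below the arithmetic mean of the same column.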
>
> The regression on 453.povray seems real, but is due to secondary effects
> as all hot functions are bit-identical with and without the flag.
>
> I posted this patch to solicit upstream opinions. It would be greatly
> appreciated if the community could help test the performance impact of this
> change on other architectures so that we can decide whether it should be
> target-dependent.
>
> Reviewers: hfinkel, mkuper, davidxl, chandlerc
>
> Reviewed By: chandlerc
>
> Subscribers: rengolin, sanjoy, javed.absar, bjope, dorit, magabari,
> RKSimon, llvm-commits, mzolotukhin
>
> Differential Revision: https://reviews.llvm.org/D33341
>
> Modified:
>     llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp
>     llvm/trunk/test/Transforms/LoopVectorize/AArch64/loop-vectorization-factors.ll
>     llvm/trunk/test/Transforms/LoopVectorize/AArch64/reduction-small-size.ll
>     llvm/trunk/test/Transforms/LoopVectorize/ARM/gcc-examples.ll
>     llvm/trunk/test/Transforms/LoopVectorize/X86/fp64_to_uint32-cost-model.ll
>     llvm/trunk/test/Transforms/LoopVectorize/X86/gcc-examples.ll
>     llvm/trunk/test/Transforms/LoopVectorize/X86/masked_load_store.ll
>     llvm/trunk/test/Transforms/LoopVectorize/X86/no_fpmath.ll
>     llvm/trunk/test/Transforms/LoopVectorize/X86/no_fpmath_with_hotness.ll
>     llvm/trunk/test/Transforms/LoopVectorize/X86/reduction-crash.ll
>     llvm/trunk/test/Transforms/LoopVectorize/X86/vectorization-remarks-loopid-dbg.ll
>     llvm/trunk/test/Transforms/LoopVectorize/X86/vectorization-remarks.ll
>
> Modified: llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp?rev=306336&r1=306335&r2=306336&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp (original)
> +++ llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp Mon Jun 26 14:41:09 2017
> @@ -122,7 +122,7 @@ static cl::opt<unsigned> TinyTripCountVe
>               "value."));
>
>  static cl::opt<bool> MaximizeBandwidth(
> -    "vectorizer-maximize-bandwidth", cl::init(false), cl::Hidden,
> +    "vectorizer-maximize-bandwidth", cl::init(true), cl::Hidden,
>      cl::desc("Maximize bandwidth when selecting vectorization factor which "
>               "will be determined by the smallest type in loop."));
>
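
As a rough illustration of what flipping this default changes (a sketch, not the code in LoopVectorize.cpp): with the flag off, the largest vectorization factor considered is bounded by the widest element type in the loop, and with it on, by the smallest. The helper name `maxVectorizationFactor` and its parameters are hypothetical, and the real vectorizer still runs its cost model over the candidate factors rather than always taking this upper bound.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: upper bound on the vectorization factor (lanes per
// vector) for a loop whose values have the given element widths in bits.
// This is an illustrative assumption, not LLVM's actual implementation.
unsigned maxVectorizationFactor(unsigned registerBits,
                                const std::vector<unsigned>& elementBits,
                                bool maximizeBandwidth) {
    assert(!elementBits.empty() && "loop must use at least one type");
    const unsigned smallest =
        *std::min_element(elementBits.begin(), elementBits.end());
    const unsigned widest =
        *std::max_element(elementBits.begin(), elementBits.end());
    // Flag off: the widest type must fit in one register.
    // Flag on: the smallest type fills the register, so wider values
    // may end up spanning several registers.
    const unsigned divisor = maximizeBandwidth ? smallest : widest;
    assert(divisor != 0 && "element width must be non-zero");
    return registerBits / divisor;
}
```

Assuming a 128-bit vector register (as on AArch64 NEON), a loop mixing i8 and i16 such as `add_c` goes from 8 to 16 lanes, and one mixing i16 and i32 such as `add_d` goes from 4 to 8, which is consistent with the updated CHECK lines in the AArch64 test in this commit.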
>
> Modified: llvm/trunk/test/Transforms/LoopVectorize/AArch64/loop-vectorization-factors.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopVectorize/AArch64/loop-vectorization-factors.ll?rev=306336&r1=306335&r2=306336&view=diff
> ==============================================================================
> --- llvm/trunk/test/Transforms/LoopVectorize/AArch64/loop-vectorization-factors.ll (original)
> +++ llvm/trunk/test/Transforms/LoopVectorize/AArch64/loop-vectorization-factors.ll Mon Jun 26 14:41:09 2017
> @@ -88,9 +88,9 @@ for.body:
>  }
>
>  ; CHECK-LABEL: @add_c(
> -; CHECK: load <8 x i8>, <8 x i8>*
> -; CHECK: add <8 x i16>
> -; CHECK: store <8 x i16>
> +; CHECK: load <16 x i8>, <16 x i8>*
> +; CHECK: add <16 x i16>
> +; CHECK: store <16 x i16>
>  ; Function Attrs: nounwind
>  define void @add_c(i8* noalias nocapture readonly %p, i16* noalias nocapture %q, i32 %len) #0 {
>  entry:
> @@ -116,9 +116,9 @@ for.body:
>  }
>
>  ; CHECK-LABEL: @add_d(
> -; CHECK: load <4 x i16>
> -; CHECK: add nsw <4 x i32>
> -; CHECK: store <4 x i32>
> +; CHECK: load <8 x i16>
> +; CHECK: add nsw <8 x i32>
> +; CHECK: store <8 x i32>
>  define void @add_d(i16* noalias nocapture readonly %p, i32* noalias nocapture %q, i32 %len) #0 {
>  entry:
>    %cmp7 = icmp sgt i32 %len, 0
> @@ -187,16 +187,16 @@ for.body:
>  }
>
>  ; CHECK-LABEL: @add_f
> -; CHECK: load <8 x i16>
> -; CHECK: trunc <8 x i16>
> -; CHECK: shl <8 x i8>
> -; CHECK: add <8 x i8>
> -; CHECK: or <8 x i8>
> -; CHECK: mul <8 x i8>
> -; CHECK: and <8 x i8>
> -; CHECK: xor <8 x i8>
> -; CHECK: mul <8 x i8>
> -; CHECK: store <8 x i8>
> +; CHECK: load <16 x i16>
> +; CHECK: trunc <16 x i16>
> +; CHECK: shl <16 x i8>
> +; CHECK: add <16 x i8>
> +; CHECK: or <16 x i8>
> +; CHECK: mul <16 x i8>
> +; CHECK: and <16 x i8>
> +; CHECK: xor <16 x i8>
> +; CHECK: mul <16 x i8>
> +; CHECK: store <16 x i8>
>  define void @add_f(i16* noalias nocapture readonly %p, i8* noalias nocapture %q, i8 %arg1, i8 %arg2, i32 %len) #0 {
>  entry:
>    %cmp.32 = icmp sgt i32 %len, 0
>
> Modified: llvm/trunk/test/Transforms/LoopVectorize/AArch64/reduction-small-size.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopVectorize/AArch64/reduction-small-size.ll?rev=306336&r1=306335&r2=306336&view=diff
> ==============================================================================
> --- llvm/trunk/test/Transforms/LoopVectorize/AArch64/reduction-small-size.ll (original)
> +++ llvm/trunk/test/Transforms/LoopVectorize/AArch64/reduction-small-size.ll Mon Jun 26 14:41:09 2017
> @@ -123,16 +123,16 @@ for.body:
>  ; }
>  ;
>  ; CHECK: vector.body:
> -; CHECK:   phi <8 x i16>
> -; CHECK:   [[Ld1:%[a-zA-Z0-9.]+]] = load <8 x i8>
> -; CHECK:   zext <8 x i8> [[Ld1]] to <8 x i16>
> -; CHECK:   [[Ld2:%[a-zA-Z0-9.]+]] = load <8 x i8>
> -; CHECK:   zext <8 x i8> [[Ld2]] to <8 x i16>
> -; CHECK:   add <8 x i16>
> -; CHECK:   add <8 x i16>
> +; CHECK:   phi <16 x i16>
> +; CHECK:   [[Ld1:%[a-zA-Z0-9.]+]] = load <16 x i8>
> +; CHECK:   zext <16 x i8> [[Ld1]] to <16 x i16>
> +; CHECK:   [[Ld2:%[a-zA-Z0-9.]+]] = load <16 x i8>
> +; CHECK:   zext <16 x i8> [[Ld2]] to <16 x i16>
> +; CHECK:   add <16 x i16>
> +; CHECK:   add <16 x i16>
>  ;
>  ; CHECK: middle.block:
> -; CHECK:   [[Rdx:%[a-zA-Z0-9.]+]] = call i16 @llvm.experimental.vector.reduce.add.i16.v8i16(<8 x i16>
> +; CHECK:   [[Rdx:%[a-zA-Z0-9.]+]] = call i16 @llvm.experimental.vector.reduce.add.i16.v16i16(<16 x i16>
>  ; CHECK:   zext i16 [[Rdx]] to i32
>  ;
>  define i16 @reduction_i16_2(i8* nocapture readonly %a, i8* nocapture readonly %b, i32 %n) {
>
> Modified: llvm/trunk/test/Transforms/LoopVectorize/ARM/gcc-examples.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopVectorize/ARM/gcc-examples.ll?rev=306336&r1=306335&r2=306336&view=diff
> ==============================================================================
> --- llvm/trunk/test/Transforms/LoopVectorize/ARM/gcc-examples.ll (original)
> +++ llvm/trunk/test/Transforms/LoopVectorize/ARM/gcc-examples.ll Mon Jun 26 14:41:09 2017
> @@ -35,9 +35,9 @@ define void @example1() nounwind uwtable
>  }
>
>  ;CHECK-LABEL: @example10b(
> -;CHECK: load <4 x i16>
> -;CHECK: sext <4 x i16>
> -;CHECK: store <4 x i32>
> +;CHECK: load <8 x i16>
> +;CHECK: sext <8 x i16>
> +;CHECK: store <8 x i32>
>  ;CHECK: ret void
>  define void @example10b(i16* noalias nocapture %sa, i16* noalias nocapture %sb, i16* noalias nocapture %sc, i32* noalias nocapture %ia, i32* noalias nocapture %ib, i32* noalias nocapture %ic) nounwind uwtable ssp {
>    br label %1
>
> Modified: llvm/trunk/test/Transforms/LoopVectorize/X86/fp64_to_uint32-cost-model.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopVectorize/X86/fp64_to_uint32-cost-model.ll?rev=306336&r1=306335&r2=306336&view=diff
> ==============================================================================
> --- llvm/trunk/test/Transforms/LoopVectorize/X86/fp64_to_uint32-cost-model.ll (original)
> +++ llvm/trunk/test/Transforms/LoopVectorize/X86/fp64_to_uint32-cost-model.ll Mon Jun 26 14:41:09 2017
> @@ -9,7 +9,9 @@ target triple = "x86_64-apple-macosx"
>
>  ; If we need to scalarize the fptoui and then use inserts to build up the
>  ; vector again, then there is certainly no value in going 256-bit wide.
> -; CHECK-NOT: vpinsrd
> +; But as we default to maximize bandwidth, we should convert it to 256-bit
> +; anyway.
> +; CHECK: vpinsrd
>
>  define void @convert() {
>  entry:
>
> Modified: llvm/trunk/test/Transforms/LoopVectorize/X86/gcc-examples.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopVectorize/X86/gcc-examples.ll?rev=306336&r1=306335&r2=306336&view=diff
> ==============================================================================
> --- llvm/trunk/test/Transforms/LoopVectorize/X86/gcc-examples.ll (original)
> +++ llvm/trunk/test/Transforms/LoopVectorize/X86/gcc-examples.ll Mon Jun 26 14:41:09 2017
> @@ -44,17 +44,16 @@ define void @example1() nounwind uwtable
>    ret void
>  }
>
> -; Select VF=4 because sext <8 x i1> to <8 x i32> is expensive.
>  ;CHECK-LABEL: @example10b(
> -;CHECK: load <4 x i16>
> -;CHECK: sext <4 x i16>
> -;CHECK: store <4 x i32>
> +;CHECK: load <8 x i16>
> +;CHECK: sext <8 x i16>
> +;CHECK: store <8 x i32>
>  ;CHECK: ret void
>  ;UNROLL-LABEL: @example10b(
> -;UNROLL: load <4 x i16>
> -;UNROLL: load <4 x i16>
> -;UNROLL: store <4 x i32>
> -;UNROLL: store <4 x i32>
> +;UNROLL: load <8 x i16>
> +;UNROLL: load <8 x i16>
> +;UNROLL: store <8 x i32>
> +;UNROLL: store <8 x i32>
>  ;UNROLL: ret void
>  define void @example10b(i16* noalias nocapture %sa, i16* noalias nocapture %sb, i16* noalias nocapture %sc, i32* noalias nocapture %ia, i32* noalias nocapture %ib, i32* noalias nocapture %ic) nounwind uwtable ssp {
>    br label %1
>
> Modified: llvm/trunk/test/Transforms/LoopVectorize/X86/masked_load_store.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopVectorize/X86/masked_load_store.ll?rev=306336&r1=306335&r2=306336&view=diff
> ==============================================================================
> --- llvm/trunk/test/Transforms/LoopVectorize/X86/masked_load_store.ll (original)
> +++ llvm/trunk/test/Transforms/LoopVectorize/X86/masked_load_store.ll Mon Jun 26 14:41:09 2017
> @@ -260,20 +260,28 @@ for.end:
>  ;  }
>  ;}
>
> -;AVX-LABEL: @foo3
> -;AVX: icmp slt <4 x i32> %wide.load, <i32 100, i32 100,
> -;AVX: call <4 x double> @llvm.masked.load.v4f64.p0v4f64
> -;AVX: sitofp <4 x i32> %wide.load to <4 x double>
> -;AVX: fadd <4 x double>
> -;AVX: call void @llvm.masked.store.v4f64.p0v4f64
> -;AVX: ret void
> +;AVX1-LABEL: @foo3
> +;AVX1: icmp slt <4 x i32> %wide.load, <i32 100, i32 100,
> +;AVX1: call <4 x double> @llvm.masked.load.v4f64.p0v4f64
> +;AVX1: sitofp <4 x i32> %wide.load to <4 x double>
> +;AVX1: fadd <4 x double>
> +;AVX1: call void @llvm.masked.store.v4f64.p0v4f64
> +;AVX1: ret void
> +
> +;AVX2-LABEL: @foo3
> +;AVX2: icmp slt <8 x i32> %wide.load, <i32 100, i32 100,
> +;AVX2: call <8 x double> @llvm.masked.load.v8f64.p0v8f64
> +;AVX2: sitofp <8 x i32> %wide.load to <8 x double>
> +;AVX2: fadd <8 x double>
> +;AVX2: call void @llvm.masked.store.v8f64.p0v8f64
> +;AVX2: ret void
>
>  ;AVX512-LABEL: @foo3
> -;AVX512: icmp slt <8 x i32> %wide.load, <i32 100, i32 100,
> -;AVX512: call <8 x double> @llvm.masked.load.v8f64.p0v8f64
> -;AVX512: sitofp <8 x i32> %wide.load to <8 x double>
> -;AVX512: fadd <8 x double>
> -;AVX512: call void @llvm.masked.store.v8f64.p0v8f64
> +;AVX512: icmp slt <16 x i32> %wide.load, <i32 100, i32 100,
> +;AVX512: call <16 x double> @llvm.masked.load.v16f64.p0v16f64
> +;AVX512: sitofp <16 x i32> %wide.load to <16 x double>
> +;AVX512: fadd <16 x double>
> +;AVX512: call void @llvm.masked.store.v16f64.p0v16f64
>  ;AVX512: ret void
>
>
> @@ -502,19 +510,19 @@ for.end:
>  ;  }
>  ;}
>  ;AVX2-LABEL: @foo6
> -;AVX2: icmp sgt <4 x i32> %reverse, zeroinitializer
> -;AVX2: shufflevector <4 x i1>{{.*}}<4 x i32> <i32 3, i32 2, i32 1, i32 0>
> -;AVX2: call <4 x double> @llvm.masked.load.v4f64.p0v4f64
> -;AVX2: fadd <4 x double>
> -;AVX2: call void @llvm.masked.store.v4f64.p0v4f64
> +;AVX2: icmp sgt <8 x i32> %reverse, zeroinitializer
> +;AVX2: shufflevector <8 x i1>{{.*}}<8 x i32> <i32 7, i32 6, i32 5, i32 4
> +;AVX2: call <8 x double> @llvm.masked.load.v8f64.p0v8f64
> +;AVX2: fadd <8 x double>
> +;AVX2: call void @llvm.masked.store.v8f64.p0v8f64
>  ;AVX2: ret void
>
>  ;AVX512-LABEL: @foo6
> -;AVX512: icmp sgt <8 x i32> %reverse, zeroinitializer
> -;AVX512: shufflevector <8 x i1>{{.*}}<8 x i32> <i32 7, i32 6, i32 5, i32 4
> -;AVX512: call <8 x double> @llvm.masked.load.v8f64.p0v8f64
> -;AVX512: fadd <8 x double>
> -;AVX512: call void @llvm.masked.store.v8f64.p0v8f64
> +;AVX512: icmp sgt <16 x i32> %reverse, zeroinitializer
> +;AVX512: shufflevector <16 x i1>{{.*}}<16 x i32> <i32 15, i32 14, i32 13, i32 12
> +;AVX512: call <16 x double> @llvm.masked.load.v16f64.p0v16f64
> +;AVX512: fadd <16 x double>
> +;AVX512: call void @llvm.masked.store.v16f64.p0v16f64
>  ;AVX512: ret void
>
>
> @@ -582,8 +590,8 @@ for.end:
>  ; }
>
>  ;AVX512-LABEL: @foo7
> -;AVX512: call <8 x double*> @llvm.masked.load.v8p0f64.p0v8p0f64(<8 x double*>*
> -;AVX512: call void @llvm.masked.store.v8f64.p0v8f64
> +;AVX512: call <64 x double*> @llvm.masked.load.v64p0f64.p0v64p0f64(<64 x double*>*
> +;AVX512: call void @llvm.masked.store.v64f64.p0v64f64
>  ;AVX512: ret void
>
>  define void @foo7(double* noalias %out, double** noalias %in, i8* noalias %trigger, i32 %size) #0 {
> @@ -654,8 +662,8 @@ for.end:
>  ;}
>
>  ;AVX512-LABEL: @foo8
> -;AVX512: call <8 x i32 ()*> @llvm.masked.load.v8p0f_i32f.p0v8p0f_i32f(<8 x i32 ()*>* %
> -;AVX512: call void @llvm.masked.store.v8f64.p0v8f64
> +;AVX512: call <64 x i32 ()*> @llvm.masked.load.v64p0f_i32f.p0v64p0f_i32f(<64 x i32 ()*>* %
> +;AVX512: call void @llvm.masked.store.v64f64.p0v64f64
>  ;AVX512: ret void
>
>  define void @foo8(double* noalias %out, i32 ()** noalias %in, i8* noalias %trigger, i32 %size) #0 {
>
> Modified: llvm/trunk/test/Transforms/LoopVectorize/X86/no_fpmath.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopVectorize/X86/no_fpmath.ll?rev=306336&r1=306335&r2=306336&view=diff
> ==============================================================================
> --- llvm/trunk/test/Transforms/LoopVectorize/X86/no_fpmath.ll (original)
> +++ llvm/trunk/test/Transforms/LoopVectorize/X86/no_fpmath.ll Mon Jun 26 14:41:09 2017
> @@ -2,7 +2,7 @@
>
>  ; CHECK: remark: no_fpmath.c:6:11: loop not vectorized: cannot prove it is safe to reorder floating-point operations
>  ; CHECK: remark: no_fpmath.c:6:14: loop not vectorized
> -; CHECK: remark: no_fpmath.c:17:14: vectorized loop (vectorization width: 2, interleaved count: 2)
> +; CHECK: remark: no_fpmath.c:17:14: vectorized loop (vectorization width: 4, interleaved count: 2)
>
>  target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
>  target triple = "x86_64-apple-macosx10.10.0"
>
> Modified: llvm/trunk/test/Transforms/LoopVectorize/X86/no_fpmath_with_hotness.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopVectorize/X86/no_fpmath_with_hotness.ll?rev=306336&r1=306335&r2=306336&view=diff
> ==============================================================================
> --- llvm/trunk/test/Transforms/LoopVectorize/X86/no_fpmath_with_hotness.ll (original)
> +++ llvm/trunk/test/Transforms/LoopVectorize/X86/no_fpmath_with_hotness.ll Mon Jun 26 14:41:09 2017
> @@ -3,7 +3,7 @@
>
>  ; CHECK: remark: no_fpmath.c:6:11: loop not vectorized: cannot prove it is safe to reorder floating-point operations (hotness: 300)
>  ; CHECK: remark: no_fpmath.c:6:14: loop not vectorized
> -; CHECK: remark: no_fpmath.c:17:14: vectorized loop (vectorization width: 2, interleaved count: 2) (hotness: 300)
> +; CHECK: remark: no_fpmath.c:17:14: vectorized loop (vectorization width: 4, interleaved count: 2) (hotness: 300)
>
>  target datalayout = "e-m:o-i64:64-f80:128-n8:16:32:64-S128"
>  target triple = "x86_64-apple-macosx10.10.0"
>
> Modified: llvm/trunk/test/Transforms/LoopVectorize/X86/reduction-crash.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopVectorize/X86/reduction-crash.ll?rev=306336&r1=306335&r2=306336&view=diff
> ==============================================================================
> --- llvm/trunk/test/Transforms/LoopVectorize/X86/reduction-crash.ll (original)
> +++ llvm/trunk/test/Transforms/LoopVectorize/X86/reduction-crash.ll Mon Jun 26 14:41:09 2017
> @@ -7,7 +7,7 @@ target triple = "i386-apple-darwin"
>  define void @test1(float* nocapture %arg, i32 %arg1) nounwind {
>  ; CHECK-LABEL: @test1(
>  ; CHECK: preheader
> -; CHECK: insertelement <2 x double> zeroinitializer, double %tmp, i32 0
> +; CHECK: insertelement <4 x double> zeroinitializer, double %tmp, i32 0
>  ; CHECK: vector.memcheck
>
>  bb:
>
> Modified: llvm/trunk/test/Transforms/LoopVectorize/X86/vectorization-remarks-loopid-dbg.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopVectorize/X86/vectorization-remarks-loopid-dbg.ll?rev=306336&r1=306335&r2=306336&view=diff
> ==============================================================================
> --- llvm/trunk/test/Transforms/LoopVectorize/X86/vectorization-remarks-loopid-dbg.ll (original)
> +++ llvm/trunk/test/Transforms/LoopVectorize/X86/vectorization-remarks-loopid-dbg.ll Mon Jun 26 14:41:09 2017
> @@ -6,7 +6,7 @@
>  ; DEBUG-OUTPUT-NOT: .loc
>  ; DEBUG-OUTPUT-NOT: {{.*}}.debug_info
>
> -; VECTORIZED: remark: vectorization-remarks.c:17:8: vectorized loop (vectorization width: 4, interleaved count: 1)
> +; VECTORIZED: remark: vectorization-remarks.c:17:8: vectorized loop (vectorization width: 16, interleaved count: 1)
>  ; UNROLLED: remark: vectorization-remarks.c:17:8: interleaved loop (interleaved count: 4)
>  ; NONE: remark: vectorization-remarks.c:17:8: loop not vectorized: vectorization and interleaving are explicitly disabled, or vectorize width and interleave count are both set to 1
>
>
> Modified: llvm/trunk/test/Transforms/LoopVectorize/X86/vectorization-remarks.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopVectorize/X86/vectorization-remarks.ll?rev=306336&r1=306335&r2=306336&view=diff
> ==============================================================================
> --- llvm/trunk/test/Transforms/LoopVectorize/X86/vectorization-remarks.ll (original)
> +++ llvm/trunk/test/Transforms/LoopVectorize/X86/vectorization-remarks.ll Mon Jun 26 14:41:09 2017
> @@ -6,7 +6,7 @@
>  ; DEBUG-OUTPUT-NOT: .loc
>  ; DEBUG-OUTPUT-NOT: {{.*}}.debug_info
>
> -; VECTORIZED: remark: vectorization-remarks.c:17:8: vectorized loop (vectorization width: 4, interleaved count: 1)
> +; VECTORIZED: remark: vectorization-remarks.c:17:8: vectorized loop (vectorization width: 16, interleaved count: 1)
>  ; UNROLLED: remark: vectorization-remarks.c:17:8: interleaved loop (interleaved count: 4)
>  ; NONE: remark: vectorization-remarks.c:17:8: loop not vectorized: vectorization and interleaving are explicitly disabled, or vectorize width and interleave count are both set to 1
>
>
>
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits