[llvm] r200213 - [vectorizer] Teach the loop vectorizer's unroller to only unroll by
Hal Finkel
hfinkel at anl.gov
Mon Jan 27 06:36:02 PST 2014
----- Original Message -----
> From: "Chandler Carruth" <chandlerc at gmail.com>
> To: llvm-commits at cs.uiuc.edu
> Sent: Monday, January 27, 2014 5:12:24 AM
> Subject: [llvm] r200213 - [vectorizer] Teach the loop vectorizer's unroller to only unroll by
>
> Author: chandlerc
> Date: Mon Jan 27 05:12:24 2014
> New Revision: 200213
>
> URL: http://llvm.org/viewvc/llvm-project?rev=200213&view=rev
> Log:
> [vectorizer] Teach the loop vectorizer's unroller to only unroll by
> powers of two. This is essentially always the correct thing given the
> impact on alignment, scaling factors that can be used in addressing
> modes, etc.
Chandler, please add a TTI callback to control this. On the PPC A2, it really is sometimes a good thing to unroll by 3 or 5. PPC does not have scaled addressing modes, and the important thing there is instruction latency hiding.
-Hal
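A minimal sketch of the kind of per-target control being requested. `TargetUnrollInfo`, `preferPowerOf2UnrollFactor()`, `LatencyBoundTarget` and `applyUnrollPolicy` are all hypothetical names, not LLVM APIs, and a real TTI callback could look quite different. The default keeps the power-of-two rounding from r200213. A latency-bound target without scaled addressing modes opts out and keeps factors such as 3 or 5.

```cpp
#include <cassert>

// HYPOTHETICAL: neither TargetUnrollInfo nor preferPowerOf2UnrollFactor()
// exists in LLVM; they only illustrate the requested per-target switch.
struct TargetUnrollInfo {
  virtual ~TargetUnrollInfo() = default;
  // Default policy: round unroll factors down to a power of two.
  virtual bool preferPowerOf2UnrollFactor() const { return true; }
};

// A target without scaled addressing modes, where hiding instruction
// latency matters more, could opt out and keep factors such as 3 or 5.
struct LatencyBoundTarget : TargetUnrollInfo {
  bool preferPowerOf2UnrollFactor() const override { return false; }
};

// Largest power of two <= A, or 0 when A == 0.
inline unsigned floorPow2(unsigned A) {
  unsigned P = 0;
  for (unsigned Cand = 1; Cand != 0 && Cand <= A; Cand <<= 1)
    P = Cand;
  return P;
}

// Apply the target's policy to a raw (register-pressure derived) factor.
unsigned applyUnrollPolicy(const TargetUnrollInfo &TTI, unsigned RawUF) {
  return TTI.preferPowerOf2UnrollFactor() ? floorPow2(RawUF) : RawUF;
}
```

With the default policy a raw factor of 5 becomes 4. The opted-out target keeps 5.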
> Also, fix the management of the unroll vs. small loop cost to more
> accurately model things in this world.
>
> Enhance a test case to actually exercise more of the unroll machinery
> by using synthetic constants rather than a specific target model. Before
> this change, with the added flags, this test would unroll 3 times instead
> of either 2 or 4 (the two sensible answers).
>
> While I don't expect this to make a huge difference, if there are lots
> of loops sitting right on the edge of hitting the 'small unroll' factor,
> they might change behavior. However, I've benchmarked moving the small
> loop cost up and down in many different ways and by a huge factor (2x)
> without seeing more than 0.2% code size growth. Small adjustments such
> as the series that led up to this change have given about a 1%
> improvement on some benchmarks, but that is very close to the noise
> floor, so I mostly checked that nothing regressed. Let me know if you
> see bad behavior on other targets, but I don't expect this to be a
> sufficiently dramatic change to trigger anything.
>
> Modified:
> llvm/trunk/include/llvm/Support/MathExtras.h
> llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp
> llvm/trunk/test/Transforms/LoopVectorize/unroll_novec.ll
>
> Modified: llvm/trunk/include/llvm/Support/MathExtras.h
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/Support/MathExtras.h?rev=200213&r1=200212&r2=200213&view=diff
> ==============================================================================
> --- llvm/trunk/include/llvm/Support/MathExtras.h (original)
> +++ llvm/trunk/include/llvm/Support/MathExtras.h Mon Jan 27 05:12:24 2014
> @@ -552,6 +552,13 @@ inline uint64_t NextPowerOf2(uint64_t A)
>    return A + 1;
>  }
>
> +/// Returns the power of two which is less than or equal to the given value.
> +/// Essentially, it is a floor operation across the domain of powers of two.
> +inline uint64_t PowerOf2Floor(uint64_t A) {
> +  if (!A) return 0;
> +  return 1ull << (63 - countLeadingZeros(A, ZB_Undefined));
> +}
> +
>  /// Returns the next integer (mod 2**64) that is greater than or equal to
>  /// \p Value and is a multiple of \p Align. \p Align must be non-zero.
>  ///
>
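`PowerOf2Floor` keeps only the highest set bit of its argument. The explicit zero check is needed because `ZB_Undefined` leaves the leading-zero count of 0 unspecified. Below is a dependency-free sketch of the same computation. The bit-scanning `countLeadingZeros64` and the name `powerOf2FloorSketch` are hypothetical stand-ins for LLVM's `countLeadingZeros` and `PowerOf2Floor`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical portable stand-in for llvm::countLeadingZeros on 64 bits.
// Requires A != 0, mirroring the ZB_Undefined contract for zero input.
inline unsigned countLeadingZeros64(uint64_t A) {
  assert(A != 0 && "leading-zero count of 0 is not defined here");
  unsigned N = 0;
  // Walk down from bit 63 until the first set bit is found.
  for (uint64_t Bit = uint64_t(1) << 63; (A & Bit) == 0; Bit >>= 1)
    ++N;
  return N;
}

// Same logic as the PowerOf2Floor in the patch: zero maps to zero,
// otherwise keep only the highest set bit.
inline uint64_t powerOf2FloorSketch(uint64_t A) {
  if (!A) return 0;
  return uint64_t(1) << (63 - countLeadingZeros64(A));
}
```

For example, 3 maps to 2 and 5 maps to 4, so a register-pressure estimate of 5 parallel copies becomes an unroll factor of 4.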
> Modified: llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp?rev=200213&r1=200212&r2=200213&view=diff
> ==============================================================================
> --- llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp (original)
> +++ llvm/trunk/lib/Transforms/Vectorize/LoopVectorize.cpp Mon Jan 27 05:12:24 2014
> @@ -5004,8 +5004,11 @@ LoopVectorizationCostModel::selectUnroll
>    // registers. These registers are used by all of the unrolled instances.
>    // Next, divide the remaining registers by the number of registers that is
>    // required by the loop, in order to estimate how many parallel instances
> -  // fit without causing spills.
> -  unsigned UF = (TargetNumRegisters - R.LoopInvariantRegs) / R.MaxLocalUsers;
> +  // fit without causing spills. All of this is rounded down if necessary to be
> +  // a power of two. We want power of two unroll factors to simplify any
> +  // addressing operations or alignment considerations.
> +  unsigned UF = PowerOf2Floor((TargetNumRegisters - R.LoopInvariantRegs) /
> +                              R.MaxLocalUsers);
>
>    // Clamp the unroll factor ranges to reasonable factors.
>    unsigned MaxUnrollSize = TTI.getMaximumUnrollFactor();
> @@ -5045,7 +5048,7 @@ LoopVectorizationCostModel::selectUnroll
>    DEBUG(dbgs() << "LV: Loop cost is " << LoopCost << '\n');
>    if (LoopCost < SmallLoopCost) {
>      DEBUG(dbgs() << "LV: Unrolling to reduce branch cost.\n");
> -    unsigned NewUF = SmallLoopCost / (LoopCost + 1);
> +    unsigned NewUF = PowerOf2Floor(SmallLoopCost / LoopCost);
>      return std::min(NewUF, UF);
>    }
>
>
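The two hunks round down at two points: the register-pressure bound and the small-loop factor. Below is a simplified model of both. `registerBoundUF`, `smallLoopUFOld`, `smallLoopUFNew` and `smallLoopCaseUF` are hypothetical names. The cost numbers are illustrative and were not measured from the test. With `-small-loop-cost=20` and an assumed loop cost of 5, the old formula gives 20 / 6 = 3, while the new one gives PowerOf2Floor(20 / 5) = 4. That is the same kind of 3-vs-4 difference the log describes.

```cpp
#include <algorithm>
#include <cassert>

// Largest power of two <= A, or 0 when A == 0 (32-bit stand-in for
// llvm::PowerOf2Floor).
inline unsigned floorPow2(unsigned A) {
  unsigned P = 0;
  for (unsigned Cand = 1; Cand != 0 && Cand <= A; Cand <<= 1)
    P = Cand;
  return P;
}

// Register-pressure bound after r200213: how many copies of the loop's
// live values fit in the registers left over, rounded down to 2^k.
unsigned registerBoundUF(unsigned NumRegs, unsigned InvariantRegs,
                         unsigned MaxLocalUsers) {
  assert(MaxLocalUsers > 0 && NumRegs >= InvariantRegs);
  return floorPow2((NumRegs - InvariantRegs) / MaxLocalUsers);
}

// Small-loop factor before the patch.
unsigned smallLoopUFOld(unsigned SmallLoopCost, unsigned LoopCost) {
  return SmallLoopCost / (LoopCost + 1);
}

// Small-loop factor after the patch. It divides by LoopCost directly,
// so this sketch requires LoopCost > 0.
unsigned smallLoopUFNew(unsigned SmallLoopCost, unsigned LoopCost) {
  assert(LoopCost > 0);
  return floorPow2(SmallLoopCost / LoopCost);
}

// Models only the LoopCost < SmallLoopCost branch. The real function also
// clamps against TTI.getMaximumUnrollFactor() and may apply further limits
// that are not shown in the quoted hunks.
unsigned smallLoopCaseUF(unsigned SmallLoopCost, unsigned LoopCost,
                         unsigned RegUF) {
  assert(LoopCost < SmallLoopCost);
  return std::min(smallLoopUFNew(SmallLoopCost, LoopCost), RegUF);
}
```

Under the same assumptions, 16 registers with 1 invariant register and 3 local users give (16 - 1) / 3 = 5, which rounds down to 4. The combined factor is therefore 4.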
> Modified: llvm/trunk/test/Transforms/LoopVectorize/unroll_novec.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopVectorize/unroll_novec.ll?rev=200213&r1=200212&r2=200213&view=diff
> ==============================================================================
> --- llvm/trunk/test/Transforms/LoopVectorize/unroll_novec.ll (original)
> +++ llvm/trunk/test/Transforms/LoopVectorize/unroll_novec.ll Mon Jan 27 05:12:24 2014
> @@ -1,4 +1,4 @@
> -; RUN: opt < %s -loop-vectorize -force-vector-width=1 -force-vector-unroll=2 -dce -instcombine -S | FileCheck %s
> +; RUN: opt < %s -loop-vectorize -force-vector-width=1 -force-target-num-scalar-regs=16 -force-target-max-scalar-unroll=8 -small-loop-cost=20 -dce -instcombine -S | FileCheck %s
>
> target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64-S128"
> target triple = "x86_64-apple-macosx10.8.0"
> @@ -12,10 +12,20 @@ target triple = "x86_64-apple-macosx10.8
> ;CHECK-LABEL: @inc(
> ;CHECK: load i32*
> ;CHECK: load i32*
> +;CHECK: load i32*
> +;CHECK: load i32*
> +;CHECK-NOT: load i32*
> +;CHECK: add nsw i32
> ;CHECK: add nsw i32
> ;CHECK: add nsw i32
> +;CHECK: add nsw i32
> +;CHECK-NOT: add nsw i32
> +;CHECK: store i32
> +;CHECK: store i32
> ;CHECK: store i32
> ;CHECK: store i32
> +;CHECK-NOT: store i32
> +;CHECK: add i64 %{{.*}}, 4
> ;CHECK: ret void
> define void @inc(i32 %n) nounwind uwtable noinline ssp {
> %1 = icmp sgt i32 %n, 0
>
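With `-force-vector-width=1`, the vectorizer only interleaves scalar copies of the loop body. An unroll factor of 4 therefore yields four loads, four adds and four stores per iteration, plus an induction step of 4, which is what the new CHECK lines look for. The body of `@inc` is elided in the quoted diff. The C++ sketch below assumes, purely for illustration, that each element is incremented by one. The hypothetical `incUnrolledBy4` also includes a scalar remainder loop for trip counts not divisible by 4.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative only: assumes an element-wise "A[i] += 1" body, since the
// real @inc body is not shown. Mirrors an interleave factor of 4 at width 1.
void incUnrolledBy4(std::vector<int> &A) {
  std::size_t N = A.size();
  std::size_t I = 0;
  // Main loop: four independent load/add/store groups, induction += 4.
  for (; I + 4 <= N; I += 4) {
    int V0 = A[I], V1 = A[I + 1], V2 = A[I + 2], V3 = A[I + 3]; // 4 loads
    V0 += 1; V1 += 1; V2 += 1; V3 += 1;                          // 4 adds
    A[I] = V0; A[I + 1] = V1; A[I + 2] = V2; A[I + 3] = V3;      // 4 stores
  }
  // Scalar remainder for the last N % 4 elements.
  for (; I < N; ++I)
    A[I] += 1;
}
```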
>
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory