[llvm] r207940 - LoopUnroll: If we're doing partial unrolling, use the PartialThreshold to limit unrolling.
Benjamin Kramer
benny.kra at gmail.com
Mon May 5 03:09:08 PDT 2014
On 05.05.2014, at 01:18, Nadav Rotem <nrotem at apple.com> wrote:
> Hi Ben,
>
> Thanks for working on this. Overall it sounds like a good change and unrolling 8 times sounds way too high, even for small loops. Did you get a chance to measure the performance difference of this patch?
I didn't find any significant runtime change in the test suite or when trying some of the synthetic benchmarks that were showing extreme unrolling. Code size is a bit better though.
I initially observed this behavior when looking into the vectorizer ( http://llvm.org/bugs/show_bug.cgi?id=14985 ). For the trivial loop in the test case we used to unroll 2x in the loop vectorizer (that's a good thing) and then up to another 8x in the loop unroller when targeting core2 or higher. I asked Hal and he agreed that we were unrolling too much.
I guess it makes sense to actually use the threshold derived from the processor manuals to drive unrolling instead of assuming that more unrolling is better :)
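To make the new condition concrete, here is a rough standalone sketch of the size check as it reads after the patch. The helper name exceedsUnrollBudget is hypothetical and does not exist in LoopUnrollPass.cpp. The value 150 is the full-unroll threshold mentioned in the commit log. The partial threshold of 30 in the usage note is a made-up number for illustration only, not the value any target actually reports.

```cpp
#include <cstdint>

// Hypothetical helper (not part of LLVM): returns true when unrolling by
// Count is over budget, i.e. when the pass would take the branch that
// reduces the count or gives up. Mirrors the condition in the patch:
//   TripCount != 1 &&
//   (Size > Threshold || (Count != TripCount && Size > PartialThreshold))
constexpr bool exceedsUnrollBudget(unsigned LoopSize, unsigned Count,
                                   unsigned TripCount, unsigned Threshold,
                                   unsigned PartialThreshold) {
  return TripCount != 1 &&
         // Size of the unrolled body exceeds the complete-unroll threshold.
         ((uint64_t)LoopSize * Count > Threshold ||
          // Count != TripCount means this is a partial unroll, so the
          // (smaller) partial threshold applies as well.
          (Count != TripCount &&
           (uint64_t)LoopSize * Count > PartialThreshold));
}
```

With a loop of size 16 (as in the new test), an unknown trip count (0) and a requested count of 8, the unrolled size is 128. That is below 150, so the old check let it through. With an assumed partial threshold of 30, exceedsUnrollBudget(16, 8, 0, 150, 30) is true and the count gets cut down instead.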
- Ben
>
> Thanks,
> Nadav
>
>
> On May 4, 2014, at 12:12 PM, Benjamin Kramer <benny.kra at googlemail.com> wrote:
>
>> Author: d0k
>> Date: Sun May 4 14:12:38 2014
>> New Revision: 207940
>>
>> URL: http://llvm.org/viewvc/llvm-project?rev=207940&view=rev
>> Log:
>> LoopUnroll: If we're doing partial unrolling, use the PartialThreshold to limit unrolling.
>>
>> Otherwise we use the same threshold as for complete unrolling, which is
>> way too high. This made us unroll any loop smaller than 150 instructions
>> by 8 times, but only if someone specified -march=core2 or better,
>> which happens to be the default on darwin.
>>
>> Modified:
>> llvm/trunk/lib/Transforms/Scalar/LoopUnrollPass.cpp
>> llvm/trunk/test/Transforms/LoopUnroll/X86/partial.ll
>>
>> Modified: llvm/trunk/lib/Transforms/Scalar/LoopUnrollPass.cpp
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Transforms/Scalar/LoopUnrollPass.cpp?rev=207940&r1=207939&r2=207940&view=diff
>> ==============================================================================
>> --- llvm/trunk/lib/Transforms/Scalar/LoopUnrollPass.cpp (original)
>> +++ llvm/trunk/lib/Transforms/Scalar/LoopUnrollPass.cpp Sun May 4 14:12:38 2014
>> @@ -238,9 +238,12 @@ bool LoopUnroll::runOnLoop(Loop *L, LPPa
>> return false;
>> }
>> uint64_t Size = (uint64_t)LoopSize*Count;
>> - if (TripCount != 1 && Size > Threshold) {
>> - DEBUG(dbgs() << " Too large to fully unroll with count: " << Count
>> - << " because size: " << Size << ">" << Threshold << "\n");
>> + if (TripCount != 1 &&
>> + (Size > Threshold || (Count != TripCount && Size > PartialThreshold))) {
>> + if (Size > Threshold)
>> + DEBUG(dbgs() << " Too large to fully unroll with count: " << Count
>> + << " because size: " << Size << ">" << Threshold << "\n");
>> +
>> bool AllowPartial = UserAllowPartial ? CurrentAllowPartial : UP.Partial;
>> if (!AllowPartial && !(Runtime && TripCount == 0)) {
>> DEBUG(dbgs() << " will not try to unroll partially because "
>>
>> Modified: llvm/trunk/test/Transforms/LoopUnroll/X86/partial.ll
>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/LoopUnroll/X86/partial.ll?rev=207940&r1=207939&r2=207940&view=diff
>> ==============================================================================
>> --- llvm/trunk/test/Transforms/LoopUnroll/X86/partial.ll (original)
>> +++ llvm/trunk/test/Transforms/LoopUnroll/X86/partial.ll Sun May 4 14:12:38 2014
>> @@ -76,5 +76,52 @@ for.end:
>> ret void
>> }
>>
>> +define zeroext i16 @test1(i16* nocapture readonly %arr, i32 %n) #0 {
>> +entry:
>> + %cmp25 = icmp eq i32 %n, 0
>> + br i1 %cmp25, label %for.end, label %for.body
>> +
>> +for.body: ; preds = %entry, %for.body
>> + %indvars.iv = phi i64 [ %indvars.iv.next, %for.body ], [ 0, %entry ]
>> + %reduction.026 = phi i16 [ %add14, %for.body ], [ 0, %entry ]
>> + %arrayidx = getelementptr inbounds i16* %arr, i64 %indvars.iv
>> + %0 = load i16* %arrayidx, align 2
>> + %add = add i16 %0, %reduction.026
>> + %sext = mul i64 %indvars.iv, 12884901888
>> + %idxprom3 = ashr exact i64 %sext, 32
>> + %arrayidx4 = getelementptr inbounds i16* %arr, i64 %idxprom3
>> + %1 = load i16* %arrayidx4, align 2
>> + %add7 = add i16 %add, %1
>> + %sext28 = mul i64 %indvars.iv, 21474836480
>> + %idxprom10 = ashr exact i64 %sext28, 32
>> + %arrayidx11 = getelementptr inbounds i16* %arr, i64 %idxprom10
>> + %2 = load i16* %arrayidx11, align 2
>> + %add14 = add i16 %add7, %2
>> + %indvars.iv.next = add nuw nsw i64 %indvars.iv, 1
>> + %lftr.wideiv = trunc i64 %indvars.iv.next to i32
>> + %exitcond = icmp eq i32 %lftr.wideiv, %n
>> + br i1 %exitcond, label %for.end, label %for.body
>> +
>> +for.end: ; preds = %for.body, %entry
>> + %reduction.0.lcssa = phi i16 [ 0, %entry ], [ %add14, %for.body ]
>> + ret i16 %reduction.0.lcssa
>> +
>> +; This loop is too large to be partially unrolled (size=16)
>> +
>> +; CHECK-LABEL: @test1
>> +; CHECK: br
>> +; CHECK: br
>> +; CHECK: br
>> +; CHECK: br
>> +; CHECK-NOT: br
>> +
>> +; CHECK-NOUNRL-LABEL: @test1
>> +; CHECK-NOUNRL: br
>> +; CHECK-NOUNRL: br
>> +; CHECK-NOUNRL: br
>> +; CHECK-NOUNRL: br
>> +; CHECK-NOUNRL-NOT: br
>> +}
>> +
>> attributes #0 = { nounwind uwtable }
>>
>>
>>
>> _______________________________________________
>> llvm-commits mailing list
>> llvm-commits at cs.uiuc.edu
>> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
>