[cfe-dev] No loop optimisation?

Fri Nov 6 14:57:04 PST 2015

On Fri, Nov 6, 2015 at 2:43 PM, mats petersson via cfe-dev <
cfe-dev at lists.llvm.org> wrote:

> Actual source below. Now with:
>
> $ clang++ --version
> clang version 3.8.0 (http://llvm.org/git/clang.git
> 27b8a29273862fa1e9e296be8f9850a1459115c8) (http://llvm.org/git/llvm.git
> 91950eea5582709ea263cc48bd97fdea817b53d1)
> Target: x86_64-unknown-linux-gnu
> Thread model: posix
> InstalledDir: /usr/local/bin
>
>
> #include <cstdio>
> #include <cstring>
>
> #define LOOP_COUNT 1000000000
>
> unsigned long long rdtscl(void)
> {
>     unsigned int lo, hi;
>     __asm__ __volatile__ ("rdtsc" : "=a"(lo), "=d"(hi));
>     return ( (unsigned long long)lo)|( ((unsigned long long)hi)<<32 );
> }
>
> int main()
> {
>     unsigned long long before = rdtscl();
>     size_t ret;
>     for (int i = 0; i < LOOP_COUNT; i++)
>         ret = strlen("abcd");
>     unsigned long long after = rdtscl();
>     printf("Strlen %lld ret=%zd\n",(after - before),  ret);
>
>     before = rdtscl();
>     for (int i = 0; i < LOOP_COUNT; i++)
>         ret = sizeof("abcd");
>     after = rdtscl();
>     printf("Strlen %lld ret=%zd\n",(after - before),  ret);
> }
>
>
> llvm generated:
>
> ; Function Attrs: nounwind uwtable
> define i32 @main() #0 {
> entry:
>   %0 = tail call { i32, i32 } asm sideeffect "rdtsc",
> "={ax},={dx},~{dirflag},~{fpsr},~{flags}"() #2, !srcloc !1
>   %asmresult1.i = extractvalue { i32, i32 } %0, 1
>   %conv2.i = zext i32 %asmresult1.i to i64
>   %shl.i = shl nuw i64 %conv2.i, 32
>   %asmresult.i = extractvalue { i32, i32 } %0, 0
>   %conv.i = zext i32 %asmresult.i to i64
>   %or.i = or i64 %shl.i, %conv.i
>   %1 = tail call { i32, i32 } asm sideeffect "rdtsc",
> "={ax},={dx},~{dirflag},~{fpsr},~{flags}"() #2, !srcloc !1
>   %asmresult.i.25 = extractvalue { i32, i32 } %1, 0
>   %asmresult1.i.26 = extractvalue { i32, i32 } %1, 1
>   %conv.i.27 = zext i32 %asmresult.i.25 to i64
>   %conv2.i.28 = zext i32 %asmresult1.i.26 to i64
>   %shl.i.29 = shl nuw i64 %conv2.i.28, 32
>   %or.i.30 = or i64 %shl.i.29, %conv.i.27
>   %sub = sub i64 %or.i.30, %or.i
>   %call2 = tail call i32 (i8*, ...) @printf(i8* nonnull getelementptr
> inbounds ([21 x i8], [21 x i8]* @.str, i64 0, i64 0), i64 %sub, i64 4)
>   %2 = tail call { i32, i32 } asm sideeffect "rdtsc",
> "={ax},={dx},~{dirflag},~{fpsr},~{flags}"() #2, !srcloc !1
>   %asmresult1.i.32 = extractvalue { i32, i32 } %2, 1
>   %conv2.i.34 = zext i32 %asmresult1.i.32 to i64
>   %shl.i.35 = shl nuw i64 %conv2.i.34, 32
>   br label %for.cond.5
>
> for.cond.5:                                       ; preds = %for.cond.5,
> %entry
>   %i4.0 = phi i32 [ 0, %entry ], [ %inc10.18, %for.cond.5 ]
>   %inc10.18 = add nsw i32 %i4.0, 19
>   %exitcond.18 = icmp eq i32 %inc10.18, 1000000001
>   br i1 %exitcond.18, label %for.cond.cleanup.7, label %for.cond.5
>
> for.cond.cleanup.7:                               ; preds = %for.cond.5
>   %asmresult.i.31 = extractvalue { i32, i32 } %2, 0
>   %conv.i.33 = zext i32 %asmresult.i.31 to i64
>   %or.i.36 = or i64 %shl.i.35, %conv.i.33
>   %3 = tail call { i32, i32 } asm sideeffect "rdtsc",
> "={ax},={dx},~{dirflag},~{fpsr},~{flags}"() #2, !srcloc !1
>   %asmresult.i.37 = extractvalue { i32, i32 } %3, 0
>   %asmresult1.i.38 = extractvalue { i32, i32 } %3, 1
>   %conv.i.39 = zext i32 %asmresult.i.37 to i64
>   %conv2.i.40 = zext i32 %asmresult1.i.38 to i64
>   %shl.i.41 = shl nuw i64 %conv2.i.40, 32
>   %or.i.42 = or i64 %shl.i.41, %conv.i.39
>   %sub13 = sub i64 %or.i.42, %or.i.36
>   %call14 = tail call i32 (i8*, ...) @printf(i8* nonnull getelementptr
> inbounds ([21 x i8], [21 x i8]* @.str, i64 0, i64 0), i64 %sub13, i64 5)
>   ret i32 0
> }
>
>
> Am I missing something?
>

Definitely looks like a bug. It's a consequence of both loops using the
same 'ret' variable. More specifically, the second loop body ends up with a
phi for the value of 'ret', whose value is either 4 or 5 depending on
whether the loop body is executed 0 times or more than 0 times, and the
existence of that phi prevents us from removing the second loop. (The phi
disappears once we remove the first loop, but we don't try to remove the
second loop again afterwards.)

[The 'add 19' is a comical red herring; we unroll the loop 19 times (as 19
is a largeish factor of the trip count), then fold all 19 'add 1'
instructions into one, but this happens after simplifycfg has already tried
and failed to remove the second loop.]

> --
> Mats
>
> On 6 November 2015 at 16:10, mats petersson <mats at planetcatfish.com>
> wrote:
>
>> Just to be clear: I was using -O2, and the "first loop" which has strlen
>> in it was optimised to a constant value and no loop, the second loop, with
>> sizeof, was not "removed".
>>
>> --
>> Mats
>>
>> On 6 November 2015 at 14:30, Renato Golin via cfe-dev <
>> cfe-dev at lists.llvm.org> wrote:
>>
>>> On 6 November 2015 at 14:25, Joerg Sonnenberger via cfe-dev
>>> <cfe-dev at lists.llvm.org> wrote:
>>> > Also, if you look at IR make sure to run the appropiate passes first.
>>> > Clang output by default has very few optimisations in the -emit-llvm
>>> > output.
>>>
>>> That's intentional. Try -O2 and your IR will look much nicer. :)
>>>
>>> --renato
>>> _______________________________________________
>>> cfe-dev mailing list
>>> cfe-dev at lists.llvm.org
>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>>>
>>
>>
>
> _______________________________________________
> cfe-dev mailing list
> cfe-dev at lists.llvm.org
> http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/cfe-dev/attachments/20151106/71c8affa/attachment.html>