[llvm-dev] Comparing Clang and GCC: only clang stores updated value in each iteration.
Jonas Paulsson via llvm-dev
llvm-dev at lists.llvm.org
Fri Sep 21 00:15:48 PDT 2018
Hi Philip and Eli,
> I think your example may be a bit over reduced. Unless I'm misreading
> this, a starts at 1, is incremented one each iteration, and then is
> tested against zero. The only way this loop can exit is if a has
> wrapped around and C++ states that signed integers are assumed to not
> overflow. We can/should be replacing the whole loop with an unreachable.
>
> Do we still fail to optimize if either a) you use an unsigned which
> has defined overflow or b) you use a non-zero exit test? That is,
> change the example to something like:
> int a = 1;
> void b() {
>   do
>     if (a)
>       a++;
>   while (a != 500);
> }
Yes: whether I change 'a' to unsigned or replace the exit test with
500, clang still stores on each iteration while gcc does not.
> (Eli) Your testcase is a bit weird because the condition of the while
> loop is the same as the condition of the if statement. Is that really
> what the original loop looks like?
No, not really; the reduced one just shows the difference between gcc
and clang. There were some variations of it, but I chose this one since
it gave very small output. Sorry if it was confusing.
>
> If so, then yes, this is probably a case where the aggressive LoopPRE
> mentioned in the other thread that Eli linked to would be useful.
> Once we'd done the PRE, then everything else should collapse.
Thanks for the link; it's good to know this issue is recognized. If I
understand it correctly, clang stores on each iteration because of
concurrency. As a newbie I wonder how this works in practice: even if
the value is stored on every iteration, two threads could still run
this loop at the same time unless some atomic operation is used, right?
What happens here is that 'a' is loaded once before the loop, then
incremented and stored on each iteration. How does that help with
multiple threads, compared to storing it once after the loop?
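A hand-written C sketch of the two code shapes being compared may help,
assuming Philip's variant with 'a' made unsigned and the exit test 500.
The function names are hypothetical, and neither function is literally
what clang or gcc emits. The relevant constraint, as far as the
C11/C++11 memory model goes, is that a compiler may merge repeated stores
to a global, but it may not add a store on a path where the source
performs none, because that could introduce a data race into an
otherwise race-free program. A single store after the loop therefore has
to stay conditional, which the flag below takes care of:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the global 'a' in Philip's variant, made
   unsigned so that wraparound is well defined. */
unsigned a = 1;

/* Source-level equivalent of the clang output: 'a' is loaded once, and
   the new value is written back on every iteration that increments. */
void b_store_each_iteration(void) {
    unsigned r = a;              /* one load before the loop */
    do {
        if (r) {
            r++;
            a = r;               /* store inside the loop */
        }
    } while (r != 500);
}

/* Scalar promotion with one store after the loop (a sketch, not either
   compiler's actual output). The flag keeps the store conditional: 'a'
   is written only if the original loop would have written it at least
   once. */
void b_store_after_loop(void) {
    unsigned r = a;
    bool wrote = false;
    do {
        if (r) {
            r++;
            wrote = true;
        }
    } while (r != 500);
    if (wrote)
        a = r;                   /* single, still conditional, store */
}
```

Single-threaded, both functions leave 'a' at 500 for starting values 1
through 499; with this exit test, a starting value of 0 makes the loop
spin forever in either version.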
Is there an option to change this behavior in gcc or clang? It seems
that gcc is assuming a single thread while clang is not. It would be
nice to use the same setting for both when comparing them. Or am I
missing something?
Thanks
Jonas
>
>>
>> bin/clang -O3 -march=z13 -mllvm -unroll-count=1
>>
>> .text
>> .file "testfun.i"
>> .globl b # -- Begin function b
>> .p2align 4
>> .type b,@function
>> b: # @b
>> # %bb.0: # %entry
>> lrl %r0, a
>> .LBB0_1: # %do.body
>> # =>This Inner Loop Header: Depth=1
>> cije %r0, 0, .LBB0_3
>> # %bb.2: # %if.then
>> # in Loop: Header=BB0_1 Depth=1
>> ahi %r0, 1
>> strl %r0, a
>> .LBB0_3: # %do.cond
>> # in Loop: Header=BB0_1 Depth=1
>> cijlh %r0, 0, .LBB0_1
>> # %bb.4: # %do.end
>> br %r14
>> .Lfunc_end0:
>> .size b, .Lfunc_end0-b
>> # -- End function
>> .type a,@object # @a
>> .data
>> .globl a
>> .p2align 2
>> a:
>> .long 1 # 0x1
>> .size a, 4
>>
>>
>> gcc -O3 -march=z13:
>>
>> .file "testfun.i"
>> .machinemode zarch
>> .machine "z13"
>> .text
>> .align 8
>> .globl b
>> .type b, @function
>> b:
>> .LFB0:
>> .cfi_startproc
>> larl %r1,a
>> lt %r1,0(%r1)
>> je .L1
>> larl %r1,a
>> mvhi 0(%r1),0
>> .L1:
>> br %r14
>> .cfi_endproc
>> .LFE0:
>> .size b, .-b
>> .globl a
>> .data
>> .align 4
>> .type a, @object
>> .size a, 4
>> a:
>> .long 1
>> .ident "GCC: (GNU) 8.0.1 20180324 (Red Hat 8.0.1-0.20)"
>> .section .note.GNU-stack,"", at progbits
>>
>
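For reference, the two listings above can be read back into C roughly
as follows. This is a hand translation for the original reduced
testcase, which, judging from Philip's and Eli's descriptions, was
"do if (a) a++; while (a);". Here 'a' is made unsigned so that the
wraparound the loop relies on is well defined, and the function names
are hypothetical.

```c
/* Hypothetical stand-in for the global 'a' (unsigned here so that
   wrapping to zero is defined behavior). */
unsigned a = 1;

/* clang listing: 'a' is loaded once, then incremented and stored on
   every iteration that passes the zero test; the loop exits once the
   value has wrapped to zero. */
void b_clang_listing(void) {
    unsigned r = a;          /* lrl   %r0, a */
    do {
        if (r != 0) {        /* cije  %r0, 0, .LBB0_3 (skip if zero) */
            r += 1;          /* ahi   %r0, 1 */
            a = r;           /* strl  %r0, a */
        }
    } while (r != 0);        /* cijlh %r0, 0, .LBB0_1 */
}

/* gcc listing: the loop is gone. A nonzero 'a' can only leave the loop
   after wrapping to zero, so gcc stores 0 directly, and it still stores
   only when 'a' was nonzero. */
void b_gcc_listing(void) {
    if (a != 0)              /* lt %r1,0(%r1); je .L1 */
        a = 0;               /* mvhi 0(%r1),0 */
}
```

Both leave 'a' at 0 for any starting value, and neither writes 'a' when
it starts at 0; the difference is only how many stores happen on the way.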