[LLVMdev] Poor register allocations vs gcc
Matthias Braun
mbraun at apple.com
Mon Jul 13 11:25:00 PDT 2015
> On Jul 13, 2015, at 11:08 AM, deco33000 Jog <deco33000 at yandex.com> wrote:
>
>
> Hello,
> Ecx is a problem because you have to xor it, which the gcc compilation avoids. -fomit-frame-pointer helps.
>
> Now llvm is one instruction away from gcc. If ecx were not used, it would be as fast.
Register allocation is not the problem here. If you look at the gcc-produced code, you see "movl $0, %eax" as well (no idea why it doesn't use xorl to zero the register).
I looked into it again. LLVM's version is one instruction longer because the addition of 71 is folded into the last leal. As a result, both the value before adding 71 and the value plus 71 are live in the part before the puts call, so an additional mov instruction is needed to duplicate the value. You could file a PR if you really care about the issue.
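
To make the two shapes concrete, here is a minimal C sketch of what each assembly listing appears to compute. The function names (flop_sum_kept, flop_add_refolded, check_shapes_agree) are hypothetical and not part of the original program, the unused argv parameter is dropped, and ci() is assumed to be already folded to the constant 23. A compiler is free to lower both functions to the same code; the sketch only illustrates which values stay live across the puts call.

```c
#include <assert.h>
#include <stdio.h>

/* gcc-like shape: a + 71 is computed once, and that sum is the only
   copy of the argument kept across the puts() call. */
int flop_sum_kept(int a)
{
    int sum = a + 71;
    int b = (sum == 56) ? 92 : 0;   /* 92 = 69 + 23, with ci() folded */
    puts("ok");
    return sum + b;
}

/* LLVM-like shape: the original a is kept across the call, and the 71
   is added once for the compare and again in the final addition. */
int flop_add_refolded(int a)
{
    int tmp = a + 71;               /* needed only for the compare */
    int b = (tmp == 56) ? 92 : 0;
    puts("ok");
    return a + 71 + b;              /* the 71 folded into the last add */
}

/* Hypothetical self-check: both shapes return the same value. */
void check_shapes_agree(void)
{
    const int inputs[] = { -15, 0, 1, 56 };
    for (unsigned i = 0; i < sizeof inputs / sizeof inputs[0]; ++i)
        assert(flop_sum_kept(inputs[i]) == flop_add_refolded(inputs[i]));
}
```

In the second shape both a and a + 71 exist before the call, which corresponds to the extra register copy described above. With a = -15 the compare succeeds and both functions return 148.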
- Matthias
> --
> Sent from Yandex.Mail for mobile
>
> 20:03, 13 July 2015, Matthias Braun <mbraun at apple.com>:
>
>
> On Jul 13, 2015, at 10:03 AM, deco33000 at yandex.com wrote:
>
> Hello,
>
> I have an issue with the llvm optimizations. I need to create object codes.
>
> the -ON PURPOSE poor && useless- code :
> ---------------------------------------------------
> #include <stdio.h>
> #include <stdlib.h>
>
> int ci(int a){
>
> return 23;
>
> }
> int flop(int a, char ** c){
>
> a += 71;
>
> int b = 0;
>
> if (a == 56){
>
> b = 69;
> b += ci(a);
> }
>
> puts("ok");
> return a + b;
> }
> --------------------------------------
>
> Compiled that way (using the versions I downloaded and eventually compiled) :
> clang_custom -std=c11 -O3 -march=native -c app2.c -S
>
> against gcc:
> gcc_custom -std=c11 -O3 -march=native -c app2.c -S
>
> Versions (latest for each, downloaded just a few days ago):
> gcc : 5.1
> clang/llvm: clang+llvm-3.6.1-x86_64-apple-darwin
>
> Host:
> osx yosemite.
>
> The assembly (cut to the essential):
>
> LLVM:
> pushq %rbp
> movq %rsp, %rbp
> pushq %r14
> pushq %rbx
> movl %edi, %r14d
> leal 71(%r14), %eax
> xorl %ecx, %ecx
> cmpl $56, %eax
> movl $92, %ebx
> cmovnel %ecx, %ebx
> leaq L_.str(%rip), %rdi
> callq _puts
> leal 71(%rbx,%r14), %eax
> popq %rbx
> popq %r14
> popq %rbp
> retq
>
> and the gcc one:
>
> pushq %rbp
> movl $0, %eax
> movl $92, %ebp
> pushq %rbx
> leal 71(%rdi), %ebx
> leaq LC1(%rip), %rdi
> subq $8, %rsp
> cmpl $56, %ebx
> cmovne %eax, %ebp
> call _puts
> addq $8, %rsp
> leal 0(%rbp,%rbx), %eax
> popq %rbx
> popq %rbp
> ret
>
> As we can see, llvm makes poor register allocations (ecx and r14), leading to more instructions for the same result.
>
> Are there some optimizations I can bring on the table to avoid this ?
>
> As far as I know, clang on OS X always sets up a frame pointer unless you explicitly use -fomit-frame-pointer. I think the reasoning is that dtrace and other tools rely on frame pointers being present.
>
> I don't see why using %ecx would be a problem; no extra spills or reloads are produced because of it.
>
> - Matthias
>