<html><head><meta http-equiv="Content-Type" content="text/html charset=us-ascii"></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class=""><div class=""><br class=""></div><div><blockquote type="cite" class=""><div class="">On Jul 13, 2015, at 11:08 AM, deco33000 Jog <<a href="mailto:deco33000@yandex.com" class="">deco33000@yandex.com</a>> wrote:</div><br class="Apple-interchange-newline"><div class=""><br class="">Hello, <br class="">%ecx is a problem because you have to xor it, which the gcc output avoids. -fomit-frame-pointer helps.<br class=""><br class="">Now LLVM's output is one instruction longer than gcc's. If %ecx were not used, it would be as fast.<br class=""></div></blockquote><div>Register allocation is not the problem here. If you look at the gcc-produced code, you see "movl $0, %eax" as well (no idea why it doesn't use xorl to zero the register).</div><div>I looked into it again: LLVM's version is one instruction longer because the addition of 71 is folded into the last leal. As a result, both the original value and the value plus 71 are live in the part before the puts call, which requires an additional mov instruction to duplicate the value. You could file a PR (bug report) if you really care about the issue.</div><div><br class=""></div><div>- Matthias</div><br class=""><blockquote type="cite" class=""><div class="">-- <br class="">Sent from Yandex.Mail for mobile<br class=""><br class="">20:03, 13 July 2015, Matthias Braun <<a href="mailto:mbraun@apple.com" class="">mbraun@apple.com</a>>:<br class=""><blockquote class=""><br class=""><br class=""><blockquote class=""> On Jul 13, 2015, at 10:03 AM, <a href="mailto:deco33000@yandex.com" class="">deco33000@yandex.com</a> wrote:<br class=""><br class=""> Hello,<br class=""><br class=""> I have an issue with the LLVM optimizations. 
I need to generate object code.<br class=""><br class=""> The (intentionally poor and useless) code:<br class=""> ---------------------------------------------------<br class=""> #include <stdio.h><br class=""> #include <stdlib.h><br class=""><br class=""> int ci(int a){<br class=""><br class=""> return 23;<br class=""><br class=""> }<br class=""> int flop(int a, char ** c){<br class=""><br class=""> a += 71;<br class=""><br class=""> int b = 0;<br class=""><br class=""> if (a == 56){<br class=""><br class=""> b = 69;<br class=""> b += ci(a);<br class=""> }<br class=""><br class=""> puts("ok");<br class=""> return a + b;<br class=""> }<br class=""> --------------------------------------<br class=""><br class=""> Compiled this way (using the versions I downloaded and, where needed, built myself):<br class=""> clang_custom -std=c11 -O3 -march=native -c app2.c -S<br class=""><br class=""> versus gcc:<br class=""> gcc_custom -std=c11 -O3 -march=native -c app2.c -S<br class=""><br class=""> Versions (latest for each, downloaded just a few days ago):<br class=""> gcc: 5.1<br class=""> clang/llvm: clang+llvm-3.6.1-x86_64-apple-darwin<br class=""><br class=""> Host:<br class=""> OS X Yosemite.<br class=""><br class=""> The assembly (trimmed to the essentials):<br class=""><br class=""> LLVM:<br class=""> pushq %rbp<br class=""> movq %rsp, %rbp<br class=""> pushq %r14<br class=""> pushq %rbx<br class=""> movl %edi, %r14d<br class=""> leal 71(%r14), %eax<br class=""> xorl %ecx, %ecx<br class=""> cmpl $56, %eax<br class=""> movl $92, %ebx<br class=""> cmovnel %ecx, %ebx<br class=""> leaq L_.str(%rip), %rdi<br class=""> callq _puts<br class=""> leal 71(%rbx,%r14), %eax<br class=""> popq %rbx<br class=""> popq %r14<br class=""> popq %rbp<br class=""> retq<br class=""><br class=""> and the gcc one:<br class=""><br class=""> pushq %rbp<br class=""> movl $0, %eax<br class=""> movl $92, %ebp<br class=""> pushq %rbx<br class=""> leal 71(%rdi), %ebx<br class=""> leaq LC1(%rip), %rdi<br class=""> 
subq $8, %rsp<br class=""> cmpl $56, %ebx<br class=""> cmovne %eax, %ebp<br class=""> call _puts<br class=""> addq $8, %rsp<br class=""> leal 0(%rbp,%rbx), %eax<br class=""> popq %rbx<br class=""> popq %rbp<br class=""> ret<br class=""><br class=""> As we can see, llvm makes poor register allocations (ecx and r14), leading to more instructions for the same result.<br class=""><br class=""> Are there some optimizations I can bring on the table to avoid this ?<br class=""></blockquote><br class="">As far as I know clang on OS X always sets up a frame pointer unless you explicitely use -fomit-frame-pointer. I think the reasoning being that dtrace and others rely on frame pointers being present.<br class=""><br class="">I don't see why using %ecx would be a problem, there are no extra spill/reloads produced because of that.<br class=""><br class="">- Matthias<br class=""><br class=""></blockquote>
</div></blockquote></div><br class=""></body></html>