<div dir="ltr"><div><div><div><div><div><div><div><div>A better explanation than I would have given is here: <a href="https://groups.google.com/forum/#!topic/llvm-dev/NlGopW6_QxE">https://groups.google.com/forum/#!topic/llvm-dev/NlGopW6_QxE</a><br><br></div>The optimization pass that would have eliminated that sequence is turned off at -O0, so you see that behavior. At -O2 you get a tail call instead: <br><div><br><div style="margin-left:40px"> TCRETURNdi64 <ga:@foo>, 0, <regmask %BH %BL %BP %BPL %BX %DI %DIL %EBP %EBX %EDI %ESI %RBP %RBX %RDI %RSI %SI %SIL %R12 %R13 %R14 %R15 %XMM6 %XMM7 %XMM8 %XMM9 %XMM10 %XMM11 %XMM12 %XMM13 %XMM14 %XMM15 %R12B %R13B %R14B %R15B %R12D %R13D %R14D %R15D %R12W %R13W %R14W %R15W>, %RSP<imp-use><br></div><br></div><div>-O0 didn't optimize the stack adjustment away, so it produces this assembly: <br><div style="margin-left:40px">
boo: # @boo<br># BB#0:
<br> call foo<br> nop<br> add rsp, 40<br> ret<br></div><br></div>-O2 recognizes that the ret and the rsp adjustment can move straight into foo(): boo jumps to foo, and foo's own ret then returns directly to boo's caller. The result is: <br></div><div style="margin-left:40px">boo: # @boo<br># BB#0:<br> jmp foo # TAILCALL<br><br></div>This is also related to why you see both sp and rsp being marked as modified. Technically both are, but since one is a subregister of the other, it doesn't need to be marked explicitly. Here's an example of how bad -O0-generated code can get. Note that the return value of foo is never used, and foo has no side effects: <br></div><div>C file: <br><br>int foo(long x, long y, long z)<br>{<br> int retVal = x * y + z;<br> int* unused = &retVal;<br> return retVal;<br>}<br><br>void boo()<br>{<br> int x = 1;<br> int* y = &x;<br> int z = *y;<br> foo(x, *y, z);<br>}<br></div><div><br></div><div>-O0 version<br></div><div>=============================================================<br><div style="margin-left:40px">foo: # @foo<br>.Lcfi0:<br>.seh_proc foo<br># BB#0:<br> sub rsp, 24<br>.Lcfi1:<br> .seh_stackalloc 24<br>.Lcfi2:<br> .seh_endprologue<br> mov dword ptr [rsp + 8], r8d<br> mov dword ptr [rsp + 4], edx<br> mov dword ptr [rsp + 12], ecx<br> imul ecx, dword ptr [rsp + 4]<br> add ecx, dword ptr [rsp + 8]<br> mov dword ptr [rsp], ecx<br> mov rax, rsp<br> mov qword ptr [rsp + 16], rax<br> mov eax, dword ptr [rsp]<br> add rsp, 24<br> ret<br> .seh_handlerdata<br> .text<br>.Lcfi3:<br> .seh_endproc<br><br> .def boo;<br> .scl 2;<br> .type 32;<br> .endef<br> .globl boo<br> .p2align 4, 0x90<br>boo: # @boo<br>.Lcfi4:<br>.seh_proc boo<br># BB#0:<br> sub rsp, 56<br>.Lcfi5:<br> .seh_stackalloc 56<br>.Lcfi6:<br> .seh_endprologue<br> mov dword ptr [rsp + 36], 1<br> lea rax, [rsp + 36]<br> mov qword ptr [rsp + 40], rax<br> mov r8d, dword ptr [rsp + 36]<br> mov dword ptr [rsp + 52], r8d<br> mov rax, qword ptr [rsp + 40]<br> mov edx, dword ptr 
[rax]<br> mov ecx, dword ptr [rsp + 36]<br> call foo<br> nop<br> add rsp, 56<br> ret<br> .seh_handlerdata<br> .text<br>.Lcfi7:<br> .seh_endproc<br></div><br>===================================<br></div>-O2 version<br></div><div>Note: EDX is marked neither kill nor def here because in this case it always holds the same value as x; if it didn't, it would still be passed in per the calling convention. <br></div><div><br>foo: # @foo<br># BB#0:<br> # kill: %R8D<def> %R8D<kill> %R8<def><br> # kill: %ECX<def> %ECX<kill> %RCX<def><br> imul ecx, edx<br> lea eax, [rcx + r8] <br> ret<br><br> .def boo;<br> .scl 2;<br> .type 32;<br> .endef<br> .globl boo<br> .p2align 4, 0x90<br>boo: # @boo<br># BB#0:<br> ret<br><br></div>The only reason foo's body exists at all is that foo is externally visible, so the optimizer can't be certain that another function (possibly in another translation unit) won't call it. It does know that boo() ignores the return value and returns nothing itself, and that foo has no side effects, so boo doesn't need to call foo at all. So the short version is that -O0 takes the code at face value and does things that aren't necessary, because it doesn't run the analyses needed to remove them. It also produces sequences like the following:<br>
<div style="margin-left:40px"> mov dword ptr [rsp], ecx<br> mov rax, rsp<br> mov qword ptr [rsp + 16], rax<br> mov eax, dword ptr [rsp]</div>
<br><br></div>which is due to an old C standard rule stating that non-pointer / non-reference input parameters may be used as temporaries. It computed the local retVal in ecx and had to store it to its stack slot at [rsp], because nothing told it the value wouldn't be needed later; then it stored a pointer to that slot (rsp, i.e. &retVal) into unused at [rsp + 16], where it was never read, and finally reloaded retVal from [rsp] into eax to return it. <br><br></div>Apologies if that example was overly simple, but I like using it to show people how much work optimizing compilers do. I've used a similar explanation to teach a couple of people why optimizations are needed. <br><br></div>Back to your question: the implicit / implicit-def / dead / etc. flags you're seeing are just artifacts of what the compiler knows when you disable its optimizations. Later passes would normally use them to produce the more optimal form. <br><br></div>The syntax takes some getting used to, but it makes sense. <br><div class="gmail_extra"><br clear="all"><div><div class="gmail_signature" data-smartmail="gmail_signature">Cheers,<br></div></div>
Gordon Keiser<br></div><div class="gmail_extra">Software Development Engineer, Supposedly<br></div><div class="gmail_extra"><br></div><div class="gmail_extra"><div class="gmail_quote">On Mon, Feb 5, 2018 at 11:14 PM, Bhatu via llvm-dev <span dir="ltr"><<a href="mailto:llvm-dev@lists.llvm.org" target="_blank">llvm-dev@lists.llvm.org</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><div>Hi,</div><div><br></div><div>My understanding of a "dead" register is a def that is never used. However,</div><div>when I dump the MI after reg alloc on a simple program I see the following sequence:</div><div><br></div><div>ADJCALLSTACKDOWN64 0, 0, 0, <b>implicit-def dead %rsp</b>, implicit-def dead %eflags, implicit-def dead %ssp, implicit %rsp, implicit %ssp</div><div>CALL64pcrel32 @foo, <regmask %bh %bl %bp %bpl %bx %ebp %ebx %rbp %rbx %r12 %r13 %r14 %r15 %r12b %r13b %r14b %r15b %r12d %r13d %r14d %r15d %r12w %r13w %r14w %r15w>, <b>implicit %rsp</b>, implicit %ssp, implicit-def %rsp, implicit-def %ssp</div><div>ADJCALLSTACKUP64 0, 0, implicit-def dead %rsp, implicit-def dead %eflags, implicit-def dead %ssp, implicit %rsp, implicit %ssp</div><div>RET 0</div><div><br></div><div><br></div><div>The ADJCALLSTACKDOWN64 has implicit-def dead %rsp. However the next instruction,</div><div>CALL64pcrel32 has an implicit use of %rsp. This would be a use of %rsp as defined </div><div>in ADJCALLSTACKDOWN64 making that non-dead.</div><div><br></div><div>So I guess my understanding of dead is incorrect. 
Could you please explain what dead means?</div><div><br></div><div><br></div><div>For reference:</div><div>Source file(a.c):</div><div>void foo(void);</div><div>void boo(){ foo(); }</div><div><br></div><div>Commands:</div><div>clang -S -emit-llvm -Xclang -disable-O0-optnone a.c</div><div>llc -print-after="stack-slot-<wbr>coloring" a.ll</div><span class="HOEnZb"><font color="#888888"><div><br></div>-- <br><div class="m_-3483481156185391380gmail_signature"><div dir="ltr"><div><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div><font size="1"><span style="color:rgb(84,141,212);font-family:arial,helvetica,sans-serif">Regards</span><br></font></div><div><div dir="ltr"><font face="arial, helvetica, sans-serif" color="#000000" size="1"><font color="#548dd4">Bhatu</font></font></div></div></div></div></div></div></div></div></div></div>
</font></span></div>
<br>______________________________<wbr>_________________<br>
LLVM Developers mailing list<br>
<a href="mailto:llvm-dev@lists.llvm.org">llvm-dev@lists.llvm.org</a><br>
<a href="http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev" rel="noreferrer" target="_blank">http://lists.llvm.org/cgi-bin/<wbr>mailman/listinfo/llvm-dev</a><br>
<br></blockquote></div><br></div></div>