<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Jul 2, 2016 at 5:07 AM, Quentin Colombet <span dir="ltr"><<a href="mailto:qcolombet@apple.com" target="_blank">qcolombet@apple.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Vivek,<br>

<br>

I believe your reduced test case is broken.<br>

<div><div class="h5"><br>

> On Jun 30, 2016, at 1:51 AM, vivek pandya <<a href="mailto:vivekvpandya@gmail.com">vivekvpandya@gmail.com</a>> wrote:<br>

><br>

> Hello Mentors,<br>

><br>

> I am currently tracking down a bug in the local-function optimization that causes runtime failures in some test cases. Because those test cases contain very large functions with recursion and object-oriented code, I have not been able to find the pattern that causes the failure. So I tried the following simple case to understand the behavior expected from this optimization.<br>

><br>

> Consider the following code:<br>

><br>

> define void @bar() #0 {<br>

>   call void asm sideeffect "movl      %ecx, %r15d", "~{r15}"() #0<br>

>   call void @foo()<br>

>   call void asm sideeffect "movl      %r15d, %ebx", "~{rbx}"() #0<br>

>   ret void<br>

> }<br>

><br>

> define internal void @foo() #0 {<br>

>   call void asm sideeffect "movl      %r14d, %r15d", "~{r15}"() #0<br>

>   ret void<br>

> }<br>

><br>

> and the assembly code generated for it when IPRA is enabled:<br>

><br>

>       .section        __TEXT,__text,regular,pure_instructions<br>

>       .macosx_version_min 10, 12<br>

>       .p2align        4, 0x90<br>

> _foo:                                   ## @foo<br>

>       .cfi_startproc<br>

> ## BB#0:<br>

>       ## InlineAsm Start<br>

>       movl    %r14d, %r15d<br>

>       ## InlineAsm End<br>

>       retq<br>

>       .cfi_endproc<br>

><br>

>       .globl  _bar<br>

>       .p2align        4, 0x90<br>

> _bar:                                   ## @bar<br>

>       .cfi_startproc<br>

> ## BB#0:<br>

>       pushq   %r15<br>

> Ltmp0:<br>

>       .cfi_def_cfa_offset 16<br>

>       pushq   %rbx<br>

> Ltmp1:<br>

>       .cfi_def_cfa_offset 24<br>

>       pushq   %rax<br>

> Ltmp2:<br>

>       .cfi_def_cfa_offset 32<br>

> Ltmp3:<br>

>       .cfi_offset %rbx, -24<br>

> Ltmp4:<br>

>       .cfi_offset %r15, -16<br>

>       ## InlineAsm Start<br>

>       movl    %ecx, %r15d<br>

>       ## InlineAsm End<br>

>       callq   _foo<br>

>       ## InlineAsm Start<br>

>       movl    %r15d, %ebx<br>

>       ## InlineAsm End<br>

>       addq    $8, %rsp<br>

>       popq    %rbx<br>

>       popq    %r15<br>

>       retq<br>

>       .cfi_endproc<br>

><br>

><br>

> .subsections_via_symbols<br>

><br>

> Now foo clobbers R15 (which is callee-saved), but because foo is a local function, IPRA marks R15 as clobbered and foo has no save/restore of R15 in its prologue/epilogue. For the code above to work correctly, I would expect R15 to be saved and restored around the call to foo in bar, but I cannot find a pass in LLVM that does that. In fact, if I am not wrong, the RegMasks of the call site are used by the register allocators through LiveIntervals::checkRegMaskInterference, so if R15 is marked as clobbered by call _foo, R15 will not be used for any live range that spans call _foo. (That is itself another concern, because it may cause virtual-register spills due to a lack of available registers, since setting the callee-saved registers to none is propagated through the regmasks.)<br>

><br>

> Here are my questions related to this example:<br>

> 1) Is there any pass or code in LLVM that is responsible for caller-saving physical registers? From looking at InlineSpiller.cpp, it appears to handle VReg spilling.<br>

<br>

</div></div>If you save caller-saved registers "by hand" (as with inline assembly), you are supposed to control their live range yourself.<br>

What I am saying is that if you want support from the compiler, you need to give it this freedom, and your test case does not provide that.<br>

i.e., if you want the compiler to help, you would need to save r15 in a virtual register, and use this virtual register in the next inline asm statement.<br>

E.g. (do not try to run that code, the syntax is probably wrong, but I wanted to illustrate the idea)<br>

<br>

define void @bar() #0 {<br>

  call void asm sideeffect "movl        %ecx, %r15d; movl %r15d, $r", i32 %tmpVal, "~{r15}"() #0<br>

  call void @foo()<br>

  call void asm sideeffect "movl $r, %r15d; movl       %r15d, %ebx", "~{rbx}"() #0<br>

  ret void<br>

}<br>
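
<br>

A variant of the sketch above with explicit operand constraints (untested; the "=r" and "r" constraints, the $0 operand references, and the name %saved are assumptions for illustration, not part of the original example):<br>

<br>

define void @bar() #0 {<br>

  ; copy r15 into a virtual register that the allocator can keep alive across the call<br>

  %saved = call i32 asm sideeffect "movl %ecx, %r15d; movl %r15d, $0", "=r,~{r15}"() #0<br>

  call void @foo()<br>

  ; pass the saved value back in, so the dependency is visible to the compiler<br>

  call void asm sideeffect "movl $0, %r15d; movl %r15d, %ebx", "r,~{r15},~{rbx}"(i32 %saved) #0<br>

  ret void<br>

}<br>

<br>

Here %saved is an ordinary virtual register whose live range spans the call to @foo, so the register allocator has to keep it in a register that the call does not clobber, or spill it.<br>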

<span class=""><br>

> 2) If such a pass exists, why is R15 not saved around the call to _foo?<br>

<br>

</span>R15 is not live in your example. I mean, inline asm statements are opaque to the compiler, and it cannot track liveness from the strings :). The only thing it knows is what you tell it: you clobber r15 in one instruction and rbx in another. It does not know that the second one uses the r15 value from the first one.<br>

<span class=""><br>

> 3) Why is _bar saving %rax in the above code?<br>

<br>

</span>That’s an optimization :). We actually need to do sub $8 (probably to realign the stack), but since sub and push are equally expensive, we do a push.<br>

<br></blockquote><div>Thanks Quentin, I got your point. I will update the test case accordingly.</div><div><br></div><div>Sincerely,</div><div>Vivek</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

Cheers,<br>

-Quentin<br>

><br>

> Please help!<br>

><br>

> Sincerely,<br>

> Vivek<br>

><br>

<br>

</blockquote></div><br></div></div>