[llvm-dev] Help required regarding IPRA and Local Function optimization
vivek pandya via llvm-dev
llvm-dev at lists.llvm.org
Sat Jul 2 04:27:54 PDT 2016
On Sat, Jul 2, 2016 at 5:07 AM, Quentin Colombet <qcolombet at apple.com>
wrote:
> Hi Vivek,
>
> I believe your reduced test case is broken.
>
> > On Jun 30, 2016, at 1:51 AM, vivek pandya <vivekvpandya at gmail.com>
> wrote:
> >
> > Hello Mentors,
> >
> > I am currently tracking down a bug in the local-function optimization
> > that causes runtime failures in some test cases. Those test cases
> > contain very large functions with recursion and object-oriented code,
> > so I have not been able to find the pattern that causes the failure.
> > I therefore tried the following simple case to understand the behavior
> > expected from this optimization.
> >
> > Consider the following code:
> >
> > define void @bar() #0 {
> > call void asm sideeffect "movl %ecx, %r15d", "~{r15}"() #0
> > call void @foo()
> > call void asm sideeffect "movl %r15d, %ebx", "~{rbx}"() #0
> > ret void
> > }
> >
> > define internal void @foo() #0 {
> > call void asm sideeffect "movl %r14d, %r15d", "~{r15}"() #0
> > ret void
> > }
> >
> > and the assembly generated for it when IPRA is enabled:
> >
> > .section __TEXT,__text,regular,pure_instructions
> > .macosx_version_min 10, 12
> > .p2align 4, 0x90
> > _foo: ## @foo
> > .cfi_startproc
> > ## BB#0:
> > ## InlineAsm Start
> > movl %r14d, %r15d
> > ## InlineAsm End
> > retq
> > .cfi_endproc
> >
> > .globl _bar
> > .p2align 4, 0x90
> > _bar: ## @bar
> > .cfi_startproc
> > ## BB#0:
> > pushq %r15
> > Ltmp0:
> > .cfi_def_cfa_offset 16
> > pushq %rbx
> > Ltmp1:
> > .cfi_def_cfa_offset 24
> > pushq %rax
> > Ltmp2:
> > .cfi_def_cfa_offset 32
> > Ltmp3:
> > .cfi_offset %rbx, -24
> > Ltmp4:
> > .cfi_offset %r15, -16
> > ## InlineAsm Start
> > movl %ecx, %r15d
> > ## InlineAsm End
> > callq _foo
> > ## InlineAsm Start
> > movl %r15d, %ebx
> > ## InlineAsm End
> > addq $8, %rsp
> > popq %rbx
> > popq %r15
> > retq
> > .cfi_endproc
> >
> >
> > .subsections_via_symbols
> >
> > Now foo clobbers R15 (which is callee-saved), but since foo is a local
> > function, IPRA will mark R15 as clobbered and foo will not save/restore
> > R15 in its prologue/epilogue. For the code above to work correctly, a
> > save and restore of R15 is expected around the call site of foo in bar,
> > but I cannot find a pass in LLVM that does that. In fact, if I am not
> > wrong, the RegMasks of call sites are used by the register allocators
> > through LiveIntervals::checkRegMaskInterference, so if R15 is marked as
> > clobbered by call _foo, R15 will not be used for a live range that spans
> > call _foo. (That itself is another concern, because it may result in
> > virtual register spills due to a lack of available registers: once the
> > callee-saved registers are set to none, that is propagated through the
> > regmasks.)
> >
> > Here are my questions related to this example:
> > 1) Is there any pass or code in LLVM that is responsible for
> > caller-saving physical registers? From looking at InlineSpiller.cpp, it
> > appears to be responsible for VReg spilling.
>
> If you caller-save registers "by hand" (as with inline assembly), you are
> supposed to control their live ranges yourself.
> What I am saying is that if you want support from the compiler, you need
> to give it this freedom, and your test case does not provide that.
> I.e., if you want the compiler to help, you would need to save r15 in a
> virtual register, and use this virtual register in the next inline asm
> statement.
> E.g. (do not try to run that code, the syntax is probably wrong, but I
> wanted to illustrate the idea):
>
> define void @bar() #0 {
> %tmpVal = call i32 asm sideeffect "movl %ecx, %r15d; movl %r15d, $0", "=r,~{r15}"() #0
> call void @foo()
> call void asm sideeffect "movl $0, %r15d; movl %r15d, %ebx", "r,~{r15},~{rbx}"(i32 %tmpVal) #0
> ret void
> }
>
> > 2) If such a pass exists, then why is R15 not saved around call _foo?
>
> R15 is not live in your example. I mean, inline asm statements are opaque
> to the compiler and it cannot track liveness from the strings :). The
> only thing it knows is what you tell it: you clobber r15 in one
> instruction and rbx in another. It does not know that the second one uses
> r15 from the first one.
>
> > 3) Why is _bar saving %rax in the above code?
>
> That’s an optimization :). We actually need to do sub $8 (probably to
> realign the stack), but since sub and push cost about the same, we do a
> push instead.
>
Thanks Quentin, I got your point. I will update the test case accordingly.

Sincerely,
Vivek
> Cheers,
> -Quentin
> >
> > Please help!
> >
> > Sincerely,
> > Vivek
> >
>
>
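The regmask reasoning in the quoted message (a live range that spans a call cannot be assigned a register that the call's regmask marks as clobbered) can be illustrated with a small toy model. This is only a sketch of the idea, not LLVM's implementation. The register indices, CallSite, LiveRange, and usableRegs below are hypothetical names made up for the illustration, and slot numbers are plain integers rather than LLVM's slot indexes.

```cpp
#include <bitset>
#include <cstddef>
#include <vector>

// Toy register file. The indices are hypothetical and unrelated to
// LLVM's real register numbering.
constexpr std::size_t RBX = 0;
constexpr std::size_t R14 = 1;
constexpr std::size_t R15 = 2;
constexpr std::size_t NumRegs = 3;
using RegSet = std::bitset<NumRegs>;

// A call at a given position, together with the registers it preserves.
// This plays the role of the regmask operand on a call instruction.
struct CallSite {
    int slot;
    RegSet preserved;
};

// The live range (start, end) of one virtual register, using the same
// integer positions as CallSite::slot.
struct LiveRange {
    int start;
    int end;
};

// Start from "every register is usable" and intersect with the preserved
// set of each call that lies strictly inside the live range. Registers
// clobbered by any such call drop out of the result.
RegSet usableRegs(const LiveRange &lr, const std::vector<CallSite> &calls) {
    RegSet usable;
    usable.set();
    for (const CallSite &c : calls) {
        if (lr.start < c.slot && c.slot < lr.end) {
            usable &= c.preserved;
        }
    }
    return usable;
}
```

With a call to foo whose preserved set lacks R15 (as IPRA would compute for the example above), usableRegs for a range spanning that call excludes R15 but still includes RBX and R14. A range that ends before the call keeps all three. Every register IPRA removes from a callee's preserved set shrinks this result for ranges crossing the call, which is the register-pressure concern raised in the original message.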