[llvm-dev] IPRA, interprocedural register allocation, question

Fri Jul 8 10:24:11 PDT 2016

On Fri, Jul 8, 2016 at 1:42 PM, vivek pandya <vivekvpandya at gmail.com> wrote:

>
>
> On Fri, Jul 8, 2016 at 9:47 AM, Lawrence, Peter <c_plawre at qca.qualcomm.com
> > wrote:
>
>> Vivek,
>>
>>              I am looking into these function attributes in the clang docs
>>
>>                 preserve_most
>>
>>                 preserve_all
>>
>> They are not available in the 3.6.2 that I am currently using, but I hope
>> they exist in 3.8
>>
>>
>>
>> These should provide enough info to solve my problem,
>>
>> at the MC level calls to functions with these attributes
>>
>> will be code-gen’ed through different “calling conventions”,
>>
>> and CALL instructions to them should have different register USE and DEF
>> info,
>>
>>
>>
> Yes, I believe that preserve_most or preserve_all should help you even
> without IPRA. But note that IPRA can help further. For example, on X86 the
> preserve_most CC does not preserve R11 (this can be verified from
> X86CallingConv.td and X86RegisterInfo.cpp). However, IPRA calculates the
> regmask from the actual register usage, so if a procedure with the
> preserve_most CC does not use R11, and no call site inside its body clobbers
> R11 either, then IPRA will mark R11 as preserved. In other words, the
> RegMask that IPRA produces is a superset of the RegMask implied by the
> calling convention.
>
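To make the superset point concrete, here is a toy model in C. It is only a
sketch under my own assumptions: the six-register file, the names
toy_ipra_mask / regmask_t / REG_BIT, and the bit layout are made up for
illustration and are not LLVM's data structures. A set bit means "preserved
across the call".

```c
#include <assert.h>
#include <stdint.h>

/* Toy register file: a hypothetical handful of x86-64 registers. */
enum toy_reg { TOY_RAX, TOY_RCX, TOY_RDX, TOY_R11, TOY_R13, TOY_R15, TOY_NUM_REGS };

/* One bit per register; a set bit means "preserved across the call". */
typedef uint32_t regmask_t;

#define REG_BIT(r) ((regmask_t)1u << (r))
#define ALL_REGS   ((regmask_t)((1u << TOY_NUM_REGS) - 1u))

/*
 * cc_preserved: registers the calling convention saves and restores.
 * body_writes:  registers the function body itself writes.
 * callee_masks: preserved-masks of the functions it calls (n_callees entries).
 */
regmask_t toy_ipra_mask(regmask_t cc_preserved, regmask_t body_writes,
                        const regmask_t *callee_masks, int n_callees)
{
    assert(n_callees >= 0);

    regmask_t clobbered = body_writes;
    for (int i = 0; i < n_callees; i++)
        clobbered |= ALL_REGS & ~callee_masks[i]; /* what each callee may clobber */

    /* Untouched registers are preserved, and so is anything the CC restores. */
    return (ALL_REGS & ~clobbered) | cc_preserved;
}
```

With cc_preserved set to everything except R11 (the preserve_most case in
this toy model), a body that never writes R11 and calls nothing gets R11
marked as preserved as well, and the result always contains cc_preserved.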

I believe that __attribute__ ((registermask = ....))  can provide
more flexibility compared to the preserve_all or preserve_most CC in some
cases, so I believe that we should try it out.
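
For comparison, this is roughly how the existing preserve_most attribute is
spelled in C. Treat it as a sketch: slow_path, hot_loop and the PRESERVE_MOST
macro are made-up names, and the guard falls back to the default calling
convention on compilers that do not know the attribute. The registermask
attribute discussed above is only a proposal and is not used here.

```c
#include <assert.h>

/* Use preserve_most only where the compiler knows it (an assumption-free
   fallback: otherwise the macro expands to nothing). */
#if defined(__has_attribute)
#  if __has_attribute(preserve_most)
#    define PRESERVE_MOST __attribute__((preserve_most))
#  endif
#endif
#ifndef PRESERVE_MOST
#  define PRESERVE_MOST /* attribute unavailable: default calling convention */
#endif

/* Hypothetical rarely-taken helper; with preserve_most the callee saves
   most registers, so the caller keeps its values live across the call. */
PRESERVE_MOST int slow_path(int x)
{
    return x * 2 + 1;
}

/* Hypothetical hot loop that only occasionally calls the helper. */
int hot_loop(const int *v, int n)
{
    assert(n >= 0);

    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (v[i] < 0)
            sum += slow_path(v[i]);
        else
            sum += v[i];
    }
    return sum;
}
```

The set of preserved registers here is fixed by the calling convention; a
per-function register mask would let the user state it exactly.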

-Vivek

>> This CALL instruction register USE and DEF info should already be useful
>>
>> to the intra-procedural register allocator (allowing values live across
>> these
>>
>> calls to be in what are otherwise caller-save registers),
>>
>> at least that’s how I read the MC dumps, every call instruction seems to
>> have
>>
>> every caller-save register flagged as “imp-def”, IE implicitly-defined by
>> the instruction,
>>
>> and hopefully what is considered a caller-save register at a call-site is
>> defined by the callee.
>>
>> And this should be the information that IPRA takes advantage of in its
>> bottom-up analysis.
>>
>>
>>
> Yes, that is the help expected from IPRA.
>
>>
>>
>> Which leads me to this question, when compiling an entire whole program
>> at one time,
>>
>> so there is no linking and no LTO, will there ever be IPRA that works
>> within LLC for this scenario,
>>
>> and is this an objective of your project, or are you focusing only on LTO
>> ?
>>
> The current IPRA infrastructure works at compile time, so its scope of
> optimization is restricted to a compilation unit. IPRA can only
> construct correct register usage information if the procedure's code is
> generated by the same compiler instance, which means we can't optimize
> library calls or procedures defined in another module. This is because we
> can't keep register usage information across two different compiler
> instances.
>
> Now if we consider LTO, it eliminates the above limitation by making one
> large IR module from smaller modules before generating code, and thus we
> can have register usage information (at least) for procedures which were
> previously defined in another module, because with LTO everything is in one
> module. That also clarifies that IPRA does not do anything at link time.
>
> Now coming to LLC, it can use IPRA and optimize for functions defined in
> the current module. So yes, while compiling a whole program (a single huge
> .bc file) IPRA can be used with LLC. Also note that if the software is
> written as separate files per module (which is very common) and you still
> want to maximize the benefits of IPRA, you can use the llvm-link tool to
> combine several .bc files into one huge .bc file and use that with LLC.
>
>>
>>
>> I know this is not the typical “linux” scenario (dynamic linking of not
>> only standard libraries,
>>
>> but also sometimes even application libraries, and lots of static linking
>> because of program
>>
>> size), but it is a typical “embedded” scenario, which is where I am
>> currently.
>>
>>
>>
> I don't understand this use case, but we can improve IPRA further. For
> example, if you have several libraries which have already been compiled
> and code-generated, but you are able to provide register usage information
> for the functions of those libraries, then we can think about an approach
> where we store register usage information in a file (which will obviously
> increase compile time) and use that information across different compiler
> instances, so that we have register usage information without having the
> actual code while compiling.
>
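One possible shape for such a file, purely as a sketch: one text line per
function, holding the symbol name and a hexadecimal mask of preserved
registers. The format, struct reg_usage, reg_usage_format and
reg_usage_parse are all hypothetical; nothing like this exists in the
current IPRA patches.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record: one function and the registers it preserves. */
struct reg_usage {
    char     name[64];        /* function symbol name (no whitespace) */
    uint32_t preserved_mask;  /* one bit per register, set = preserved */
};

/* Write "name hexmask\n" into buf; returns 1 if it fit, 0 otherwise. */
int reg_usage_format(const struct reg_usage *ru, char *buf, size_t buflen)
{
    assert(ru && buf);
    int n = snprintf(buf, buflen, "%s %x\n", ru->name,
                     (unsigned)ru->preserved_mask);
    return n > 0 && (size_t)n < buflen;
}

/* Parse one "name hexmask" line; returns 1 on success, 0 on a bad line. */
int reg_usage_parse(const char *line, struct reg_usage *ru)
{
    assert(line && ru);
    char name[64];
    unsigned mask;
    if (sscanf(line, "%63s %x", name, &mask) != 2)
        return 0;
    strcpy(ru->name, name);   /* name is at most 63 chars plus terminator */
    ru->preserved_mask = (uint32_t)mask;
    return 1;
}
```

A second compiler instance could read these lines back and attach the mask
to calls of the named function instead of the calling convention's default.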
>>
>>
>> Other thoughts or comments ?
>>
>>
>>
> I am looking for ideas that can improve the current IPRA. So if you think
> of anything relevant, please let me know; we can discuss and implement
> feasible ideas.
>
> Thanks,
> Vivek
>
>>
>>
>> --Peter Lawrence.
>>
>>
>>
>>
>>
>> *From:* vivek pandya [mailto:vivekvpandya at gmail.com]
>> *Sent:* Wednesday, July 06, 2016 2:09 PM
>> *To:* llvm-dev <llvm-dev at lists.llvm.org>; llvm-dev-request at lists.llvm.org;
>> Lawrence, Peter <c_plawre at qca.qualcomm.com>
>> *Subject:* Re:[llvm-dev] IPRA, interprocedural register allocation,
>> question
>>
>>
>>
>> Hello Peter,
>>
>>
>>
>> Thanks for pointing out this interesting case.
>>
>> Vivek,
>>           I have an application where many of the leaf functions are
>> hand-coded assembly language,  because they use special IO instructions
>> that only the assembler knows about.  These functions typically don't
>> use any registers besides the incoming argument registers, IE they don't
>> need to use any additional callee-save nor caller-save registers.
>>
>> If the inline asm template specifies its clobber list properly, then IPRA
>> is able to use that information and it propagates the correct register
>> mask (which also means that omitting the clobber list while IPRA is
>> enabled may break the executable).
>>
>> For example, in the following code:
>>
>> int gcd( int a, int b ) {
>>     int result ;
>>     /* Compute Greatest Common Divisor using Euclid's Algorithm.
>>        r15d holds the dividend, ecx the divisor, r13d the remainder;
>>        idivl implicitly uses edx:eax, so both are listed as clobbers. */
>>     __asm__ __volatile__ ( "movl %1, %%r15d;"
>>                           "movl %2, %%ecx;"
>>                           "CONTD: cmpl $0, %%ecx;"
>>                           "je DONE;"
>>                           "movl %%r15d, %%eax;"
>>                           "cltd;"
>>                           "idivl %%ecx;"
>>                           "movl %%edx, %%r13d;"
>>                           "movl %%ecx, %%r15d;"
>>                           "movl %%r13d, %%ecx;"
>>                           "jmp CONTD;"
>>                           "DONE: movl %%r15d, %0;"
>>                           : "=g" (result)
>>                           : "g" (a), "g" (b)
>>                           : "eax", "ecx", "edx", "r13", "r15", "cc"
>>     );
>>
>>     return result ;
>> }
>>
>> IPRA calculates and propagates the correct regmask, in which it marks CH,
>> CL, ECX .. as clobbered. R13 and R15 are not marked as clobbered because
>> they are callee-saved and the LLVM code generator inserts spill/restore
>> code for them.
>>
>>
>>
>> Is there any way in your IPRA interprocedural register allocation project
>> that
>> The user can supply this information for external functions ?
>>
>> By "external", do you mean a function defined in a module other than the
>> one where it is used?  In that case, as IPRA can operate on only one
>> module at a time, register usage propagation is not possible. But there is
>> a workaround for this problem. You can use IPRA with link time
>> optimization enabled: the way LLVM LTO works, it creates one big IR module
>> out of the source files and then optimizes and codegens it, so in that
>> case IPRA can have actual register usage info (if the function is compiled
>> in the current module).
>>
>>
>>
>> In case you want to experiment with IPRA, please apply the patch at
>> http://reviews.llvm.org/D21395 before you begin.
>>
>>
>>
>> -Vivek
>>
>>
>>
>> Perhaps using some form of __attribute__ ?
>> Maybe __attribute__ ((registermask = ....))  ?
>>
>>
>> --Peter Lawrence.
>>
>>
>