[cfe-dev] [llvm-dev] IPRA, interprocedural register allocation, question

vivek pandya via cfe-dev cfe-dev at lists.llvm.org
Mon Jul 11 11:27:32 PDT 2016


Dear Peter, Hal and Mehdi,

I hacked on clang a bit so that I can attach a string attribute to a
function declaration.
I think that, instead of adding a new regmask attribute, it would be better
to use the existing annotate attribute. For example, we can use it as follows:

// R11 and R8 are the clobbered registers
extern void foo() __attribute__((annotate("REGMASK:R11,R8")));

This adds the string REGMASK:R11,R8 to the llvm.metadata section and ties it
to function foo via llvm.global.annotations. (This currently works with
function definitions only; more work is needed to make it work with function
declarations.) The llvm.metadata section can then be accessed at the IR level
and parsed to create a regmask.

The parsing will need access to the Module object. I hope that, when parsing
all such functions, reconnecting each function's global annotation with its
string value will be simple.
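
As a rough illustration of the parsing step, here is a minimal sketch in
plain C that splits such an annotation string into register names. It is not
LLVM code: the helper parse_regmask_annotation, the size limits, and the
error handling are assumptions made for this example. A real pass would read
the string constant reachable from llvm.global.annotations rather than a C
string literal.

```c
#include <stddef.h>
#include <string.h>

#define MAX_REGS 16   /* hypothetical limit on registers per annotation */
#define MAX_NAME 8    /* hypothetical limit on register name length (with NUL) */

/* Hypothetical helper: splits "REGMASK:R11,R8" into register names.
 * Returns the number of names stored in out, or -1 if the string lacks
 * the "REGMASK:" prefix, a name is empty or too long, or there are more
 * than max_out names. */
static int parse_regmask_annotation(const char *s, char out[][MAX_NAME],
                                    int max_out)
{
    static const char prefix[] = "REGMASK:";
    const size_t plen = sizeof(prefix) - 1;
    int n = 0;

    if (strncmp(s, prefix, plen) != 0)
        return -1;
    s += plen;

    while (*s != '\0') {
        size_t len = strcspn(s, ",");   /* length of the next name */
        if (len == 0 || len >= MAX_NAME || n == max_out)
            return -1;
        memcpy(out[n], s, len);
        out[n][len] = '\0';
        n++;
        s += len;
        if (*s == ',') {
            s++;
            if (*s == '\0')             /* reject a trailing comma */
                return -1;
        }
    }
    return n;
}
```

Called on "REGMASK:R11,R8" with a char[MAX_REGS][MAX_NAME] buffer, this
stores "R11" and "R8" and returns 2.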

Another approach would be to add a new regmask attribute. During codegen to
IR, this attribute would be lowered to a corresponding string attribute in
LLVM IR (which would also need to be added), and a pass would then iterate
over all functions that carry the attribute and populate the register usage
container.
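
To make the "populate register usage container" step concrete, here is a
small sketch in plain C that turns a list of clobbered register names into a
bit mask. It follows the convention LLVM register masks use, where a set bit
means the register is preserved across the call. Everything else is an
assumption for illustration only: the register table, its bit numbering, and
the helpers reg_index and build_preserved_mask are hypothetical and do not
match LLVM's X86 register numbering or any LLVM API.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical register table; bit i of the mask stands for reg_names[i]. */
static const char *const reg_names[] = {
    "RAX", "RCX", "RDX", "R8", "R9", "R10", "R11"
};
enum { NUM_REGS = sizeof(reg_names) / sizeof(reg_names[0]) };

/* Returns the bit index of a register name, or -1 if it is unknown. */
static int reg_index(const char *name)
{
    for (int i = 0; i < NUM_REGS; i++)
        if (strcmp(reg_names[i], name) == 0)
            return i;
    return -1;
}

/* Starts from "every known register preserved" and clears one bit per
 * clobbered register. Returns 0 on success, -1 on an unknown name. */
static int build_preserved_mask(const char *const *clobbered, int n,
                                uint32_t *mask)
{
    uint32_t m = (UINT32_C(1) << NUM_REGS) - 1;

    for (int i = 0; i < n; i++) {
        int idx = reg_index(clobbered[i]);
        if (idx < 0)
            return -1;
        m &= ~(UINT32_C(1) << idx);
    }
    *mask = m;
    return 0;
}
```

With this table, clobbering R11 and R8 clears bits 6 and 3 and leaves all
other bits set.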

Any ideas for simplifying this are welcome. Please share your views.

I have CCed the clang mailing list so that clang developers can correct me if
I have made any mistakes regarding clang.

Sincerely,
Vivek

On Sat, Jul 9, 2016 at 9:56 AM, vivek pandya <vivekvpandya at gmail.com> wrote:

>
>
> On Sat, Jul 9, 2016 at 8:15 AM, Lawrence, Peter <c_plawre at qca.qualcomm.com
> > wrote:
>
>> Vivek,
>>
>>            IIUC it seems that we need two pieces of information to do
>> IPRA,
>>
>> 1. what registers the callee clobbers
>>
>> 2. what the callee does to the call-graph
>>
> Yes I think this is enough, but in your case we don't require #2
>
>>
>>
>> And it is #2 that we are missing when we define an external function,
>>
>> Even when we declare it with a preserves or a regmask attribute,
>>
>>
>>
> That is because I think that once the attribute's effect is available at
> the IR/MI level, we can just parse it, populate the register usage
> information vector for the declared function, and then propagate the reg
> mask to each call site encountered.
> But I am not sure whether it will be easy to get a new attribute working,
> or whether we may need to hack clang for that too.
>
> I would also like to have thoughts from my mentors (Mehdi Amini and Hal
> Finkel) about this.
>
>> So what I / we need is another attribute that says this is a leaf
>> function,
>>
>> At least in my case all I’m really concerned with are leaf functions
>>
>>
>>
> I am starting with a simple function declaration that has a custom
> attribute.
>
> -Vivek
>
>>
>>
>> Thoughts ?
>>
>>
>>
>>
>>
>> --Peter Lawrence.
>>
>>
>>
>>
>>
>>
>>
>> *From:* vivek pandya [mailto:vivekvpandya at gmail.com]
>> *Sent:* Friday, July 08, 2016 10:24 AM
>> *To:* Lawrence, Peter <c_plawre at qca.qualcomm.com>
>> *Cc:* llvm-dev <llvm-dev at lists.llvm.org>; llvm-dev-request at lists.llvm.org
>> *Subject:* Re: Re:[llvm-dev] IPRA, interprocedural register allocation,
>> question
>>
>>
>>
>>
>>
>>
>>
>> On Fri, Jul 8, 2016 at 1:42 PM, vivek pandya <vivekvpandya at gmail.com>
>> wrote:
>>
>>
>>
>>
>>
>> On Fri, Jul 8, 2016 at 9:47 AM, Lawrence, Peter <
>> c_plawre at qca.qualcomm.com> wrote:
>>
>> Vivek,
>>
>>              I am looking into these function attributes in the clang docs
>>
>>                 preserve_most
>>
>>                 preserve_all
>>
>> They are not available in the 3.6.2 that I am currently using, but I hope
>> they exist in 3.8
>>
>>
>>
>> These should provide enough info to solve my problem,
>>
>> at the MC level calls to functions with these attributes
>>
>> will be code-gen’ed through different “calling conventions”,
>>
>> and CALL instructions to them should have different register USE and DEF
>> info,
>>
>>
>>
>> Yes, I believe that preserve_most or preserve_all should help you even
>> without IPRA. But note that IPRA can help further. For example, on X86 the
>> preserve_most CC does not preserve R11 (this can be verified from
>> X86CallingConv.td and X86RegisterInfo.cpp). However, IPRA calculates the
>> regmask from actual register usage, so if a procedure with the
>> preserve_most CC does not use R11, and neither does any call site inside
>> its body, then IPRA will mark R11 as preserved. Also, the RegMask that IPRA
>> produces is a superset of the RegMask implied by the calling convention.
>>
>>
>>
>> I believe that __attribute__ ((registermask = ....)) can provide more
>> flexibility than the preserve_all or preserve_most CC in some cases,
>> so I believe that we should try it out.
>>
>>
>>
>> -Vivek
>>
>>
>>
>> This CALL instruction register USE and DEF info should already be useful
>>
>> to the intra-procedural register allocator (allowing values live across
>> these
>>
>> calls to be in what are otherwise caller-save registers),
>>
>> at least that’s how I read the MC dumps, every call instruction seems to
>> have
>>
>> every caller-save register flagged as “imp-def”, IE implicitly-defined by
>> the instruction,
>>
>> and hopefully what is considered a caller-save register at a call-site is
>> defined by the callee.
>>
>> And this should be the information that IPRA takes advantage of in its
>> bottom-up analysis.
>>
>>
>>
>> Yes that is expected help from IPRA.
>>
>>
>>
>> Which leads me to this question, when compiling an entire whole program
>> at one time,
>>
>> so there is no linking and no LTO, will there ever be IPRA that works
>> within LLC for this scenario,
>>
>> and is this an objective of your project, or are you focusing only on LTO
>> ?
>>
>> The current IPRA infrastructure works at compile time, so its scope of
>> optimization is restricted to a compilation unit. IPRA can only construct
>> correct register usage information if the procedure's code is generated by
>> the same compiler instance, which means we can't optimize library calls or
>> procedures defined in another module. This is because we can't keep
>> register usage information across two different compiler instances.
>>
>>
>>
>> Now if we consider LTO, it eliminates the above limitation by building one
>> large IR module from smaller modules before generating code. We can thus
>> have register usage information (at least) for procedures that were
>> previously defined in another module, because with LTO everything is in
>> one module. That also clarifies that IPRA does not do anything at link time.
>>
>>
>>
>> Now coming to LLC, it can use IPRA and optimize for functions defined in
>> the current module. So yes, when compiling a whole program (a single huge
>> .bc file), IPRA can be used with LLC. Also note that if software is written
>> in separate files per module (which is very common) and you still want to
>> maximize the benefits of IPRA, you can use the llvm-link tool to combine
>> several .bc files into one huge .bc file and use that with LLC.
>>
>>
>>
>> I know this is not the typical “linux” scenario (dynamic linking of not
>> only standard libraries,
>>
>> but also sometimes even application libraries, and lots of static linking
>> because of program
>>
>> size), but it is a typical “embedded” scenario, which is where I am
>> currently.
>>
>>
>>
>> I don't understand this use case, but we can improve IPRA further. For
>> example, if you have several libraries that have already been compiled and
>> code-generated, but you are able to provide register usage information for
>> the functions of those libraries, then we can think about an approach where
>> we store register usage information in a file (which will obviously
>> increase compile time) and use that information across different compiler
>> instances, so that register usage information is available while compiling
>> without having the actual code.
>>
>>
>>
>> Other thoughts or comments ?
>>
>>
>>
>> I am looking for ideas that can improve the current IPRA. If you think of
>> anything relevant, please let me know so that we can discuss and implement
>> feasible ideas.
>>
>>
>>
>> Thanks,
>>
>> Vivek
>>
>>
>>
>> --Peter Lawrence.
>>
>>
>>
>>
>>
>> *From:* vivek pandya [mailto:vivekvpandya at gmail.com]
>> *Sent:* Wednesday, July 06, 2016 2:09 PM
>> *To:* llvm-dev <llvm-dev at lists.llvm.org>; llvm-dev-request at lists.llvm.org;
>> Lawrence, Peter <c_plawre at qca.qualcomm.com>
>> *Subject:* Re:[llvm-dev] IPRA, interprocedural register allocation,
>> question
>>
>>
>>
>> Hello Peter,
>>
>>
>>
>> Thanks for pointing out this interesting case.
>>
>> Vivek,
>>           I have an application where many of the leaf functions are
>> Hand-coded assembly language,  because they use special IO instructions
>> That only the assembler knows about.  These functions typically don't
>> Use any registers besides the incoming argument registers, IE they don't
>> Need to use any additional callee-save nor caller-save registers.
>>
>> If the inline asm template specifies its clobber list properly, then IPRA
>> is able to use that information and propagates the correct register mask
>> (which also means that omitting the clobber list while IPRA is enabled may
>> break the executable).
>>
>> For example, in the following code:
>>
>> int gcd( int a, int b ) {
>>     int result ;
>>     /* Compute Greatest Common Divisor using Euclid's Algorithm */
>>     __asm__ __volatile__ ( "movl %1, %%r15d;"
>>                           "movl %2, %%ecx;"
>>                           "CONTD: cmpl $0, %%ecx;"
>>                           "je DONE;"
>>                           "xorl %%r13d, %%r13d;"
>>                           "idivl %%ecx;"
>>                           "movl %%ecx, %%r15d;"
>>                           "movl %%r13d, %%ecx;"
>>                           "jmp CONTD;"
>>                           "DONE: movl %%r15d, %0;"
>>                           : "=g" (result)
>>                           : "g" (a), "g" (b)
>>                           : "ecx", "r13", "r15"
>>     );
>>
>>     return result ;
>> }
>>
>> IPRA calculates and propagates the correct regmask, in which it marks CH,
>> CL, ECX .. as clobbered. R13 and R15 are not marked as clobbered because
>> they are callee saved, and the LLVM code generator also inserts
>> spill/restore code for them.
>>
>>
>>
>> Is there any way in your IPRA interprocedural register allocation project
>> that
>> The user can supply this information for external functions ?
>>
>> By "external", do you mean a function defined in a module other than the
>> one in which it is used? In that case, since IPRA can operate on only one
>> module at a time, register usage propagation is not possible. But there is
>> a workaround for this problem. You can use IPRA with link time optimization
>> enabled: LLVM LTO creates one big IR module out of the source files and
>> then optimizes and codegens it, so IPRA can have the actual register usage
>> info (if the function is compiled in the current module).
>>
>>
>>
>> In case you want to experiment with IPRA, please apply the patch
>> http://reviews.llvm.org/D21395 before you begin.
>>
>>
>>
>> -Vivek
>>
>>
>>
>> Perhaps using some form of __attribute__ ?
>> Maybe __attribute__ ((registermask = ....))  ?
>>
>>
>> --Peter Lawrence.
>>
>>
>>
>>
>>
>
>