[llvm-dev] A thought to improve IPRA

Fri Jul 29 03:02:44 PDT 2016

On Fri, Jul 29, 2016 at 9:01 AM, Hal Finkel <hfinkel at anl.gov> wrote:

> ----- Original Message -----
> > From: "vivek pandya" <vivekvpandya at gmail.com>
> > To: "Mehdi Amini" <mehdi.amini at apple.com>
> > Cc: "llvm-dev" <llvm-dev at lists.llvm.org>, "Hal Finkel" <hfinkel at anl.gov>,
> "Quentin Colombet" <qcolombet at apple.com>
> > Sent: Thursday, July 28, 2016 2:59:02 PM
> > Subject: Re: A thought to improve IPRA
> >
> >
> > I have been working on PGO driven IPRA and I want to measure if this
> > help to reduce execution time. So as mentioned earlier the idea is
> > to make cold function register usage free i.e saving and restoring
> > all used register by such cold function so caller of that function
> > will have more free registers. So here I am changing standard callee
> > saved registers set to a set which will be decided dynamically based
> > on the actual register usage.
> >
> > I am facing few problems to get this working:
> > 1 ) While generating CFI for such function it requires to map Dwarf
> > register to LLVM register and even if we force LLVM to use Dwarf
> > register number for CFI then also it will be wrong for some register
> > for which currently we don't have such mapping for example R8D
> > register on X86 (when dealing with actual register usage info we may
> > have such case where R8D is being used)
> > To fix this I tried to filter the functions which will be optimized
> > by putting a constraints that it should have attribute NoUnwind but
> > that does not help. Is it possible to disable CFI generation?
>
> Disabling CFI generation does not seem like the right solution. If the R8D
> definition, and similar, need DWARF register numbers, then we should fix
> that (you can try rearranging things and using DwarfRegAlias, or at least
> for testing, add the same DwarfRegNum as for R8).
> Adding DwarfRegNum as for R8 does not work because the mapping is
> currently generated as a sorted array on the first value of the key
> for DwarfLLVMRegPair. So in the build directory in file
> X86GenRegisterInfo.inc this will add 2 different entries for mapping LLVM
> Reg to Dwarf number i.e R8 -> 8 and R8D -> 8 but in the array of mapping
> Dwarf to LLVM Reg there is only entry as it will add 8 only once.

>
> >
> > 2) R8D is a 48 bit register
>
> Why do you say that? For one thing, it is in a register class GR32 and
> holds only 32-bit values.
>
Sorry this is my bad R8D is 32 bit value. To get this working changing CC
for cold functions to "preserve_all" seems to be easy and safe way. Let me
know your thought about this.

-Vivek

>
>  -Hal
>
> > but pushing and popping such register is
> > not allowed and current implementation for CalleeSaved Register also
> > uses either 64 bit or 32 bit version of X86 instruction according to
> > target. So here I think it may be good to push/pop R8 for R8D (i.e I
> > don't want to change current implementation which inserts MI for
> > CSR) for that I need to find biggest register for which given
> > register is alias like R8 has R8D as alias. How can I find that?
> > I tried to use getMatchingSuperReg(unsigned Reg, unsigned SubIdx,
> > const TargetRegisterClass *RC) but here I don't know what will be
> > SubIdx for given Reg in given RC.
> >
> >
> > So for example if a function which should be optimized for above
> > optimization is having following set of clobbered registers:
> > R8D,R8, ECX, EAX, RAX, ESI It should push/pop R8, RCX, RAX, RSI.
> >
> >
> > Please help!
> > - Vivek
> >
> >
> >
> >
> > On Sat, Jul 9, 2016 at 12:26 AM, vivek pandya <
> > vivekvpandya at gmail.com > wrote:
> >
> >
> >
> >
> >
> >
> >
> > On Sat, Jul 9, 2016 at 12:18 AM, Mehdi Amini < mehdi.amini at apple.com
> > > wrote:
> >
> >
> >
> >
> >
> >
> >
> >
> > On Jul 8, 2016, at 11:41 AM, vivek pandya < vivekvpandya at gmail.com >
> > wrote:
> >
> >
> >
> >
> > On Fri, Jul 8, 2016 at 11:46 PM, Mehdi Amini < mehdi.amini at apple.com
> > > wrote:
> >
> >
> >
> >
> >
> >
> >
> >
> > On Jul 8, 2016, at 11:12 AM, vivek pandya < vivekvpandya at gmail.com >
> > wrote:
> >
> >
> >
> >
> >
> > Hello LLVM Developers,
> >
> >
> > I have a thought to improve IPRA and I would like summaries
> > discussion on IRC regarding that so we can develop an idea out of
> > that if it really helps.
> >
> >
> > So idea is to have more callee saved registers at infrequently called
> > leaf procedures and try provide more registers to procedures which
> > are in upper region of the call graph. But as pointed out by Quentin
> > this optimization may help in context of "true" IPRA but in our case
> > we may not require this. But I think that it can improve performance
> > in current IPRA. I explain both arguments ( Quentin's and mine) with
> > following example.
> >
> >
> > Consider following call sequence A->B->C , here C is very less time
> > called leaf procedure while A is called frequently and B may call C
> > based on some condition now while propagating actual register usage
> > information from C to A we almost clobbered most of the registers so
> > in this case as per Quentin's point we does not hurt the performance
> > as we fall back to CC but I think we can improve the performance as
> > follows:
> > If we mark every register preserved by C (i.e having more spill
> > reloads at procedure entry and exit ) and if this can help at A.
> > Suppose A requires more number of distinct registers than CC can
> > provide and if not provided it will spill variables to memory. Now
> > if we can provide more registers at A by having more spills at C
> > then we can save spill at A which can be beneficial because A is
> > frequently called but C is less frequently called and thus reducing
> > total number of spill/restore in program execution.
> >
> >
> > However again effect of this optimization will be limited by the
> > scope of current IPRA (i.e one Module only) because we can' really
> > propagate the details about more callee saved registers to caller
> > which is defined in other module, but still it may helpful.
> >
> >
> > Any thoughts on this ?
> >
> >
> >
> >
> > I think it is interesting, have you considered:
> >
> >
> > - the code size impact? (C will have a lot of spills)
> > Yes, this needs to be address with some heuristics based on call
> > frequency to C and no of clobbers it has. Also can we say that a
> > function which does not have any kind of call instruction in it's
> > body will have less clobbers ?
> >
> >
> > I am not sure what you mean.
> > A function which may do lots of computation but does not required to
> > call any other function may not have too many simultaneous live
> > ranges thus with very few registers it can be compiled.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > - what if C is cold but all (most) of its call sites are located in
> > different modules?
> > Can we user Uses to get no of call site in current module and based
> > on that we decide to optimize? Again some heuristics .
> >
> >
> > Of course, but what I’m mentioning is exactly what does not work with
> > that.
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > - an alternative approach where we would break the CGSCC ordering to
> > codegen B and A before C, so we would be able to spill minimally
> > when performing the codegen for C?
> > Do you here mean marking all preserve for C while code gen for B and
> > then when we come to C (top-down) we may avoid some spills if C can
> > use regs which are not really used by B?
> >
> >
> > Yes, but it may be harder to implement for not much gain after all.
> >
> >
> >
> >
> >
> > Also this can be applied to a function which is less frequently
> > called and which may not be a leaf function. It may help.
> >
> >
> > Sure, you can just refer to this as “PGO driven IPRA”.
> > Ok I will look into this.
> >
> >
> > Vivek
> >
> >
> >
> >
> >
> > —
> > Mehdi
> >
> >
> >
> >
>
> --
> Hal Finkel
> Assistant Computational Scientist
> Leadership Computing Facility
> Argonne National Laboratory
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.llvm.org/pipermail/llvm-dev/attachments/20160729/9ac9b5e5/attachment.html>