[llvm-dev] [GSoC 2016] [Weekly Status] Interprocedural Register Allocation

Mon Jun 20 11:59:00 PDT 2016

On Mon, Jun 20, 2016 at 12:54 AM, vivek pandya <vivekvpandya at gmail.com>
wrote:

> Dear Professor,
>
> Thanks for bringing this to my notice. I tried out a simple test case with
> an indirect function call:
>
> int foo() {
>   return 12;
> }
>
> int bar(int a) {
>   return foo() + a;
> }
>
> int (*fp)() = 0;
> int (*fp1)(int) = 0;
>
> int main() {
>   fp = foo;
>   fp();
>   fp1 = bar;
>   fp1(15);
>   return 0;
> }
>
>
I have experimented with indirect calls, specifically those that arise from
the use of function pointers, as in the example above.

The following code in RegUsageInfoPropagate.cpp handles this kind of indirect
call:

for (MachineBasicBlock &MBB : MF) {
  for (MachineInstr &MI : MBB) {
    if (!MI.isCall())
      continue;
    DEBUG(dbgs()
          << "Call Instruction Before Register Usage Info Propagation :\n");
    DEBUG(dbgs() << MI << "\n");

    auto UpdateRegMask = [&](const Function *F) {
      if (!F) // M->getFunction() and dyn_cast can both yield null
        return;
      const auto *RegMask = PRUI->getRegUsageInfo(F);
      if (!RegMask)
        return;
      setRegMask(MI, &(*RegMask)[0]);
      Changed = true;
    };

    MachineOperand &Operand = MI.getOperand(0);
    if (Operand.isGlobal())
      UpdateRegMask(cast<Function>(Operand.getGlobal()));
    else if (Operand.isSymbol())
      UpdateRegMask(M->getFunction(Operand.getSymbolName()));
    else if (Operand.isReg()) {
      // Changes start here: walk back from the call to the instruction in
      // this block that defines the register holding the call target.
      unsigned VReg = Operand.getReg();
      MachineBasicBlock::iterator DefIt(&MI);
      bool FoundDef = false;
      while (DefIt != MBB.begin()) {
        --DefIt; // the first instruction of the block is examined too
        if (DefIt->definesRegister(VReg)) {
          FoundDef = true;
          break;
        }
      }
      DEBUG(dbgs() << "Candidate for indirect call \n");
      if (FoundDef) {
        for (MachineOperand &MO : DefIt->operands()) {
          if (MO.isGlobal()) {
            // The global may be a variable (e.g. a load from @fp) rather
            // than a function, so use dyn_cast instead of cast.
            UpdateRegMask(dyn_cast<Function>(MO.getGlobal()));
            break;
          } else if (MO.isSymbol()) { // test MO, not the call's Operand
            UpdateRegMask(M->getFunction(MO.getSymbolName()));
            break;
          }
        }
        DEBUG(dbgs() << *DefIt);
      }
    }

    DEBUG(dbgs()
          << "Call Instruction After Register Usage Info Propagation :\n");
    DEBUG(dbgs() << MI << "\n");
  }
}
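To check the edge cases of that backward search, here is a minimal model of it outside LLVM. It is a sketch under stated assumptions: Operand, Instr, regOperand, globalOperand and findIndirectCallee are hypothetical names, not LLVM APIs, and the register holding the call target is assumed to be defined earlier in the same basic block. The model examines the first instruction of the block as well, and it gives up when the defining instruction's global is not a function, for example when the target is loaded from a function-pointer variable such as fp.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a MachineOperand.
struct Operand {
  enum Kind { Reg, Global, Other };
  Kind K = Other;
  unsigned RegNo = 0;     // meaningful when K == Reg
  std::string GlobalName; // meaningful when K == Global
  bool GlobalIsFunction = false;
};

Operand regOperand(unsigned R) {
  Operand O;
  O.K = Operand::Reg;
  O.RegNo = R;
  return O;
}

Operand globalOperand(const std::string &Name, bool IsFunction) {
  Operand O;
  O.K = Operand::Global;
  O.GlobalName = Name;
  O.GlobalIsFunction = IsFunction;
  return O;
}

// Hypothetical stand-in for a MachineInstr: defined registers plus operands.
struct Instr {
  std::vector<unsigned> Defs;
  std::vector<Operand> Ops;
  bool definesRegister(unsigned R) const {
    for (unsigned D : Defs)
      if (D == R)
        return true;
    return false;
  }
};

// Walk backwards from the call at CallIdx to the nearest earlier instruction
// in the same block that defines Reg. Returns the callee's name, or "" when
// there is no such definition or its global operand is not a function.
std::string findIndirectCallee(const std::vector<Instr> &Block,
                               std::size_t CallIdx, unsigned Reg) {
  for (std::size_t I = CallIdx; I-- > 0;) { // visits CallIdx-1 down to 0
    const Instr &Def = Block[I];
    if (!Def.definesRegister(Reg))
      continue;
    for (const Operand &MO : Def.Ops)
      if (MO.K == Operand::Global)
        return MO.GlobalIsFunction ? MO.GlobalName : std::string();
    return std::string(); // defined, but not from a global we can resolve
  }
  return std::string(); // no definition in this block
}
```

For a block "r1 = MOV @foo; CALL r1", findIndirectCallee(Block, 1, 1) yields "foo"; if the first instruction loads from the variable fp instead, it yields "".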

I would like the mentors' review and suggestions on this. Virtual function
calls will need a different approach, but is this a valid way to deal with
indirect calls through function pointers?
Please let me know your thoughts.

-Vivek

> Currently IPRA skips the optimization for indirect calls, but I think this
> can be handled, and I will let you know if I update the implementation to
> cover it. IPRA uses a Function * to hold register usage information
> across passes, so my hunch is that if the Function * can be derived from
> the call instruction of an indirect call, IPRA should be able to handle
> indirect calls to functions defined in the current module.
>
> Sincerely,
> Vivek
>
> On Sun, Jun 19, 2016 at 4:59 PM, vivek pandya <vivekvpandya at gmail.com>
> wrote:
>
>> Dear Community,
>>
>> Please find a summary of the work done this week below:
>>
>> Implementation:
>> ============
>>
>> This week we identified a bug in IPRA: the RegMask computed for a machine
>> function did not take into account the RegMasks of the calls it makes.
>> Chad Rosier reported the same bug on AArch64 (see
>> https://llvm.org/bugs/show_bug.cgi?id=28144 for a more detailed
>> description). To fix it, the RegMask calculation now also considers the
>> RegMasks of the function calls in a machine function. The patch is here:
>> http://reviews.llvm.org/D21395.
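The idea behind this fix can be sketched with a fixed-size bit set. This is a simplified model, not the code in D21395: it assumes a hypothetical 16-register target, and it follows LLVM's RegMask convention, in which a set bit marks a register that is preserved across a call. computeFunctionMask is a hypothetical name; calling it with an empty list of call-site masks reproduces the behaviour described in the bug.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t NumRegs = 16; // hypothetical register count

// LLVM's RegMask convention: a set bit means the register is preserved
// across a call, a clear bit means it may be clobbered.
using RegMask = std::bitset<NumRegs>;

// DefinedRegs: registers the function body writes directly.
// CallSiteMasks: RegMasks attached to the calls the function makes.
RegMask computeFunctionMask(const std::bitset<NumRegs> &DefinedRegs,
                            const std::vector<RegMask> &CallSiteMasks) {
  std::bitset<NumRegs> Clobbered = DefinedRegs;
  for (const RegMask &CallMask : CallSiteMasks)
    Clobbered |= ~CallMask; // whatever a callee may clobber, so may we
  return ~Clobbered;
}
```

If a function writes only r0 but calls something that clobbers r3, its published mask must mark both r0 and r3 as clobbered; ignoring the call-site mask would wrongly promise callers that r3 survives.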
>>
>> AsmPrinter.cpp has been modified to print the call-preserved registers as
>> comments at each call site in the generated assembly file. Quentin
>> Colombet suggested this to make asm files easier to read while
>> experimenting with RegMasks, calling conventions, etc. This simple patch
>> is here: http://reviews.llvm.org/D21490.
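At its core, such a comment comes from walking a call's RegMask and printing the name of every preserved register. The sketch below only illustrates that idea: the helper name, the register names and the "# Preserved:" format are hypothetical, and the real output format is whatever D21490 emits.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: turn a call's RegMask (set bit = preserved) into an
// assembly comment. The "# Preserved:" format is illustrative only.
template <std::size_t N>
std::string preservedRegsComment(const std::bitset<N> &Mask,
                                 const std::vector<std::string> &RegNames) {
  std::string Comment = "# Preserved:";
  for (std::size_t R = 0; R < N && R < RegNames.size(); ++R)
    if (Mask.test(R))
      Comment += " " + RegNames[R];
  return Comment;
}
```

With a 4-bit mask that has bits 1 and 3 set and names {rax, rbx, rcx, rbp}, the helper returns "# Preserved: rbx rbp".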
>>
>> We have also experimented with a simple improvement to IPRA: setting the
>> callee-saved registers to none for local functions. This gave some
>> performance improvement.
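The trade-off can be sketched with bit sets. The sketch assumes that a local (internal-linkage) function is called only from call sites that IPRA has updated with its RegMask, so it need not honour the calling convention's callee-saved list. With that list treated as empty, the function saves nothing in its prologue and reports every register it uses as clobbered, so callers only have to work around registers that are actually written. planFrame, FrameSetup and the 8-register target are hypothetical.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

constexpr std::size_t NumRegs = 8; // hypothetical target

struct FrameSetup {
  std::bitset<NumRegs> SavedInPrologue; // CSRs the callee spills/restores
  std::bitset<NumRegs> Preserved;       // RegMask published to callers
};

// IsLocal models an internal-linkage function whose callers all use the
// IPRA mask; for it the callee-saved set is treated as empty.
FrameSetup planFrame(const std::bitset<NumRegs> &UsedRegs,
                     const std::bitset<NumRegs> &CalleeSavedRegs,
                     bool IsLocal) {
  std::bitset<NumRegs> CSR =
      IsLocal ? std::bitset<NumRegs>() : CalleeSavedRegs;
  FrameSetup S;
  S.SavedInPrologue = UsedRegs & CSR;  // used CSRs must be saved/restored
  S.Preserved = ~(UsedRegs & ~CSR);    // other used registers are clobbered
  return S;
}
```

For a function using r1 and r6 with callee-saved {r6, r7}, the external version saves r6 and clobbers only r1; the local version saves nothing and clobbers r1 and r6.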
>>
>> Testing:
>> ======
>>
>> After the fix for bug 28144 there are no runtime failures in the test
>> suite. Before the fix, bug 28144 caused about 60 runtime failures, and
>> compiling the test suite took 30% longer than without IPRA. After the fix,
>> the total compile time with IPRA is about 4 to 8 minutes shorter than
>> without IPRA.
>>
>>
>> Study:
>> =====
>>
>> This week I studied the code responsible for adding spills and restores
>> for callee-saved registers, and how calling conventions are defined in
>> target-specific .td files. I also studied AsmPrinter.cpp, specifically the
>> emitComments() method, which is responsible for adding comments to
>> LLVM-generated assembly files. I also studied some linkage types in LLVM
>> IR, such as ‘internal’, which represents a local function in a module.
>>
>>
>> Plan for next week:
>>
>> 1) Submit patch related to local function optimization for review
>>
>> 2) Find more possible improvements
>>
>> 3) Get active patches committed
>>
>> 4) Compile large software with IPRA enabled
>>
>>
>> Sincerely,
>>
>> Vivek
>>
>> On Wed, Jun 15, 2016 at 8:45 AM, vivek pandya <vivekvpandya at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Wed, Jun 15, 2016 at 8:40 AM, vivek pandya <vivekvpandya at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Jun 15, 2016 at 6:16 AM, Quentin Colombet <qcolombet at apple.com>
>>>> wrote:
>>>>
>>>>> Hi Vivek,
>>>>>
>>>>> How much of the runtime slowdown comes from the different layout of
>>>>> the functions in the asm file? (I.e., because of the dummy SCC pass.)
>>>>>
>>>> Hello Quentin,
>>>>
>>>> Please do not consider the previous results: there was a major bug in
>>>> the RegMask calculation, which did not consider the RegMasks of callees
>>>> in the MF body when computing register usage information. That has been
>>>> fixed now (as discussed with Matthias Braun and Mehdi Amini), and after
>>>> the bug fix I have run the test-suite with and without IPRA. Yes, there
>>>> is a runtime slowdown for some test cases, ranging from 1% to 64%;
>>>> similarly, the compile-time slowdown ranges from 1% to 48%. Runtime
>>>> improvements range from 1% to 35%, and surprisingly there are also
>>>> compile-time improvements ranging from 1% to 60%. Please go through the
>>>> complete results at
>>>> https://docs.google.com/document/d/1cavn-POrZdhw-rrdPXV8mSvyppvOWs2rxmLgaOnd6KE/edit?usp=sharing
>>>>
>>> In the results above, the baseline is IPRA and "current" is without IPRA,
>>> so data with a red background is actually an improvement from IPRA and
>>> green is a regression.
>>> -Vivek
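Because the roles are reversed relative to the usual setup, a worked example helps. The sketch assumes the report computes the change as (current - baseline) / baseline, which is the common convention but an assumption here; percentChange and ipraVerdict are hypothetical names. With IPRA as the baseline, a run that takes 8 s with IPRA and 10 s without it shows +25%, flagged like a regression in the report although it is a win for IPRA.

```cpp
#include <cassert>
#include <string>

// Percentage change of Current relative to Baseline.
double percentChange(double Baseline, double Current) {
  return (Current - Baseline) / Baseline * 100.0;
}

// With baseline = IPRA and current = without IPRA, a positive change means
// the build without IPRA took longer, i.e. IPRA was faster.
std::string ipraVerdict(double TimeWithIPRA, double TimeWithoutIPRA) {
  double Change = percentChange(TimeWithIPRA, TimeWithoutIPRA);
  if (Change > 0)
    return "IPRA improvement";
  if (Change < 0)
    return "IPRA regression";
  return "no change";
}
```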
>>>
>>>> Also, there are no longer any extra failures due to IPRA, so I have
>>>> removed failures from the results above.
>>>>
>>>> Sincerely,
>>>> Vivek
>>>>
>>>>
>>>>> Cheers,
>>>>> Q
>>>>>
>>>>> On 11 Jun 2016 at 21:49, vivek pandya via llvm-dev
>>>>> <llvm-dev at lists.llvm.org> wrote:
>>>>>
>>>>> Dear Community,
>>>>>
>>>>> The patch for Interprocedural Register Allocation has now been
>>>>> committed, thanks to Mehdi Amini. We would like you to play with it and
>>>>> let us know your views and, more importantly, your ideas for improving it.
>>>>>
>>>>> The test-suite run has revealed a non-trivial issue that causes some
>>>>> programs to fail at run time; we will investigate it further. Here are
>>>>> some stats:
>>>>>
>>>>> The test-suite has been run with IPRA enabled, and the overall results
>>>>> are not very encouraging. Compile time increased by 30% on average. Many
>>>>> programs also have increased execution time (20% on average), which is a
>>>>> serious concern for us. About 60 tests failed at run time, which
>>>>> indicates a miscompilation. However, 3 tests improved their runtime, by
>>>>> 7% on average.
>>>>>
>>>>>
>>>>> This week I think the most useful thing I learned was to set up the
>>>>> LLVM development environment properly; otherwise one can end up wasting
>>>>> too much time building LLVM itself.
>>>>>
>>>>> Here is a brief summary:
>>>>> Implementation:
>>>>> ============
>>>>>
>>>>> The patch has been split into analysis and transformation passes. The
>>>>> pass responsible for register usage propagation has been made target
>>>>> independent. A print method and the command-line option -print-regusage
>>>>> have been added so that RegMask details can be printed in Release builds
>>>>> as well; this makes the lit test cases runnable in Release builds too.
>>>>> Other minor changes were made to adhere to coding and naming conventions.
>>>>>
>>>>>
>>>>> Testing:
>>>>> ======
>>>>>
>>>>> The test-suite has been run with IPRA enabled.
>>>>>
>>>>>
>>>>> Study and other:
>>>>> =============
>>>>>
>>>>> Learned about LNT, the LLVM test-suite, inline assembly in LLVM IR,
>>>>> fastcc, local functions and the MCStreamer class. In C++ I learned about
>>>>> the emplace family of methods in the STL and perfect forwarding,
>>>>> introduced in C++11.
>>>>>
>>>>>
>>>>> Plan for next week:
>>>>>
>>>>> 1) Investigate the functional-correctness issue that leads to run-time
>>>>> failures
>>>>>
>>>>> 2) Profile the compilation process to verify the increase in compile
>>>>> time due to IPRA
>>>>>
>>>>> 3) Improve IPRA by instructing codegen not to save registers for local
>>>>> functions.
>>>>>
>>>>> 4) Make the pass emit asm comments at call sites in the generated ASM
>>>>> file to indicate the registers clobbered by the function call.
>>>>>
>>>>>
>>>>> Sincerely,
>>>>>
>>>>> Vivek
>>>>>
>>>>> On Sun, Jun 5, 2016 at 8:48 AM, vivek pandya <vivekvpandya at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Dear Community,
>>>>>>
>>>>>> This week I got my patch reviewed by my mentors and made changes based
>>>>>> on their review. We also identified and fixed a bug where callee-saved
>>>>>> registers were being marked as clobbered. I have described other minor
>>>>>> changes in the following section.
>>>>>>
>>>>>> I expected to get the patch committed by the end of this week, but due
>>>>>> to an unexpected mistake I was not able to finish writing the test
>>>>>> cases. Sorry for that.
>>>>>> I had built LLVM with IPRA enabled by default, and those build files
>>>>>> were on my path! Because of that, the next time I tried to build LLVM it
>>>>>> was terribly slow (almost 1 hour for 10% of the build). I spent too much
>>>>>> time fixing this by playing around with environment variables, CMake
>>>>>> options, etc. But I think this is a serious concern: we need to verify
>>>>>> the time complexity of the approach, otherwise building large software
>>>>>> with IPRA enabled would be very time consuming.
>>>>>>
>>>>>> The toughest part of this week was getting lit and FileCheck to work as
>>>>>> expected, especially when an analysis pass prints information to the
>>>>>> standard streams and llc or opt also generates an output file.
>>>>>>
>>>>>> Here is a brief summary:
>>>>>>
>>>>>> Implementation:
>>>>>> ============
>>>>>>
>>>>>> RegUsageInfoCollector is now calling-convention aware, so the RegMask
>>>>>> no longer marks callee-saved registers as clobbered. As a result, the
>>>>>> register allocator can use callee-saved registers in the caller.
>>>>>> PhysicalRegisterUsageInfo.cpp has been renamed to RegisterUsageInfo.cpp.
>>>>>> The StringMap used in RegisterUsageInfo.cpp has been replaced by a
>>>>>> DenseMap from GlobalVariable * to RegMask.
>>>>>> DummyCGSCCPass has been moved from TargetPassConfig.cpp to
>>>>>> CallGraphSCCPass.h.
>>>>>> Minor corrections in comments and changes to adhere to the LLVM coding
>>>>>> standards.
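A minimal sketch of the calling-convention-aware rule, assuming a hypothetical 8-register target and LLVM's RegMask convention (set bit = preserved): a used register counts as clobbered only if it is not callee-saved, because the prologue and epilogue restore callee-saved registers before returning. collectRegMask is a hypothetical name, not the RegUsageInfoCollector API.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

constexpr std::size_t NumRegs = 8;     // hypothetical target
using RegMask = std::bitset<NumRegs>;  // set bit = preserved across a call

// Registers the function uses are clobbered, except callee-saved ones,
// which the prologue/epilogue save and restore.
RegMask collectRegMask(const std::bitset<NumRegs> &UsedRegs,
                       const std::bitset<NumRegs> &CalleeSavedRegs) {
  std::bitset<NumRegs> Clobbered = UsedRegs & ~CalleeSavedRegs;
  return ~Clobbered;
}
```

Without the "& ~CalleeSavedRegs" step, a function that uses r6 (callee-saved) would report r6 as clobbered, and callers would needlessly avoid keeping values in it across the call.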
>>>>>>
>>>>>> Testing:
>>>>>> =====
>>>>>>
>>>>>> The changes mentioned above have been tested with the SNU-Realtime
>>>>>> benchmarks.
>>>>>>
>>>>>> Studied the lit and FileCheck tools and wrote a simple test to verify
>>>>>> the functionality of the code.
>>>>>>
>>>>>>
>>>>>> Study and other:
>>>>>> ============
>>>>>>
>>>>>> Studied some examples of lit-compatible LLVM IR with RUN comments for
>>>>>> test cases, FileCheck syntax, and how to use FileCheck within the lit
>>>>>> infrastructure.
>>>>>>
>>>>>> I also studied the X86 calling convention in more detail.
>>>>>>
>>>>>> I also studied basic concepts of the LLVM IR language while reading the
>>>>>> .ll files written for lit.
>>>>>>
>>>>>> I learned about rvalue references and move semantics, introduced in
>>>>>> C++11.
>>>>>>
>>>>>>
>>>>>> Plan for next week:
>>>>>>
>>>>>> 1) Get the patch committed along with proper test cases.
>>>>>>
>>>>>> 2) Analyse the time complexity of the approach.
>>>>>>
>>>>>> 3) Move the target-specific pass into CodeGen, as it does not seem to
>>>>>> need to be target specific.
>>>>>>
>>>>>> 4) If possible, build a large application with IPRA enabled and analyze
>>>>>> the impact.
>>>>>>
>>>>>>
>>>>>> Sincerely,
>>>>>>
>>>>>> Vivek
>>>>>>
>>>>>>
>>>>>> On Sat, May 28, 2016 at 7:31 PM, vivek pandya <vivekvpandya at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Dear community,
>>>>>>>
>>>>>>> This is to brief you on the progress of Interprocedural Register
>>>>>>> Allocation. For those who are interested in seeing the progress in
>>>>>>> terms of code, please see http://reviews.llvm.org/D20769
>>>>>>> This patch contains a simple infrastructure to propagate the register
>>>>>>> usage information of callees to callers in the call graph. The code
>>>>>>> generation order has been changed to follow a bottom-up order on the
>>>>>>> call graph; thanks to Mehdi Amini for that patch! I will write a blog
>>>>>>> post on this very soon.
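Why the order matters: a caller's call sites can only receive a callee's RegMask if that callee has already been compiled. The sketch below computes such a callee-before-caller order with a plain post-order depth-first search over a hypothetical call graph keyed by function name. It is an illustration, not the CallGraphSCCPass-based mechanism in the patch. For recursive functions (cycles), the visited check stops the recursion, so at least one function in each cycle is emitted before one of its callees and would have to fall back to the default calling convention for that call.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical call graph: function name -> names of its direct callees.
using CallGraph = std::map<std::string, std::vector<std::string>>;

static void visit(const CallGraph &CG, const std::string &F,
                  std::set<std::string> &Visited,
                  std::vector<std::string> &Order) {
  if (!Visited.insert(F).second)
    return; // already visited; this also cuts recursion cycles
  auto It = CG.find(F);
  if (It != CG.end())
    for (const std::string &Callee : It->second)
      visit(CG, Callee, Visited, Order);
  Order.push_back(F); // emitted only after its (non-cyclic) callees
}

// Post-order DFS: callees come before their callers.
std::vector<std::string> bottomUpOrder(const CallGraph &CG) {
  std::set<std::string> Visited;
  std::vector<std::string> Order;
  for (const auto &Entry : CG)
    visit(CG, Entry.first, Visited, Order);
  return Order;
}
```

For the earlier example (main calls foo and bar, bar calls foo), the order is foo, bar, main.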
>>>>>>>
>>>>>>> According to the proposed schedule, this week should have been spent
>>>>>>> studying the related infrastructure in LLVM and finalizing an approach
>>>>>>> for IPRA, but instead I was able to implement a working (though
>>>>>>> possibly not fully correct) prototype, because I used the community
>>>>>>> bonding period to discuss and learn related material with the mentors,
>>>>>>> and because the patch for CodeGen reordering was provided by my mentor
>>>>>>> Mehdi Amini.
>>>>>>>
>>>>>>> Here is a summary of the work done this week:
>>>>>>> Implementation :
>>>>>>> ============
>>>>>>> The following passes have been implemented this week: an immutable
>>>>>>> pass to store the computed RegMasks; a machine function pass that
>>>>>>> iterates through each register, checks whether it is used, and creates
>>>>>>> a RegMask from that information; and a target-specific machine
>>>>>>> function pass that uses the RegMask created by the second pass and
>>>>>>> propagates the information by updating the RegMask of call
>>>>>>> instructions. To update the RegMask of an MI, a setRegMask() function
>>>>>>> has been added to MachineOperand. A command-line option -enable-ipra
>>>>>>> and a debug type "ipra" (used with -debug-only=ipra) have been added to
>>>>>>> control the optimization through llc.
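The division of labour between the three passes can be sketched in a few lines. Everything here is a hypothetical stand-in (an 8-register target, RegUsageStore, collect, propagate), not the LLVM classes: the store plays the role of the immutable pass, collect mirrors this first version of the collector by marking every used register as clobbered (calling-convention awareness came later), and propagate keeps the default calling-convention mask when the callee has no recorded information, e.g. a library function that LLVM does not compile.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

constexpr std::size_t NumRegs = 8;     // hypothetical target
using RegMask = std::bitset<NumRegs>;  // set bit = preserved across a call

// Stand-in for the immutable pass: remembers one RegMask per function.
class RegUsageStore {
  std::map<std::string, RegMask> Masks;

public:
  void store(const std::string &Fn, const RegMask &M) { Masks[Fn] = M; }
  const RegMask *lookup(const std::string &Fn) const {
    auto It = Masks.find(Fn);
    return It == Masks.end() ? nullptr : &It->second;
  }
};

// Stand-in for the collector: every register the body uses is clobbered.
RegMask collect(const std::bitset<NumRegs> &UsedRegs) { return ~UsedRegs; }

// Stand-in for the propagation pass: use the callee's collected mask if it
// has already been compiled, otherwise keep the calling-convention mask.
RegMask propagate(const RegUsageStore &Store, const std::string &Callee,
                  const RegMask &DefaultCCMask) {
  if (const RegMask *M = Store.lookup(Callee))
    return *M;
  return DefaultCCMask;
}
```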
>>>>>>>
>>>>>>> Testing:
>>>>>>> =====
>>>>>>> The implementation above has been tested on the SNU-Real-Time
>>>>>>> benchmark suite (http://www.cprover.org/goto-cc/examples/snu.html) and
>>>>>>> on some simple programs that use library functions (register
>>>>>>> allocation for library functions is not done by LLVM, so this
>>>>>>> optimization simply skips them).
>>>>>>>
>>>>>>> Study and Other:
>>>>>>> =============
>>>>>>> I have learned the following things about LLVM: how it stores register
>>>>>>> clobbering information; how that information is used by register
>>>>>>> allocators through LivePhysRegs, LiveRegMatrix and other related
>>>>>>> passes; how to schedule a pass using TargetPassConfig and
>>>>>>> TargetMachine; what callee-saved registers are; and what an immutable
>>>>>>> pass is. Apart from that, I learned how to use Phabricator to send
>>>>>>> review requests, and I read some related literature.
>>>>>>>
>>>>>>> The tough task this week was scheduling the passes in the proper order
>>>>>>> so that the dependencies between related passes are satisfied.
>>>>>>>
>>>>>>> Plan for next week:
>>>>>>> 1) Perform more testing and debug any known issues
>>>>>>> 2) Fine-tune the implementation to eliminate any unnecessary work
>>>>>>> 3) During testing I observed from the stats that IPRA does not always
>>>>>>> improve the work of the intraprocedural register allocators, and that
>>>>>>> the amount of benefit (in terms of spilled live ranges) is not
>>>>>>> deterministic. I would like to find the reasons for this behavior.
>>>>>>> 4) Start implementing the target-specific pass for other targets if
>>>>>>> the review passes with no major bugs.
>>>>>>>
>>>>>>> Please provide any feedback or suggestions, including on the format of
>>>>>>> this email.
>>>>>>>
>>>>>>> I would also like to thank my mentors Mehdi Amini, Hal Finkel, Quentin
>>>>>>> Colombet and Matthias Braun, and other community members, for providing
>>>>>>> quick help every time I asked (I have got replies even after 8 PM
>>>>>>> PDT!).
>>>>>>>
>>>>>>> Sincerely,
>>>>>>> Vivek
>>>>>>>
>>>>>>
>>>>>>
>>>>> _______________________________________________
>>>>> LLVM Developers mailing list
>>>>> llvm-dev at lists.llvm.org
>>>>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev
>>>>>
>>>>>
>>>>
>>>
>>
>