[ATOM] Memory form of call optimization

Thu Mar 28 14:46:46 PDT 2013

Hi Sriram

The attached patch fix this crash. The crash is caused by an
optimization in X86 DAG pre-processing. It moves a load of a call
address near to a call so that DAG selection could fold them into a
memory form call insn. But, if we disabled that insn selection, it
leaves a non-glued node inside a glued sequence which breaks SD
schedule. That optimization should be disabled if target favors register
indirect call.

Yours
- Michael

On Tue, 2013-03-26 at 11:13 -0700, Murali, Sriram wrote:
> This is one of the problems I faced while implementing the optimization, where the SelectionDAG is ordered incorrectly, and resulted in an assertion at the Topological Sorting pass. I am attaching the stack dump and the SelectionDAG for your reference.
> llc: /home/smurali/dmp/projects/icicle_lake/llvm/lib/CodeGen/ScheduleDAG.cpp:511: void llvm::ScheduleDAGTopologicalSort::InitDAGTopologicalSorting(): Assertion `Node2Index[SU->NodeNum] > Node2Index[I->getSUnit()->NodeNum] && "Wrong topological sorting"' failed.
> 0  llc             0x00000000015dce36 llvm::sys::PrintStackTrace(_IO_FILE*) + 38
> 1  llc             0x00000000015dd0bd
> 2  llc             0x00000000015dcb0c
> 3  libpthread.so.0 0x00007f0724186cb0
> 4  libc.so.6       0x00007f07235dc425 gsignal + 53
> 5  libc.so.6       0x00007f07235dfb8b abort + 379
> 6  libc.so.6       0x00007f07235d50ee
> 7  libc.so.6       0x00007f07235d5192
> 8  llc             0x000000000119b213 llvm::ScheduleDAGTopologicalSort::InitDAGTopologicalSorting() + 913
> 9  llc             0x0000000000e6cf8e
> 10 llc             0x0000000000e7aec1 llvm::ScheduleDAGSDNodes::Run(llvm::SelectionDAG*, llvm::MachineBasicBlock*) + 103
> 11 llc             0x0000000000f11822 llvm::SelectionDAGISel::CodeGenAndEmitDAG() + 3918
> 12 llc             0x0000000000f104f0 llvm::SelectionDAGISel::SelectBasicBlock(llvm::ilist_iterator<llvm::Instruction const>, llvm::ilist_iterator<llvm::Instruction const>, bool&) + 260
> 13 llc             0x0000000000f133c2 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) + 2900
> 14 llc             0x0000000000f0f89e llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 1054
> 15 llc             0x00000000010e6505 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 95
> 16 llc             0x00000000014dd572 llvm::FPPassManager::runOnFunction(llvm::Function&) + 390
> 17 llc             0x00000000014dd76d llvm::FPPassManager::runOnModule(llvm::Module&) + 89
> 18 llc             0x00000000014ddae5 llvm::MPPassManager::runOnModule(llvm::Module&) + 573
> 19 llc             0x00000000014de0fa llvm::PassManagerImpl::run(llvm::Module&) + 254
> 20 llc             0x00000000014de30d llvm::PassManager::run(llvm::Module&) + 39
> 21 llc             0x0000000000858183
> 22 llc             0x0000000000857134 main + 237
> 23 libc.so.6       0x00007f07235c776d __libc_start_main + 237
> 24 llc             0x0000000000856a99
> 
> -----Original Message-----
> From: llvm-commits-bounces at cs.uiuc.edu [mailto:llvm-commits-bounces at cs.uiuc.edu] On Behalf Of Murali, Sriram
> Sent: Tuesday, March 26, 2013 1:55 PM
> To: Liao, Michael
> Cc: llvm-commits at cs.uiuc.edu
> Subject: RE: [ATOM] Memory form of call optimization
> 
> Hi Michael,
> Thanks for the suggestion. I applied the patch and found that it solves the issue with CALL32m. However, for CALL64m it doesn’t seem to work, as in the codegen hangs. I am trying to figure out the cause of the problem, but I just wanted to let you know of it.  Here is the test case, which  when compiled with -O2 -march =atom hangs
> 
> extern void (**p)(int);
> int main()
> {
>     void (*foo)(int) = *p;
>     foo(2);
>     return 0;
> }
> 
> Thanks
> Sriram
> 
> -----Original Message-----
> From: Liao, Michael
> Sent: Monday, March 25, 2013 6:46 PM
> To: Evan Cheng
> Cc: Murali, Sriram; llvm-commits at cs.uiuc.edu
> Subject: Re: [ATOM] Memory form of call optimization
> 
> This patch could be simplified by directly not selecting CALL32m or CALL64m by the following patch by removing the unnecessary copy-from-reg/copy-to-reg sequence.
> 
> Yours
> - Michael
> 
> --- a/lib/Target/X86/X86InstrControl.td
> +++ b/lib/Target/X86/X86InstrControl.td
> @@ -158,7 +158,7 @@ let isCall = 1 in
>                           Requires<[In32BitMode]>;
>      def CALL32m     : I<0xFF, MRM2m, (outs), (ins i32mem:$dst),
>                          "call{l}\t{*}$dst", [(X86call (loadi32 addr:
> $dst))], IIC_CALL_MEM>,
> -                        Requires<[In32BitMode]>;
> +                        Requires<[In32BitMode,FavorIndirectMemCall]>;
>  
>      def FARCALL16i  : Iseg16<0x9A, RawFrmImm16, (outs),
>                               (ins i16imm:$off, i16imm:$seg), @@ -231,7 +231,7 @@ let isCall = 1, Uses = [RSP] in {
>    def CALL64m       : I<0xFF, MRM2m, (outs), (ins i64mem:$dst),
>                          "call{q}\t{*}$dst", [(X86call (loadi64 addr:
> $dst))],
>                          IIC_CALL_MEM>,
> -                      Requires<[In64BitMode]>;
> +                      Requires<[In64BitMode,FavorIndirectMemCall]>;
>  
>    def FARCALL64   : RI<0xFF, MRM3m, (outs), (ins opaque80mem:$dst),
>                         "lcall{q}\t{*}$dst", [], IIC_CALL_FAR_MEM>; diff --git a/lib/Target/X86/X86InstrInfo.td b/lib/Target/X86/X86InstrInfo.td index 39165e2..840fac4 100644
> --- a/lib/Target/X86/X86InstrInfo.td
> +++ b/lib/Target/X86/X86InstrInfo.td
> @@ -626,6 +626,7 @@ def OptForSize   : Predicate<"OptForSize">;
>  def OptForSpeed  : Predicate<"!OptForSize">;
>  def FastBTMem    : Predicate<"!Subtarget->isBTMemSlow()">;
>  def CallImmAddr  :
> Predicate<"Subtarget->IsLegalToCallImmediateAddr(TM)">;
> +def FavorIndirectMemCall : Predicate<"!Subtarget->callRegIndirect()">;
>  
>  //===----------------------------------------------------------------------===//
>  // X86 Instruction Format Definitions.
> 
> 
> 
> On Mon, 2013-03-25 at 15:38 -0700, Evan Cheng wrote:
> > LGTM
> > 
> > 
> > Evan
> > On Mar 25, 2013, at 3:22 PM, "Murali, Sriram"
> > <sriram.murali at intel.com> wrote:
> > 
> > > Hi Evan,
> > > Good catch.
> > > I initially added that check specific to 32 bit code, to address an 
> > > anomaly that I found. But it was wrong, and I reverted the check. I 
> > > sent the incorrect patch by mistake. Here is the correct version of 
> > > the patch. Also added a one more check in the lit test.
> > >  
> > > Sorry for the confusion.
> > >  
> > > Please have a look at it J
> > >  
> > > Thanks
> > > Sriram
> > >  
> > >  
> > > From: Evan Cheng [mailto:evan.cheng at apple.com]
> > > Sent: Monday, March 25, 2013 5:31 PM
> > > To: Murali, Sriram
> > > Cc: llvm-commits at cs.uiuc.edu
> > > Subject: Re: [ATOM] Memory form of call optimization
> > >  
> > > I'm a bit confused by the comment:
> > > +  // Do the optimization only if the Subtarget is 64 bit where,
> > >  
> > > However, I see you have added tests which checks for 32-bit code 
> > > sequence. Does the patch impact 32-bit?
> > >  
> > > Evan
> > >  
> > > On Mar 25, 2013, at 12:17 PM, "Murali, Sriram"
> > > <sriram.murali at intel.com> wrote:
> > > 
> > > 
> > > 
> > > Hi, I am attaching the original patch, as well as an additional 
> > > patch to address the generation of memory forms of call when there 
> > > is a “folded reload” by the way register allocator handles unspill 
> > > of the spilled registers to the stack. The second patch is bigger 
> > > because of the lit tests added, while the actual modification to the 
> > > llvm source is very small. The lit test is huge, because it is hard 
> > > to create a scenario with spilling and unspilling of registers.
> > >  
> > > Please review.
> > > 
> > > Thanks
> > > Sriram
> > >  
> > > From: Murali, Sriram
> > > Sent: Tuesday, March 19, 2013 6:40 PM
> > > To: llvm-commits at cs.uiuc.edu
> > > Subject: [ATOM] Memory form of call optimization
> > >  
> > > Hi,
> > >  
> > > Memory form of call is micro-coded on Atom architecture. We can 
> > > avoid this by loading the function pointer prior to the call. Memory 
> > > form of call is identified in LLVM by the sequence of instructions 
> > > chained to the call that obtains the function pointer. Memory forms 
> > > of call are produced by a sequence of two loads in 32-bit mode or a 
> > > single load in 64-bit mode preceding the call instruction . We would 
> > > like to commit this work and add additional sequences later.
> > >  
> > > Please review.
> > >  
> > > Thanks,
> > > Sriram
> > >  
> > > --
> > > Sriram Murali
> > > SSG/DPD/ECDL/DMP
> > > +1 (519) 772 – 2579
> > >  
> > > <CALL_REG_INDIRECT.patch><CALL_REG_INDIRECT_FOLDED_RELOAD.patch>____
> > > ___________________________________________
> > > llvm-commits mailing list
> > > llvm-commits at cs.uiuc.edu
> > > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> > >  
> > > <CALL_REG_INDIRECT.patch>
> > 
> > 
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> 
> 
> 
> _______________________________________________
> llvm-commits mailing list
> llvm-commits at cs.uiuc.edu
> http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits

-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Skip-moving-call-address-loading-into-callseq-when-t.patch
Type: text/x-patch
Size: 4597 bytes
Desc: not available
URL: <http://lists.llvm.org/pipermail/llvm-commits/attachments/20130328/db6882e9/attachment.bin>