[ATOM] Memory form of call optimization

Tue Mar 26 10:55:24 PDT 2013

Hi Michael,
Thanks for the suggestion. I applied the patch and found that it solves the issue with CALL32m. However, it does not seem to work for CALL64m: the codegen hangs. I am still trying to figure out the cause, but I wanted to let you know about it. Here is the test case, which hangs when compiled with -O2 -march=atom:

extern void (**p)(int);
int main()
{
    void (*foo)(int) = *p;
    foo(2);
    return 0;
}
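
For anyone who wants to run the call sequence itself, here is a minimal self-contained variant of the test case. It is only a sketch: `record`, `slot`, `last_arg` and `call_through_p` are hypothetical names added so that the example links without an external definition of `p`, and the assembly in the comments is my assumption of the typical x86-64 output, not something taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callee: remembers the argument it was called with. */
static int last_arg = -1;
static void record(int x) { last_arg = x; }

/* The original test declares p as extern; it is defined here so the
   example links on its own. */
static void (*slot)(int) = record;
void (**p)(int) = &slot;

/* Same shape as the test case: load the function pointer, then call it.
   Assumed x86-64 code for the call (illustrative only):
     memory form:    callq *(%rax)
     register form:  movq (%rax), %rax
                     callq *%rax                                        */
int call_through_p(void)
{
    assert(p != NULL && *p != NULL);
    void (*foo)(int) = *p;   /* load of the function pointer */
    foo(2);                  /* indirect call through the loaded pointer */
    return last_arg;
}
```

Calling `call_through_p()` returns the argument that reached the callee, so the behaviour is the same whichever form of call the backend picks.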

Thanks
Sriram

-----Original Message-----
From: Liao, Michael 
Sent: Monday, March 25, 2013 6:46 PM
To: Evan Cheng
Cc: Murali, Sriram; llvm-commits at cs.uiuc.edu
Subject: Re: [ATOM] Memory form of call optimization

This patch could be simplified by not selecting CALL32m or CALL64m in the first place, as in the following patch, which removes the unnecessary copy-from-reg/copy-to-reg sequence.

Yours
- Michael

--- a/lib/Target/X86/X86InstrControl.td
+++ b/lib/Target/X86/X86InstrControl.td
@@ -158,7 +158,7 @@ let isCall = 1 in
                          Requires<[In32BitMode]>;
     def CALL32m     : I<0xFF, MRM2m, (outs), (ins i32mem:$dst),
                         "call{l}\t{*}$dst", [(X86call (loadi32 addr:$dst))], IIC_CALL_MEM>,
-                        Requires<[In32BitMode]>;
+                        Requires<[In32BitMode,FavorIndirectMemCall]>;
 
     def FARCALL16i  : Iseg16<0x9A, RawFrmImm16, (outs),
                              (ins i16imm:$off, i16imm:$seg),
@@ -231,7 +231,7 @@ let isCall = 1, Uses = [RSP] in {
   def CALL64m       : I<0xFF, MRM2m, (outs), (ins i64mem:$dst),
                         "call{q}\t{*}$dst", [(X86call (loadi64 addr:$dst))],
                         IIC_CALL_MEM>,
-                      Requires<[In64BitMode]>;
+                      Requires<[In64BitMode,FavorIndirectMemCall]>;
 
   def FARCALL64   : RI<0xFF, MRM3m, (outs), (ins opaque80mem:$dst),
                        "lcall{q}\t{*}$dst", [], IIC_CALL_FAR_MEM>;
diff --git a/lib/Target/X86/X86InstrInfo.td b/lib/Target/X86/X86InstrInfo.td
index 39165e2..840fac4 100644
--- a/lib/Target/X86/X86InstrInfo.td
+++ b/lib/Target/X86/X86InstrInfo.td
@@ -626,6 +626,7 @@ def OptForSize   : Predicate<"OptForSize">;
 def OptForSpeed  : Predicate<"!OptForSize">;
 def FastBTMem    : Predicate<"!Subtarget->isBTMemSlow()">;
 def CallImmAddr  : Predicate<"Subtarget->IsLegalToCallImmediateAddr(TM)">;
+def FavorIndirectMemCall : Predicate<"!Subtarget->callRegIndirect()">;
 
 //===----------------------------------------------------------------------===//
 // X86 Instruction Format Definitions.
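
To spell out what the extra predicate does: the predicates in a Requires list must all hold for the pattern to be selectable, so with this change the memory forms are only selected when the subtarget does not ask for register-indirect calls. A minimal C model of that condition follows. It is a sketch under assumptions: `struct subtarget` and the `can_select_*` helpers are hypothetical names for illustration and are not LLVM code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the X86 subtarget. */
struct subtarget {
    bool is_64bit;          /* models In64BitMode / In32BitMode */
    bool call_reg_indirect; /* models Subtarget->callRegIndirect() */
};

/* Models FavorIndirectMemCall: Predicate<"!Subtarget->callRegIndirect()"> */
static bool favor_indirect_mem_call(const struct subtarget *st)
{
    assert(st != NULL);
    return !st->call_reg_indirect;
}

/* Models Requires<[In32BitMode,FavorIndirectMemCall]> on CALL32m. */
bool can_select_call32m(const struct subtarget *st)
{
    return !st->is_64bit && favor_indirect_mem_call(st);
}

/* Models Requires<[In64BitMode,FavorIndirectMemCall]> on CALL64m. */
bool can_select_call64m(const struct subtarget *st)
{
    return st->is_64bit && favor_indirect_mem_call(st);
}
```

In this model, a subtarget with `call_reg_indirect` set never matches either memory-form pattern, so the load and the call have to be selected separately.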



On Mon, 2013-03-25 at 15:38 -0700, Evan Cheng wrote:
> LGTM
> 
> 
> Evan
> On Mar 25, 2013, at 3:22 PM, "Murali, Sriram"
> <sriram.murali at intel.com> wrote:
> 
> > Hi Evan,
> > Good catch.
> > I initially added that check specifically for 32-bit code, to address an 
> > anomaly that I found. But it was wrong, and I reverted the check. I 
> > sent the incorrect patch by mistake. Here is the correct version of 
> > the patch. I also added one more check in the lit test.
> >  
> > Sorry for the confusion.
> >  
> > Please have a look at it :)
> >  
> > Thanks
> > Sriram
> >  
> >  
> > From: Evan Cheng [mailto:evan.cheng at apple.com]
> > Sent: Monday, March 25, 2013 5:31 PM
> > To: Murali, Sriram
> > Cc: llvm-commits at cs.uiuc.edu
> > Subject: Re: [ATOM] Memory form of call optimization
> >  
> > I'm a bit confused by the comment:
> > +  // Do the optimization only if the Subtarget is 64 bit where,
> >  
> > However, I see you have added tests which check for the 32-bit code 
> > sequence. Does the patch impact 32-bit?
> >  
> > Evan
> >  
> > On Mar 25, 2013, at 12:17 PM, "Murali, Sriram"
> > <sriram.murali at intel.com> wrote:
> > 
> > 
> > 
> > Hi, I am attaching the original patch, as well as an additional 
> > patch to address the generation of memory forms of call when there 
> > is a “folded reload”, which comes from the way the register allocator 
> > reloads (unspills) registers that were spilled to the stack. The second 
> > patch is bigger because of the lit tests added, while the actual 
> > modification to the LLVM source is very small. The lit test is huge 
> > because it is hard to create a scenario with spilling and unspilling 
> > of registers.
> >  
> > Please review.
> > 
> > Thanks
> > Sriram
> >  
> > From: Murali, Sriram
> > Sent: Tuesday, March 19, 2013 6:40 PM
> > To: llvm-commits at cs.uiuc.edu
> > Subject: [ATOM] Memory form of call optimization
> >  
> > Hi,
> >  
> > The memory form of call is micro-coded on the Atom architecture. We can 
> > avoid this by loading the function pointer prior to the call. The memory 
> > form of call is identified in LLVM by the sequence of instructions 
> > chained to the call that obtains the function pointer. Memory forms 
> > of call are produced by a sequence of two loads in 32-bit mode or a 
> > single load in 64-bit mode preceding the call instruction. We would 
> > like to commit this work and add additional sequences later.
> >  
> > Please review.
> >  
> > Thanks,
> > Sriram
> >  
> > --
> > Sriram Murali
> > SSG/DPD/ECDL/DMP
> > +1 (519) 772 – 2579
> >  
> > <CALL_REG_INDIRECT.patch><CALL_REG_INDIRECT_FOLDED_RELOAD.patch>
> > _______________________________________________
> > llvm-commits mailing list
> > llvm-commits at cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/llvm-commits
> >  
> > <CALL_REG_INDIRECT.patch>