[llvm-commits] [Review request] Tweaking Win64 Codegen

Mon Jan 10 23:44:05 PST 2011

Hello, guys!

I can now build a Win64 clang (self-hosted with w64-clang) using my local patches.
Along the way, I found some issues in LLVM:

  - stack allocation
    - shadow area
    - allocating stack beyond page boundary
    - allocating variable alloca(n)
  - varargs (caller and callee)
  - tailcall

The essential patches are attached.
They work, though I have doubts about their style.
Please let me know if you have any feedback.

Next, I will propose the w64-clang patches and 57 of the CodeGen/X86 tests.


Thank you, ...Takumi


* 0001-Target-X86-Fix-whitespace.patch.txt
* 0002-test-CodeGen-X86-Fix-whitespace.patch.txt

  Cosmetic changes.


* 0003-lib-Target-X86-X86ISelLowering.cpp-Introduce-a-n.patch.txt

  No functional changes. Preparation for 0004.
  It introduces a new variable "IsWin64" to use instead of Subtarget->isWin64().


* 0004-Target-X86-Tweak-allocating-shadow-area-aka-home.patch.txt

  Make the caller provide at least 4 x i64 of stack allocation for its callee(s).
  In leaf functions, the shadow area is not allocated, to save stack space.

  FIXME: I wonder whether CCState should know about the minimum stack allocation.

  It also resolves PR8922, though the emitted code may not be optimal.
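
  As an illustration of the rule described for 0004, here is a minimal
  sketch. It is not code from the patch. The helper name
  win64OutgoingArgBytes and its parameters are hypothetical, and it assumes
  that every argument occupies one 8-byte slot and that stack alignment is
  handled elsewhere.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper for illustration only; not part of the patch.
// Size of the shadow area (home space): 4 slots of 8 bytes each.
constexpr std::uint64_t kShadowAreaBytes = 4 * 8;

// Bytes a caller reserves for outgoing arguments under the rule above.
//   numArgSlots: largest number of 8-byte argument slots used by any call.
//   hasCalls:    false for a leaf function, which reserves nothing.
// Assumption: stack alignment is applied by a later step, not here.
std::uint64_t win64OutgoingArgBytes(unsigned numArgSlots, bool hasCalls) {
  if (!hasCalls)
    return 0; // Leaf function: no shadow area is allocated.
  return std::max<std::uint64_t>(kShadowAreaBytes,
                                 std::uint64_t(8) * numArgSlots);
}
```

  Under these assumptions, a call with two arguments still reserves 32
  bytes, while a call with six arguments reserves 48 bytes.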


* 0005-Target-X86-Tweak-alloca-and-add-a-testcase-for-m.patch.txt

  [PR8777][PR8778][PR8919]
  Introduce W64ALLOCA and emit it for w64.

  It reverts PR8919 and r122934.
  I am assuming the mingw64 distro and FSF sources.


* 0006-Target-X86-Tweak-va_arg-for-Win64-not-to-miss-ta.patch.txt

  When there are more than 4 fixed args, va_ptr(ap) would be missed.
  It works, but I think the implementation is dirty.
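
  For reference, the kind of source that exercises this case can be
  sketched as below. This is only an assumed reproducer, not the testcase
  from the patch; the function name sumVariadic is hypothetical. It has
  more than four fixed arguments ahead of the variadic ones.

```cpp
#include <cstdarg>

// Hypothetical reproducer for illustration; not taken from the patch.
// Six fixed arguments (a..e and count) precede the variadic ints, so the
// fixed arguments alone exceed four.
int sumVariadic(int a, int b, int c, int d, int e, int count, ...) {
  va_list ap;
  va_start(ap, count);
  int total = a + b + c + d + e;
  // Add each of the `count` variadic int arguments.
  for (int i = 0; i < count; ++i)
    total += va_arg(ap, int);
  va_end(ap);
  return total;
}
```

  If va_start does not point at the first variadic argument, a call such
  as sumVariadic(1, 2, 3, 4, 5, 3, 10, 20, 30) returns a wrong sum.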


* 0007-X86FrameInfo.cpp-X86RegisterInfo.cpp-Re-indent.-.patch.txt

  Cosmetic changes so that 0009 applies easily. No functional changes.


* 0008-TableGen-EDEmitter.cpp-Add-TCW64.patch.txt

  Let tablegen recognize the new class GR_TCW64.


* 0009-Target-X86-Tweak-win64-s-tailcall.patch.txt

  [PR8743] Introduce tailcall-w64 stuff.

  FIXME: The eligibility conditions for w64 tailcalls might be loosened.
-------------- next part --------------
From 949c925274e0e8a40238553c638335fd330c0c96 Mon Sep 17 00:00:00 2001
From: NAKAMURA Takumi <geek4civic at gmail.com>
Date: Thu, 9 Dec 2010 20:15:03 +0900
Subject: [PATCH 1/9] Target/X86: Fix whitespace.

---
 lib/Target/X86/X86FrameLowering.cpp |    2 +-
 lib/Target/X86/X86ISelLowering.cpp  |   94 +++++++++++++++++-----------------
 lib/Target/X86/X86InstrCompiler.td  |   17 +++---
 lib/Target/X86/X86InstrControl.td   |   39 +++++++-------
 lib/Target/X86/X86InstrInfo.cpp     |   84 +++++++++++++++---------------
 lib/Target/X86/X86InstrInfo.td      |    3 +-
 lib/Target/X86/X86MCInstLower.cpp   |   65 ++++++++++++------------
 lib/Target/X86/X86RegisterInfo.cpp  |    6 +-
 lib/Target/X86/X86RegisterInfo.td   |   24 +++++-----
 9 files changed, 165 insertions(+), 169 deletions(-)

diff --git a/lib/Target/X86/X86FrameLowering.cpp b/lib/Target/X86/X86FrameLowering.cpp
index 7c7b4f3..cbf1b59 100644
--- a/lib/Target/X86/X86FrameLowering.cpp
+++ b/lib/Target/X86/X86FrameLowering.cpp
@@ -321,7 +321,7 @@ void X86FrameLowering::emitCalleeSavedFrameMoves(MachineFunction &MF,
     // move" for this extra "PUSH", the linker will lose track of the fact that
     // the frame pointer should have the value of the first "PUSH" when it's
     // trying to unwind.
-    // 
+    //
     // FIXME: This looks inelegant. It's possibly correct, but it's covering up
     //        another bug. I.e., one where we generate a prolog like this:
     //
diff --git a/lib/Target/X86/X86ISelLowering.cpp b/lib/Target/X86/X86ISelLowering.cpp
index 1a4bb97..2268823 100644
--- a/lib/Target/X86/X86ISelLowering.cpp
+++ b/lib/Target/X86/X86ISelLowering.cpp
@@ -69,7 +69,7 @@ static TargetLoweringObjectFile *createTLOF(X86TargetMachine &TM) {
       return new X8664_MachoTargetObjectFile();
     return new TargetLoweringObjectFileMachO();
   }
-  
+
   if (TM.getSubtarget<X86Subtarget>().isTargetELF() ){
     if (is64Bit)
       return new X8664_ELFTargetObjectFile(TM);
@@ -256,7 +256,7 @@ X86TargetLowering::X86TargetLowering(X86TargetMachine &TM)
     setOperationAction(ISD::UDIV, VT, Expand);
     setOperationAction(ISD::SREM, VT, Expand);
     setOperationAction(ISD::UREM, VT, Expand);
-    
+
     // Add/Sub overflow ops with MVT::Glues are lowered to EFLAGS dependences.
     setOperationAction(ISD::ADDC, VT, Custom);
     setOperationAction(ISD::ADDE, VT, Custom);
@@ -369,7 +369,7 @@ X86TargetLowering::X86TargetLowering(X86TargetMachine &TM)
     setOperationAction(ISD::ATOMIC_CMP_SWAP, VT, Custom);
     setOperationAction(ISD::ATOMIC_LOAD_SUB, VT, Custom);
   }
-    
+
   if (!Subtarget->is64Bit()) {
     setOperationAction(ISD::ATOMIC_LOAD_ADD, MVT::i64, Custom);
     setOperationAction(ISD::ATOMIC_LOAD_SUB, MVT::i64, Custom);
@@ -931,7 +931,7 @@ X86TargetLowering::X86TargetLowering(X86TargetMachine &TM)
   // We want to custom lower some of our intrinsics.
   setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::Other, Custom);
 
-    
+
   // Only custom-lower 64-bit SADDO and friends on 64-bit because we don't
   // handle type legalization for these operations here.
   //
@@ -948,7 +948,7 @@ X86TargetLowering::X86TargetLowering(X86TargetMachine &TM)
     setOperationAction(ISD::SMULO, VT, Custom);
     setOperationAction(ISD::UMULO, VT, Custom);
   }
-    
+
   // There are no 8-bit 3-address imul/mul instructions
   setOperationAction(ISD::SMULO, MVT::i8, Expand);
   setOperationAction(ISD::UMULO, MVT::i8, Expand);
@@ -6198,7 +6198,7 @@ X86TargetLowering::LowerGlobalTLSAddress(SDValue Op, SelectionDAG &DAG) const {
     // TLSCALL will be codegen'ed as call. Inform MFI that function has calls.
     MachineFrameInfo *MFI = DAG.getMachineFunction().getFrameInfo();
     MFI->setAdjustsStack(true);
-    
+
     // And our return value (tls address) is in the standard call return value
     // location.
     unsigned Reg = Subtarget->is64Bit() ? X86::RAX : X86::EAX;
@@ -7047,7 +7047,7 @@ SDValue X86TargetLowering::LowerSETCC(SDValue Op, SelectionDAG &DAG) const {
       (cast<ConstantSDNode>(Op1)->getZExtValue() == 1 ||
        cast<ConstantSDNode>(Op1)->isNullValue()) &&
       (CC == ISD::SETEQ || CC == ISD::SETNE)) {
- 
+
     // If the input is a setcc, then reuse the input setcc or use a new one with
     // the inverted condition.
     if (Op0.getOpcode() == X86ISD::SETCC) {
@@ -7055,7 +7055,7 @@ SDValue X86TargetLowering::LowerSETCC(SDValue Op, SelectionDAG &DAG) const {
       bool Invert = (CC == ISD::SETNE) ^
         cast<ConstantSDNode>(Op1)->isNullValue();
       if (!Invert) return Op0;
-      
+
       CCode = X86::GetOppositeBranchCondition(CCode);
       return DAG.getNode(X86ISD::SETCC, dl, MVT::i8,
                          DAG.getConstant(CCode, MVT::i8), Op0.getOperand(1));
@@ -7206,7 +7206,7 @@ static bool isX86LogicalCmp(SDValue Op) {
 
   if (Op.getResNo() == 2 && Opc == X86ISD::UMUL)
     return true;
-    
+
   return false;
 }
 
@@ -7242,24 +7242,24 @@ SDValue X86TargetLowering::LowerSELECT(SDValue Op, SelectionDAG &DAG) const {
       Cond.getOperand(1).getOpcode() == X86ISD::CMP &&
       isZero(Cond.getOperand(1).getOperand(1))) {
     SDValue Cmp = Cond.getOperand(1);
-    
+
     unsigned CondCode =cast<ConstantSDNode>(Cond.getOperand(0))->getZExtValue();
-    
-    if ((isAllOnes(Op1) || isAllOnes(Op2)) && 
+
+    if ((isAllOnes(Op1) || isAllOnes(Op2)) &&
         (CondCode == X86::COND_E || CondCode == X86::COND_NE)) {
       SDValue Y = isAllOnes(Op2) ? Op1 : Op2;
 
       SDValue CmpOp0 = Cmp.getOperand(0);
       Cmp = DAG.getNode(X86ISD::CMP, DL, MVT::i32,
                         CmpOp0, DAG.getConstant(1, CmpOp0.getValueType()));
-      
+
       SDValue Res =   // Res = 0 or -1.
         DAG.getNode(X86ISD::SETCC_CARRY, DL, Op.getValueType(),
                     DAG.getConstant(X86::COND_B, MVT::i8), Cmp);
-      
+
       if (isAllOnes(Op1) != (CondCode == X86::COND_E))
         Res = DAG.getNOT(DL, Res, Res.getValueType());
-      
+
       ConstantSDNode *N2C = dyn_cast<ConstantSDNode>(Op2);
       if (N2C == 0 || !N2C->isNullValue())
         Res = DAG.getNode(ISD::OR, DL, Res.getValueType(), Res, Y);
@@ -8443,7 +8443,7 @@ SDValue X86TargetLowering::LowerSHL(SDValue Op, SelectionDAG &DAG) const {
     Op = DAG.getNode(ISD::ADD, dl, VT, Op, Op);
 
     // return pblendv(r, r+r, a);
-    R = DAG.getNode(X86ISD::PBLENDVB, dl, VT, 
+    R = DAG.getNode(X86ISD::PBLENDVB, dl, VT,
                     R, DAG.getNode(ISD::ADD, dl, VT, R, R), Op);
     return R;
   }
@@ -8503,12 +8503,12 @@ SDValue X86TargetLowering::LowerXALUO(SDValue Op, SelectionDAG &DAG) const {
     SDVTList VTs = DAG.getVTList(N->getValueType(0), N->getValueType(0),
                                  MVT::i32);
     SDValue Sum = DAG.getNode(X86ISD::UMUL, DL, VTs, LHS, RHS);
-    
+
     SDValue SetCC =
       DAG.getNode(X86ISD::SETCC, DL, MVT::i8,
                   DAG.getConstant(X86::COND_O, MVT::i32),
                   SDValue(Sum.getNode(), 2));
-    
+
     DAG.ReplaceAllUsesOfValueWith(SDValue(N, 1), SetCC);
     return Sum;
   }
@@ -8663,9 +8663,9 @@ static SDValue LowerADDC_ADDE_SUBC_SUBE(SDValue Op, SelectionDAG &DAG) {
   // Let legalize expand this if it isn't a legal type yet.
   if (!DAG.getTargetLoweringInfo().isTypeLegal(VT))
     return SDValue();
-  
+
   SDVTList VTs = DAG.getVTList(VT, MVT::i32);
-  
+
   unsigned Opc;
   bool ExtraOp = false;
   switch (Op.getOpcode()) {
@@ -8675,7 +8675,7 @@ static SDValue LowerADDC_ADDE_SUBC_SUBE(SDValue Op, SelectionDAG &DAG) {
   case ISD::SUBC: Opc = X86ISD::SUB; break;
   case ISD::SUBE: Opc = X86ISD::SBB; ExtraOp = true; break;
   }
-  
+
   if (!ExtraOp)
     return DAG.getNode(Opc, Op->getDebugLoc(), VTs, Op.getOperand(0),
                        Op.getOperand(1));
@@ -9555,14 +9555,14 @@ MachineBasicBlock *
 X86TargetLowering::EmitMonitor(MachineInstr *MI, MachineBasicBlock *BB) const {
   DebugLoc dl = MI->getDebugLoc();
   const TargetInstrInfo *TII = getTargetMachine().getInstrInfo();
-  
+
   // Address into RAX/EAX, other two args into ECX, EDX.
   unsigned MemOpc = Subtarget->is64Bit() ? X86::LEA64r : X86::LEA32r;
   unsigned MemReg = Subtarget->is64Bit() ? X86::RAX : X86::EAX;
   MachineInstrBuilder MIB = BuildMI(*BB, MI, dl, TII->get(MemOpc), MemReg);
   for (int i = 0; i < X86::AddrNumOperands; ++i)
     MIB.addOperand(MI->getOperand(i));
-  
+
   unsigned ValOps = X86::AddrNumOperands;
   BuildMI(*BB, MI, dl, TII->get(TargetOpcode::COPY), X86::ECX)
     .addReg(MI->getOperand(ValOps).getReg());
@@ -9571,7 +9571,7 @@ X86TargetLowering::EmitMonitor(MachineInstr *MI, MachineBasicBlock *BB) const {
 
   // The instruction doesn't actually take any operands though.
   BuildMI(*BB, MI, dl, TII->get(X86::MONITORrrr));
-  
+
   MI->eraseFromParent(); // The pseudo is gone now.
   return BB;
 }
@@ -9580,16 +9580,16 @@ MachineBasicBlock *
 X86TargetLowering::EmitMwait(MachineInstr *MI, MachineBasicBlock *BB) const {
   DebugLoc dl = MI->getDebugLoc();
   const TargetInstrInfo *TII = getTargetMachine().getInstrInfo();
-  
+
   // First arg in ECX, the second in EAX.
   BuildMI(*BB, MI, dl, TII->get(TargetOpcode::COPY), X86::ECX)
     .addReg(MI->getOperand(0).getReg());
   BuildMI(*BB, MI, dl, TII->get(TargetOpcode::COPY), X86::EAX)
     .addReg(MI->getOperand(1).getReg());
-    
+
   // The instruction doesn't actually take any operands though.
   BuildMI(*BB, MI, dl, TII->get(X86::MWAITrr));
-  
+
   MI->eraseFromParent(); // The pseudo is gone now.
   return BB;
 }
@@ -10195,7 +10195,7 @@ X86TargetLowering::EmitInstrWithCustomInserter(MachineInstr *MI,
 
     // Thread synchronization.
   case X86::MONITOR:
-    return EmitMonitor(MI, BB);  
+    return EmitMonitor(MI, BB);
   case X86::MWAIT:
     return EmitMwait(MI, BB);
 
@@ -11116,19 +11116,19 @@ static SDValue PerformAndCombine(SDNode *N, SelectionDAG &DAG,
                                  const X86Subtarget *Subtarget) {
   if (DCI.isBeforeLegalizeOps())
     return SDValue();
-  
+
   // Want to form PANDN nodes, in the hopes of then easily combining them with
   // OR and AND nodes to form PBLEND/PSIGN.
   EVT VT = N->getValueType(0);
   if (VT != MVT::v2i64)
     return SDValue();
-  
+
   SDValue N0 = N->getOperand(0);
   SDValue N1 = N->getOperand(1);
   DebugLoc DL = N->getDebugLoc();
-  
+
   // Check LHS for vnot
-  if (N0.getOpcode() == ISD::XOR && 
+  if (N0.getOpcode() == ISD::XOR &&
       ISD::isBuildVectorAllOnes(N0.getOperand(1).getNode()))
     return DAG.getNode(X86ISD::PANDN, DL, VT, N0.getOperand(0), N1);
 
@@ -11136,7 +11136,7 @@ static SDValue PerformAndCombine(SDNode *N, SelectionDAG &DAG,
   if (N1.getOpcode() == ISD::XOR &&
       ISD::isBuildVectorAllOnes(N1.getOperand(1).getNode()))
     return DAG.getNode(X86ISD::PANDN, DL, VT, N1.getOperand(0), N0);
-  
+
   return SDValue();
 }
 
@@ -11152,7 +11152,7 @@ static SDValue PerformOrCombine(SDNode *N, SelectionDAG &DAG,
 
   SDValue N0 = N->getOperand(0);
   SDValue N1 = N->getOperand(1);
-  
+
   // look for psign/blend
   if (Subtarget->hasSSSE3()) {
     if (VT == MVT::v2i64) {
@@ -11168,17 +11168,17 @@ static SDValue PerformOrCombine(SDNode *N, SelectionDAG &DAG,
           Y = N0.getOperand(1);
         if (N0.getOperand(1) == Mask)
           Y = N0.getOperand(0);
-        
+
         // Check to see if the mask appeared in both the AND and PANDN and
         if (!Y.getNode())
           return SDValue();
-        
+
         // Validate that X, Y, and Mask are BIT_CONVERTS, and see through them.
         if (Mask.getOpcode() != ISD::BITCAST ||
             X.getOpcode() != ISD::BITCAST ||
             Y.getOpcode() != ISD::BITCAST)
           return SDValue();
-        
+
         // Look through mask bitcast.
         Mask = Mask.getOperand(0);
         EVT MaskVT = Mask.getValueType();
@@ -11187,7 +11187,7 @@ static SDValue PerformOrCombine(SDNode *N, SelectionDAG &DAG,
         // will be an intrinsic.
         if (Mask.getOpcode() != ISD::INTRINSIC_WO_CHAIN)
           return SDValue();
-        
+
         // FIXME: what to do for bytes, since there is a psignb/pblendvb, but
         // there is no psrai.b
         switch (cast<ConstantSDNode>(Mask.getOperand(0))->getZExtValue()) {
@@ -11196,14 +11196,14 @@ static SDValue PerformOrCombine(SDNode *N, SelectionDAG &DAG,
           break;
         default: return SDValue();
         }
-        
+
         // Check that the SRA is all signbits.
         SDValue SraC = Mask.getOperand(2);
         unsigned SraAmt  = cast<ConstantSDNode>(SraC)->getZExtValue();
         unsigned EltBits = MaskVT.getVectorElementType().getSizeInBits();
         if ((SraAmt + 1) != EltBits)
           return SDValue();
-        
+
         DebugLoc DL = N->getDebugLoc();
 
         // Now we know we at least have a plendvb with the mask val.  See if
@@ -11229,7 +11229,7 @@ static SDValue PerformOrCombine(SDNode *N, SelectionDAG &DAG,
         // PBLENDVB only available on SSE 4.1
         if (!Subtarget->hasSSE41())
           return SDValue();
-        
+
         X = DAG.getNode(ISD::BITCAST, DL, MVT::v16i8, X);
         Y = DAG.getNode(ISD::BITCAST, DL, MVT::v16i8, Y);
         Mask = DAG.getNode(ISD::BITCAST, DL, MVT::v16i8, Mask);
@@ -11238,7 +11238,7 @@ static SDValue PerformOrCombine(SDNode *N, SelectionDAG &DAG,
       }
     }
   }
-  
+
   // fold (or (x << c) | (y >> (64 - c))) ==> (shld64 x, y, c)
   if (N0.getOpcode() == ISD::SRL && N1.getOpcode() == ISD::SHL)
     std::swap(N0, N1);
@@ -11290,7 +11290,7 @@ static SDValue PerformOrCombine(SDNode *N, SelectionDAG &DAG,
                          DAG.getNode(ISD::TRUNCATE, DL,
                                        MVT::i8, ShAmt0));
   }
-  
+
   return SDValue();
 }
 
@@ -11500,7 +11500,7 @@ static SDValue PerformSETCCCombine(SDNode *N, SelectionDAG &DAG) {
   unsigned X86CC = N->getConstantOperandVal(0);
   SDValue EFLAG = N->getOperand(1);
   DebugLoc DL = N->getDebugLoc();
-  
+
   // Materialize "setb reg" as "sbb reg,reg", since it can be extended without
   // a zext and produces an all-ones bit which is more useful than 0/1 in some
   // cases.
@@ -11509,10 +11509,10 @@ static SDValue PerformSETCCCombine(SDNode *N, SelectionDAG &DAG) {
                        DAG.getNode(X86ISD::SETCC_CARRY, DL, MVT::i8,
                                    DAG.getConstant(X86CC, MVT::i8), EFLAG),
                        DAG.getConstant(1, MVT::i8));
-  
+
   return SDValue();
 }
-          
+
 // Optimize RES, EFLAGS = X86ISD::ADC LHS, RHS, EFLAGS
 static SDValue PerformADCCombine(SDNode *N, SelectionDAG &DAG,
                                  X86TargetLowering::DAGCombinerInfo &DCI) {
@@ -11544,7 +11544,7 @@ static SDValue PerformADCCombine(SDNode *N, SelectionDAG &DAG,
 //      (sub (setne X, 0), Y) -> adc -1, Y
 static SDValue OptimizeConditonalInDecrement(SDNode *N, SelectionDAG &DAG) {
   DebugLoc DL = N->getDebugLoc();
-  
+
   // Look through ZExts.
   SDValue Ext = N->getOperand(N->getOpcode() == ISD::SUB ? 1 : 0);
   if (Ext.getOpcode() != ISD::ZERO_EXTEND || !Ext.hasOneUse())
diff --git a/lib/Target/X86/X86InstrCompiler.td b/lib/Target/X86/X86InstrCompiler.td
index da5e05a..d2c5763 100644
--- a/lib/Target/X86/X86InstrCompiler.td
+++ b/lib/Target/X86/X86InstrCompiler.td
@@ -849,38 +849,38 @@ def : Pat<(X86call (i64 texternalsym:$dst)),
 // tailcall stuff
 def : Pat<(X86tcret GR32_TC:$dst, imm:$off),
           (TCRETURNri GR32_TC:$dst, imm:$off)>,
-	  Requires<[In32BitMode]>;
+          Requires<[In32BitMode]>;
 
 // FIXME: This is disabled for 32-bit PIC mode because the global base
 // register which is part of the address mode may be assigned a
 // callee-saved register.
 def : Pat<(X86tcret (load addr:$dst), imm:$off),
           (TCRETURNmi addr:$dst, imm:$off)>,
-	  Requires<[In32BitMode, IsNotPIC]>;
+          Requires<[In32BitMode, IsNotPIC]>;
 
 def : Pat<(X86tcret (i32 tglobaladdr:$dst), imm:$off),
           (TCRETURNdi texternalsym:$dst, imm:$off)>,
-	  Requires<[In32BitMode]>;
+          Requires<[In32BitMode]>;
 
 def : Pat<(X86tcret (i32 texternalsym:$dst), imm:$off),
           (TCRETURNdi texternalsym:$dst, imm:$off)>,
-	  Requires<[In32BitMode]>;
+          Requires<[In32BitMode]>;
 
 def : Pat<(X86tcret GR64_TC:$dst, imm:$off),
           (TCRETURNri64 GR64_TC:$dst, imm:$off)>,
-	  Requires<[In64BitMode]>;
+          Requires<[In64BitMode]>;
 
 def : Pat<(X86tcret (load addr:$dst), imm:$off),
           (TCRETURNmi64 addr:$dst, imm:$off)>,
-	  Requires<[In64BitMode]>;
+          Requires<[In64BitMode]>;
 
 def : Pat<(X86tcret (i64 tglobaladdr:$dst), imm:$off),
           (TCRETURNdi64 tglobaladdr:$dst, imm:$off)>,
-	  Requires<[In64BitMode]>;
+          Requires<[In64BitMode]>;
 
 def : Pat<(X86tcret (i64 texternalsym:$dst), imm:$off),
           (TCRETURNdi64 texternalsym:$dst, imm:$off)>,
-	  Requires<[In64BitMode]>;
+          Requires<[In64BitMode]>;
 
 // Normal calls, with various flavors of addresses.
 def : Pat<(X86call (i32 tglobaladdr:$dst)),
@@ -1661,4 +1661,3 @@ def : Pat<(and GR64:$src1, i64immSExt8:$src2),
           (AND64ri8 GR64:$src1, i64immSExt8:$src2)>;
 def : Pat<(and GR64:$src1, i64immSExt32:$src2),
           (AND64ri32 GR64:$src1, i64immSExt32:$src2)>;
-
diff --git a/lib/Target/X86/X86InstrControl.td b/lib/Target/X86/X86InstrControl.td
index 62ab53e..4d1c5f7 100644
--- a/lib/Target/X86/X86InstrControl.td
+++ b/lib/Target/X86/X86InstrControl.td
@@ -1,10 +1,10 @@
 //===- X86InstrControl.td - Control Flow Instructions ------*- tablegen -*-===//
-// 
+//
 //                     The LLVM Compiler Infrastructure
 //
 // This file is distributed under the University of Illinois Open Source
 // License. See LICENSE.TXT for details.
-// 
+//
 //===----------------------------------------------------------------------===//
 //
 // This file describes the X86 jump, return, call, and related instructions.
@@ -43,7 +43,7 @@ let isBarrier = 1, isBranch = 1, isTerminator = 1 in {
                         "jmp\t$dst", [(br bb:$dst)]>;
   def JMP_1 : Ii8PCRel<0xEB, RawFrm, (outs), (ins brtarget8:$dst),
                        "jmp\t$dst", []>;
-  def JMP64pcrel32 : I<0xE9, RawFrm, (outs), (ins brtarget:$dst), 
+  def JMP64pcrel32 : I<0xE9, RawFrm, (outs), (ins brtarget:$dst),
                        "jmp{q}\t$dst", []>;
 }
 
@@ -108,16 +108,16 @@ let isBranch = 1, isTerminator = 1, isBarrier = 1, isIndirectBranch = 1 in {
   def JMP64m     : I<0xFF, MRM4m, (outs), (ins i64mem:$dst), "jmp{q}\t{*}$dst",
                      [(brind (loadi64 addr:$dst))]>, Requires<[In64BitMode]>;
 
-  def FARJMP16i  : Iseg16<0xEA, RawFrmImm16, (outs), 
+  def FARJMP16i  : Iseg16<0xEA, RawFrmImm16, (outs),
                           (ins i16imm:$off, i16imm:$seg),
                           "ljmp{w}\t{$seg, $off|$off, $seg}", []>, OpSize;
   def FARJMP32i  : Iseg32<0xEA, RawFrmImm16, (outs),
                           (ins i32imm:$off, i16imm:$seg),
-                          "ljmp{l}\t{$seg, $off|$off, $seg}", []>;                     
+                          "ljmp{l}\t{$seg, $off|$off, $seg}", []>;
   def FARJMP64   : RI<0xFF, MRM5m, (outs), (ins opaque80mem:$dst),
                       "ljmp{q}\t{*}$dst", []>;
 
-  def FARJMP16m  : I<0xFF, MRM5m, (outs), (ins opaque32mem:$dst), 
+  def FARJMP16m  : I<0xFF, MRM5m, (outs), (ins opaque32mem:$dst),
                      "ljmp{w}\t{*}$dst", []>, OpSize;
   def FARJMP32m  : I<0xFF, MRM5m, (outs), (ins opaque48mem:$dst),
                      "ljmp{l}\t{*}$dst", []>;
@@ -152,14 +152,14 @@ let isCall = 1 in
     def CALL32m     : I<0xFF, MRM2m, (outs), (ins i32mem:$dst, variable_ops),
                         "call{l}\t{*}$dst", [(X86call (loadi32 addr:$dst))]>,
                         Requires<[In32BitMode]>;
-  
-    def FARCALL16i  : Iseg16<0x9A, RawFrmImm16, (outs), 
+
+    def FARCALL16i  : Iseg16<0x9A, RawFrmImm16, (outs),
                              (ins i16imm:$off, i16imm:$seg),
                              "lcall{w}\t{$seg, $off|$off, $seg}", []>, OpSize;
     def FARCALL32i  : Iseg32<0x9A, RawFrmImm16, (outs),
                              (ins i32imm:$off, i16imm:$seg),
                              "lcall{l}\t{$seg, $off|$off, $seg}", []>;
-                             
+
     def FARCALL16m  : I<0xFF, MRM3m, (outs), (ins opaque32mem:$dst),
                         "lcall{w}\t{*}$dst", []>, OpSize;
     def FARCALL32m  : I<0xFF, MRM3m, (outs), (ins opaque48mem:$dst),
@@ -182,12 +182,12 @@ let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1,
               XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7,
               XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS],
       Uses = [ESP] in {
-  def TCRETURNdi : PseudoI<(outs), 
+  def TCRETURNdi : PseudoI<(outs),
                      (ins i32imm_pcrel:$dst, i32imm:$offset, variable_ops), []>;
-  def TCRETURNri : PseudoI<(outs), 
+  def TCRETURNri : PseudoI<(outs),
                      (ins GR32_TC:$dst, i32imm:$offset, variable_ops), []>;
   let mayLoad = 1 in
-  def TCRETURNmi : PseudoI<(outs), 
+  def TCRETURNmi : PseudoI<(outs),
                      (ins i32mem_TC:$dst, i32imm:$offset, variable_ops), []>;
 
   // FIXME: The should be pseudo instructions that are lowered when going to
@@ -196,7 +196,7 @@ let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1,
                            (ins i32imm_pcrel:$dst, variable_ops),
                  "jmp\t$dst  # TAILCALL",
                  []>;
-  def TAILJMPr : I<0xFF, MRM4r, (outs), (ins GR32_TC:$dst, variable_ops), 
+  def TAILJMPr : I<0xFF, MRM4r, (outs), (ins GR32_TC:$dst, variable_ops),
                    "", []>;  // FIXME: Remove encoding when JIT is dead.
   let mayLoad = 1 in
   def TAILJMPm : I<0xFF, MRM4m, (outs), (ins i32mem_TC:$dst, variable_ops),
@@ -218,7 +218,7 @@ let isCall = 1 in
               XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7,
               XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS],
       Uses = [RSP] in {
-      
+
     // NOTE: this pattern doesn't match "X86call imm", because we do not know
     // that the offset between an arbitrary immediate and the call will fit in
     // the 32-bit pcrel field that we have.
@@ -232,12 +232,12 @@ let isCall = 1 in
     def CALL64m       : I<0xFF, MRM2m, (outs), (ins i64mem:$dst, variable_ops),
                           "call{q}\t{*}$dst", [(X86call (loadi64 addr:$dst))]>,
                         Requires<[In64BitMode, NotWin64]>;
-                        
+
     def FARCALL64   : RI<0xFF, MRM3m, (outs), (ins opaque80mem:$dst),
                          "lcall{q}\t{*}$dst", []>;
   }
 
-  // FIXME: We need to teach codegen about single list of call-clobbered 
+  // FIXME: We need to teach codegen about single list of call-clobbered
   // registers.
 let isCall = 1, isCodeGenOnly = 1 in
   // All calls clobber the non-callee saved registers. RSP is marked as
@@ -256,10 +256,10 @@ let isCall = 1, isCodeGenOnly = 1 in
     def WINCALL64r       : I<0xFF, MRM2r, (outs), (ins GR64:$dst, variable_ops),
                              "call{q}\t{*}$dst",
                              [(X86call GR64:$dst)]>, Requires<[IsWin64]>;
-    def WINCALL64m       : I<0xFF, MRM2m, (outs), 
+    def WINCALL64m       : I<0xFF, MRM2m, (outs),
                               (ins i64mem:$dst,variable_ops),
                              "call{q}\t{*}$dst",
-                             [(X86call (loadi64 addr:$dst))]>, 
+                             [(X86call (loadi64 addr:$dst))]>,
                            Requires<[IsWin64]>;
   }
 
@@ -278,7 +278,7 @@ let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1,
   def TCRETURNri64 : PseudoI<(outs),
                       (ins GR64_TC:$dst, i32imm:$offset, variable_ops), []>;
   let mayLoad = 1 in
-  def TCRETURNmi64 : PseudoI<(outs), 
+  def TCRETURNmi64 : PseudoI<(outs),
                        (ins i64mem_TC:$dst, i32imm:$offset, variable_ops), []>;
 
   def TAILJMPd64 : Ii32PCRel<0xE9, RawFrm, (outs),
@@ -291,4 +291,3 @@ let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1,
   def TAILJMPm64 : I<0xFF, MRM4m, (outs), (ins i64mem_TC:$dst, variable_ops),
                      "jmp{q}\t{*}$dst  # TAILCALL", []>;
 }
-
diff --git a/lib/Target/X86/X86InstrInfo.cpp b/lib/Target/X86/X86InstrInfo.cpp
index 73654d3..63dcd14 100644
--- a/lib/Target/X86/X86InstrInfo.cpp
+++ b/lib/Target/X86/X86InstrInfo.cpp
@@ -58,7 +58,7 @@ X86InstrInfo::X86InstrInfo(X86TargetMachine &tm)
     TB_NOT_REVERSABLE = 1U << 31,
     TB_FLAGS = TB_NOT_REVERSABLE
   };
-      
+
   static const unsigned OpTbl2Addr[][2] = {
     { X86::ADC32ri,     X86::ADC32mi },
     { X86::ADC32ri8,    X86::ADC32mi8 },
@@ -231,16 +231,16 @@ X86InstrInfo::X86InstrInfo(X86TargetMachine &tm)
     unsigned MemOp = OpTbl2Addr[i][1] & ~TB_FLAGS;
     assert(!RegOp2MemOpTable2Addr.count(RegOp) && "Duplicated entries?");
     RegOp2MemOpTable2Addr[RegOp] = std::make_pair(MemOp, 0U);
-    
+
     // If this is not a reversable operation (because there is a many->one)
     // mapping, don't insert the reverse of the operation into MemOp2RegOpTable.
     if (OpTbl2Addr[i][1] & TB_NOT_REVERSABLE)
       continue;
-                          
+
     // Index 0, folded load and store, no alignment requirement.
     unsigned AuxInfo = 0 | (1 << 4) | (1 << 5);
-    
-    assert(!MemOp2RegOpTable.count(MemOp) && 
+
+    assert(!MemOp2RegOpTable.count(MemOp) &&
             "Duplicated entries in unfolding maps?");
     MemOp2RegOpTable[MemOp] = std::make_pair(RegOp, AuxInfo);
   }
@@ -334,12 +334,12 @@ X86InstrInfo::X86InstrInfo(X86TargetMachine &tm)
     unsigned Align      = OpTbl0[i][3];
     assert(!RegOp2MemOpTable0.count(RegOp) && "Duplicated entries?");
     RegOp2MemOpTable0[RegOp] = std::make_pair(MemOp, Align);
-    
+
     // If this is not a reversable operation (because there is a many->one)
     // mapping, don't insert the reverse of the operation into MemOp2RegOpTable.
     if (OpTbl0[i][1] & TB_NOT_REVERSABLE)
       continue;
-    
+
     // Index 0, folded load or store.
     unsigned AuxInfo = 0 | (FoldedLoad << 4) | ((FoldedLoad^1) << 5);
     assert(!MemOp2RegOpTable.count(MemOp) && "Duplicated entries?");
@@ -461,12 +461,12 @@ X86InstrInfo::X86InstrInfo(X86TargetMachine &tm)
     unsigned Align = OpTbl1[i][2];
     assert(!RegOp2MemOpTable1.count(RegOp) && "Duplicate entries");
     RegOp2MemOpTable1[RegOp] = std::make_pair(MemOp, Align);
-    
+
     // If this is not a reversable operation (because there is a many->one)
     // mapping, don't insert the reverse of the operation into MemOp2RegOpTable.
     if (OpTbl1[i][1] & TB_NOT_REVERSABLE)
       continue;
-    
+
     // Index 1, folded load
     unsigned AuxInfo = 1 | (1 << 4);
     assert(!MemOp2RegOpTable.count(MemOp) && "Duplicate entries");
@@ -678,15 +678,15 @@ X86InstrInfo::X86InstrInfo(X86TargetMachine &tm)
     unsigned RegOp = OpTbl2[i][0];
     unsigned MemOp = OpTbl2[i][1] & ~TB_FLAGS;
     unsigned Align = OpTbl2[i][2];
-    
+
     assert(!RegOp2MemOpTable2.count(RegOp) && "Duplicate entry!");
     RegOp2MemOpTable2[RegOp] = std::make_pair(MemOp, Align);
-    
+
     // If this is not a reversable operation (because there is a many->one)
     // mapping, don't insert the reverse of the operation into MemOp2RegOpTable.
     if (OpTbl2[i][1] & TB_NOT_REVERSABLE)
       continue;
-    
+
     // Index 2, folded load
     unsigned AuxInfo = 2 | (1 << 4);
     assert(!MemOp2RegOpTable.count(MemOp) &&
@@ -808,7 +808,7 @@ static bool isFrameStoreOpcode(int Opcode) {
   return false;
 }
 
-unsigned X86InstrInfo::isLoadFromStackSlot(const MachineInstr *MI, 
+unsigned X86InstrInfo::isLoadFromStackSlot(const MachineInstr *MI,
                                            int &FrameIndex) const {
   if (isFrameLoadOpcode(MI->getOpcode()))
     if (MI->getOperand(0).getSubReg() == 0 && isFrameOperand(MI, 1, FrameIndex))
@@ -816,7 +816,7 @@ unsigned X86InstrInfo::isLoadFromStackSlot(const MachineInstr *MI,
   return 0;
 }
 
-unsigned X86InstrInfo::isLoadFromStackSlotPostFE(const MachineInstr *MI, 
+unsigned X86InstrInfo::isLoadFromStackSlotPostFE(const MachineInstr *MI,
                                                  int &FrameIndex) const {
   if (isFrameLoadOpcode(MI->getOpcode())) {
     unsigned Reg;
@@ -946,10 +946,10 @@ X86InstrInfo::isReallyTriviallyReMaterializable(const MachineInstr *MI,
           isPICBase = true;
         }
         return isPICBase;
-      } 
+      }
       return false;
     }
- 
+
      case X86::LEA32r:
      case X86::LEA64r: {
        if (MI->getOperand(2).isImm() &&
@@ -1124,9 +1124,9 @@ X86InstrInfo::convertToThreeAddressWithLEA(unsigned MIOpc,
   MachineRegisterInfo &RegInfo = MFI->getParent()->getRegInfo();
   unsigned leaInReg = RegInfo.createVirtualRegister(&X86::GR32_NOSPRegClass);
   unsigned leaOutReg = RegInfo.createVirtualRegister(&X86::GR32RegClass);
-            
+
   // Build and insert into an implicit UNDEF value. This is OK because
-  // well be shifting and then extracting the lower 16-bits. 
+  // well be shifting and then extracting the lower 16-bits.
   // This has the potential to cause partial register stall. e.g.
   //   movw    (%rbp,%rcx,2), %dx
   //   leal    -65(%rdx), %esi
@@ -1162,7 +1162,7 @@ X86InstrInfo::convertToThreeAddressWithLEA(unsigned MIOpc,
   case X86::ADD16ri8:
   case X86::ADD16ri_DB:
   case X86::ADD16ri8_DB:
-    addRegOffset(MIB, leaInReg, true, MI->getOperand(2).getImm());    
+    addRegOffset(MIB, leaInReg, true, MI->getOperand(2).getImm());
     break;
   case X86::ADD16rr:
   case X86::ADD16rr_DB: {
@@ -1177,7 +1177,7 @@ X86InstrInfo::convertToThreeAddressWithLEA(unsigned MIOpc,
     } else {
       leaInReg2 = RegInfo.createVirtualRegister(&X86::GR32_NOSPRegClass);
       // Build and insert into an implicit UNDEF value. This is OK because
-      // well be shifting and then extracting the lower 16-bits. 
+      // well be shifting and then extracting the lower 16-bits.
       BuildMI(*MFI, MIB, MI->getDebugLoc(), get(X86::IMPLICIT_DEF), leaInReg2);
       InsMI2 =
         BuildMI(*MFI, MIB, MI->getDebugLoc(), get(TargetOpcode::COPY))
@@ -1244,7 +1244,7 @@ X86InstrInfo::convertToThreeAddress(MachineFunction::iterator &MFI,
   case X86::SHUFPSrri: {
     assert(MI->getNumOperands() == 4 && "Unknown shufps instruction!");
     if (!TM.getSubtarget<X86Subtarget>().hasSSE2()) return 0;
-    
+
     unsigned B = MI->getOperand(1).getReg();
     unsigned C = MI->getOperand(2).getReg();
     if (B != C) return 0;
@@ -1392,7 +1392,7 @@ X86InstrInfo::convertToThreeAddress(MachineFunction::iterator &MFI,
         RC = X86::GR32_NOSPRegisterClass;
       }
 
-      
+
       unsigned Src2 = MI->getOperand(2).getReg();
       bool isKill2 = MI->getOperand(2).isKill();
 
@@ -1471,7 +1471,7 @@ X86InstrInfo::convertToThreeAddress(MachineFunction::iterator &MFI,
       LV->replaceKillInstruction(Dest, MI, NewMI);
   }
 
-  MFI->insert(MBBI, NewMI);          // Insert the new inst    
+  MFI->insert(MBBI, NewMI);          // Insert the new inst
   return NewMI;
 }
 
@@ -1692,7 +1692,7 @@ X86::CondCode X86::GetOppositeBranchCondition(X86::CondCode CC) {
 bool X86InstrInfo::isUnpredicatedTerminator(const MachineInstr *MI) const {
   const TargetInstrDesc &TID = MI->getDesc();
   if (!TID.isTerminator()) return false;
-  
+
   // Conditional branch is a special case.
   if (TID.isBranch() && !TID.isBarrier())
     return true;
@@ -1701,7 +1701,7 @@ bool X86InstrInfo::isUnpredicatedTerminator(const MachineInstr *MI) const {
   return !isPredicated(MI);
 }
 
-bool X86InstrInfo::AnalyzeBranch(MachineBasicBlock &MBB, 
+bool X86InstrInfo::AnalyzeBranch(MachineBasicBlock &MBB,
                                  MachineBasicBlock *&TBB,
                                  MachineBasicBlock *&FBB,
                                  SmallVectorImpl<MachineOperand> &Cond,
@@ -1862,7 +1862,7 @@ unsigned X86InstrInfo::RemoveBranch(MachineBasicBlock &MBB) const {
     I = MBB.end();
     ++Count;
   }
-  
+
   return Count;
 }
 
@@ -2177,7 +2177,7 @@ static MachineInstr *FuseTwoAddrInst(MachineFunction &MF, unsigned Opcode,
     MIB.addOperand(MOs[i]);
   if (NumAddrOps < 4)  // FrameIndex only
     addOffset(MIB, 0);
-  
+
   // Loop over the rest of the ri operands, converting them over.
   unsigned NumOps = MI->getDesc().getNumOperands()-2;
   for (unsigned i = 0; i != NumOps; ++i) {
@@ -2198,7 +2198,7 @@ static MachineInstr *FuseInst(MachineFunction &MF,
   MachineInstr *NewMI = MF.CreateMachineInstr(TII.get(Opcode),
                                               MI->getDebugLoc(), true);
   MachineInstrBuilder MIB(NewMI);
-  
+
   for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
     MachineOperand &MO = MI->getOperand(i);
     if (i == OpNo) {
@@ -2247,7 +2247,7 @@ X86InstrInfo::foldMemoryOperandImpl(MachineFunction &MF,
   if (isTwoAddr && NumOps >= 2 && i < 2 &&
       MI->getOperand(0).isReg() &&
       MI->getOperand(1).isReg() &&
-      MI->getOperand(0).getReg() == MI->getOperand(1).getReg()) { 
+      MI->getOperand(0).getReg() == MI->getOperand(1).getReg()) {
     OpcodeTablePtr = &RegOp2MemOpTable2Addr;
     isTwoAddrFold = true;
   } else if (i == 0) { // If operand 0
@@ -2261,14 +2261,14 @@ X86InstrInfo::foldMemoryOperandImpl(MachineFunction &MF,
       NewMI = MakeM0Inst(*this, X86::MOV8mi, MOs, MI);
     if (NewMI)
       return NewMI;
-    
+
     OpcodeTablePtr = &RegOp2MemOpTable0;
   } else if (i == 1) {
     OpcodeTablePtr = &RegOp2MemOpTable1;
   } else if (i == 2) {
     OpcodeTablePtr = &RegOp2MemOpTable2;
   }
-  
+
   // If table selected...
   if (OpcodeTablePtr) {
     // Find the Opcode to fuse
@@ -2316,8 +2316,8 @@ X86InstrInfo::foldMemoryOperandImpl(MachineFunction &MF,
       return NewMI;
     }
   }
-  
-  // No fusion 
+
+  // No fusion
   if (PrintFailedFusing && !MI->isCopy())
     dbgs() << "We failed to fuse operand " << i << " in " << *MI;
   return NULL;
@@ -2328,7 +2328,7 @@ MachineInstr* X86InstrInfo::foldMemoryOperandImpl(MachineFunction &MF,
                                                   MachineInstr *MI,
                                            const SmallVectorImpl<unsigned> &Ops,
                                                   int FrameIndex) const {
-  // Check switch flag 
+  // Check switch flag
   if (NoFusing) return NULL;
 
   if (!MF.getFunction()->hasFnAttr(Attribute::OptimizeForSize))
@@ -2380,7 +2380,7 @@ MachineInstr* X86InstrInfo::foldMemoryOperandImpl(MachineFunction &MF,
                                                   MachineInstr *MI,
                                            const SmallVectorImpl<unsigned> &Ops,
                                                   MachineInstr *LoadMI) const {
-  // Check switch flag 
+  // Check switch flag
   if (NoFusing) return NULL;
 
   if (!MF.getFunction()->hasFnAttr(Attribute::OptimizeForSize))
@@ -2523,13 +2523,13 @@ MachineInstr* X86InstrInfo::foldMemoryOperandImpl(MachineFunction &MF,
 
 bool X86InstrInfo::canFoldMemoryOperand(const MachineInstr *MI,
                                   const SmallVectorImpl<unsigned> &Ops) const {
-  // Check switch flag 
+  // Check switch flag
   if (NoFusing) return 0;
 
   if (Ops.size() == 2 && Ops[0] == 0 && Ops[1] == 1) {
     switch (MI->getOpcode()) {
     default: return false;
-    case X86::TEST8rr: 
+    case X86::TEST8rr:
     case X86::TEST16rr:
     case X86::TEST32rr:
     case X86::TEST64rr:
@@ -2550,7 +2550,7 @@ bool X86InstrInfo::canFoldMemoryOperand(const MachineInstr *MI,
   // instruction is different than folding it other places.  It requires
   // replacing the *two* registers with the memory location.
   const DenseMap<unsigned, std::pair<unsigned,unsigned> > *OpcodeTablePtr = 0;
-  if (isTwoAddr && NumOps >= 2 && OpNum < 2) { 
+  if (isTwoAddr && NumOps >= 2 && OpNum < 2) {
     OpcodeTablePtr = &RegOp2MemOpTable2Addr;
   } else if (OpNum == 0) { // If operand 0
     switch (Opc) {
@@ -2566,7 +2566,7 @@ bool X86InstrInfo::canFoldMemoryOperand(const MachineInstr *MI,
   } else if (OpNum == 2) {
     OpcodeTablePtr = &RegOp2MemOpTable2;
   }
-  
+
   if (OpcodeTablePtr && OpcodeTablePtr->count(Opc))
     return true;
   return TargetInstrInfoImpl::canFoldMemoryOperand(MI, Ops);
@@ -2636,7 +2636,7 @@ bool X86InstrInfo::unfoldMemoryOperand(MachineFunction &MF, MachineInstr *MI,
   // Emit the data processing instruction.
   MachineInstr *DataMI = MF.CreateMachineInstr(TID, MI->getDebugLoc(), true);
   MachineInstrBuilder MIB(DataMI);
-  
+
   if (FoldedStore)
     MIB.addReg(Reg, RegState::Define);
   for (unsigned i = 0, e = BeforeOps.size(); i != e; ++i)
@@ -3156,11 +3156,11 @@ namespace {
         PC = RegInfo.createVirtualRegister(X86::GR32RegisterClass);
       else
         PC = GlobalBaseReg;
-  
+
       // Operand of MovePCtoStack is completely ignored by asm printer. It's
       // only used in JIT code emission as displacement to pc.
       BuildMI(FirstMBB, MBBI, DL, TII->get(X86::MOVPC32r), PC).addImm(0);
-  
+
       // If we're using vanilla 'GOT' PIC style, we should use relative addressing
       // not to pc, but to _GLOBAL_OFFSET_TABLE_ external.
       if (TM->getSubtarget<X86Subtarget>().isPICStyleGOT()) {
diff --git a/lib/Target/X86/X86InstrInfo.td b/lib/Target/X86/X86InstrInfo.td
index f9c0a7b..4748f13 100644
--- a/lib/Target/X86/X86InstrInfo.td
+++ b/lib/Target/X86/X86InstrInfo.td
@@ -36,7 +36,7 @@ def SDTBinaryArithWithFlags : SDTypeProfile<2, 2,
                                              SDTCisSameAs<0, 3>,
                                              SDTCisInt<0>, SDTCisVT<1, i32>]>;
 
-// SDTBinaryArithWithFlagsInOut - RES1, EFLAGS = op LHS, RHS, EFLAGS 
+// SDTBinaryArithWithFlagsInOut - RES1, EFLAGS = op LHS, RHS, EFLAGS
 def SDTBinaryArithWithFlagsInOut : SDTypeProfile<2, 3,
                                             [SDTCisSameAs<0, 2>,
                                              SDTCisSameAs<0, 3>,
@@ -1612,4 +1612,3 @@ def : InstAlias<"xchgb $mem, $val", (XCHG8rm  GR8 :$val, i8mem :$mem)>;
 def : InstAlias<"xchgw $mem, $val", (XCHG16rm GR16:$val, i16mem:$mem)>;
 def : InstAlias<"xchgl $mem, $val", (XCHG32rm GR32:$val, i32mem:$mem)>;
 def : InstAlias<"xchgq $mem, $val", (XCHG64rm GR64:$val, i64mem:$mem)>;
-
diff --git a/lib/Target/X86/X86MCInstLower.cpp b/lib/Target/X86/X86MCInstLower.cpp
index cbe6db2..4159af1 100644
--- a/lib/Target/X86/X86MCInstLower.cpp
+++ b/lib/Target/X86/X86MCInstLower.cpp
@@ -46,12 +46,12 @@ GetSymbolFromOperand(const MachineOperand &MO) const {
   assert((MO.isGlobal() || MO.isSymbol()) && "Isn't a symbol reference");
 
   SmallString<128> Name;
-  
+
   if (!MO.isGlobal()) {
     assert(MO.isSymbol());
     Name += MAI.getGlobalPrefix();
     Name += MO.getSymbolName();
-  } else {    
+  } else {
     const GlobalValue *GV = MO.getGlobal();
     bool isImplicitlyPrivate = false;
     if (MO.getTargetFlags() == X86II::MO_DARWIN_STUB ||
@@ -59,7 +59,7 @@ GetSymbolFromOperand(const MachineOperand &MO) const {
         MO.getTargetFlags() == X86II::MO_DARWIN_NONLAZY_PIC_BASE ||
         MO.getTargetFlags() == X86II::MO_DARWIN_HIDDEN_NONLAZY_PIC_BASE)
       isImplicitlyPrivate = true;
-    
+
     Mang->getNameWithPrefix(Name, GV, isImplicitlyPrivate);
   }
 
@@ -110,7 +110,7 @@ GetSymbolFromOperand(const MachineOperand &MO) const {
       getMachOMMI().getFnStubEntry(Sym);
     if (StubSym.getPointer())
       return Sym;
-    
+
     if (MO.isGlobal()) {
       StubSym =
         MachineModuleInfoImpl::
@@ -135,7 +135,7 @@ MCOperand X86MCInstLower::LowerSymbolOperand(const MachineOperand &MO,
   // lot of extra uniquing.
   const MCExpr *Expr = 0;
   MCSymbolRefExpr::VariantKind RefKind = MCSymbolRefExpr::VK_None;
-  
+
   switch (MO.getTargetFlags()) {
   default: llvm_unreachable("Unknown target flag on GV operand");
   case X86II::MO_NO_FLAG:    // No flag.
@@ -144,7 +144,7 @@ MCOperand X86MCInstLower::LowerSymbolOperand(const MachineOperand &MO,
   case X86II::MO_DLLIMPORT:
   case X86II::MO_DARWIN_STUB:
     break;
-      
+
   case X86II::MO_TLVP:      RefKind = MCSymbolRefExpr::VK_TLVP; break;
   case X86II::MO_TLVP_PIC_BASE:
     Expr = MCSymbolRefExpr::Create(Sym, MCSymbolRefExpr::VK_TLVP, Ctx);
@@ -168,7 +168,7 @@ MCOperand X86MCInstLower::LowerSymbolOperand(const MachineOperand &MO,
   case X86II::MO_DARWIN_HIDDEN_NONLAZY_PIC_BASE:
     Expr = MCSymbolRefExpr::Create(Sym, Ctx);
     // Subtract the pic base.
-    Expr = MCBinaryExpr::CreateSub(Expr, 
+    Expr = MCBinaryExpr::CreateSub(Expr,
                             MCSymbolRefExpr::Create(MF.getPICBaseSymbol(), Ctx),
                                    Ctx);
     if (MO.isJTI() && MAI.hasSetDirective()) {
@@ -182,10 +182,10 @@ MCOperand X86MCInstLower::LowerSymbolOperand(const MachineOperand &MO,
     }
     break;
   }
-  
+
   if (Expr == 0)
     Expr = MCSymbolRefExpr::Create(Sym, RefKind, Ctx);
-  
+
   if (!MO.isJTI() && MO.getOffset())
     Expr = MCBinaryExpr::CreateAdd(Expr,
                                    MCConstantExpr::Create(MO.getOffset(), Ctx),
@@ -206,10 +206,10 @@ static void lower_lea64_32mem(MCInst *MI, unsigned OpNo) {
   // Convert registers in the addr mode according to subreg64.
   for (unsigned i = 0; i != 4; ++i) {
     if (!MI->getOperand(OpNo+i).isReg()) continue;
-    
+
     unsigned Reg = MI->getOperand(OpNo+i).getReg();
     if (Reg == 0) continue;
-    
+
     MI->getOperand(OpNo+i).setReg(getX86SubSuperRegister(Reg, MVT::i64));
   }
 }
@@ -274,7 +274,7 @@ static void SimplifyShortMoveForm(X86AsmPrinter &Printer, MCInst &Inst,
     return;
 
   // Check whether this is an absolute address.
-  // FIXME: We know TLVP symbol refs aren't, but there should be a better way 
+  // FIXME: We know TLVP symbol refs aren't, but there should be a better way
   // to do this here.
   bool Absolute = true;
   if (Inst.getOperand(AddrOp).isExpr()) {
@@ -283,7 +283,7 @@ static void SimplifyShortMoveForm(X86AsmPrinter &Printer, MCInst &Inst,
       if (SRE->getKind() == MCSymbolRefExpr::VK_TLVP)
         Absolute = false;
   }
-  
+
   if (Absolute &&
       (Inst.getOperand(AddrBase + 0).getReg() != 0 ||
        Inst.getOperand(AddrBase + 2).getReg() != 0 ||
@@ -300,10 +300,10 @@ static void SimplifyShortMoveForm(X86AsmPrinter &Printer, MCInst &Inst,
 
 void X86MCInstLower::Lower(const MachineInstr *MI, MCInst &OutMI) const {
   OutMI.setOpcode(MI->getOpcode());
-  
+
   for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
     const MachineOperand &MO = MI->getOperand(i);
-    
+
     MCOperand MCOp;
     switch (MO.getType()) {
     default:
@@ -336,10 +336,10 @@ void X86MCInstLower::Lower(const MachineInstr *MI, MCInst &OutMI) const {
                      AsmPrinter.GetBlockAddressSymbol(MO.getBlockAddress()));
       break;
     }
-    
+
     OutMI.addOperand(MCOp);
   }
-  
+
   // Handle a few special cases to eliminate operand modifiers.
 ReSimplify:
   switch (OutMI.getOpcode()) {
@@ -429,7 +429,7 @@ ReSimplify:
     case X86::TAILJMPd:
     case X86::TAILJMPd64: Opcode = X86::JMP_1; break;
     }
-    
+
     MCOperand Saved = OutMI.getOperand(0);
     OutMI = MCInst();
     OutMI.setOpcode(Opcode);
@@ -449,7 +449,7 @@ ReSimplify:
   case X86::ADD16ri8_DB:  OutMI.setOpcode(X86::OR16ri8); goto ReSimplify;
   case X86::ADD32ri8_DB:  OutMI.setOpcode(X86::OR32ri8); goto ReSimplify;
   case X86::ADD64ri8_DB:  OutMI.setOpcode(X86::OR64ri8); goto ReSimplify;
-      
+
   // The assembler backend wants to see branches in their small form and relax
   // them to their large form.  The JIT can only handle the large form because
   // it does not do relaxation.  For now, translate the large form to the
@@ -605,7 +605,7 @@ void X86AsmPrinter::EmitInstruction(const MachineInstr *MI) {
     if (OutStreamer.hasRawTextSupport())
       OutStreamer.EmitRawText(StringRef("\t#MEMBARRIER"));
     return;
-        
+
 
   case X86::EH_RETURN:
   case X86::EH_RETURN64: {
@@ -633,7 +633,7 @@ void X86AsmPrinter::EmitInstruction(const MachineInstr *MI) {
     //     call "L1$pb"
     // "L1$pb":
     //     popl %esi
-    
+
     // Emit the call.
     MCSymbol *PICBase = MF->getPICBaseSymbol();
     TmpInst.setOpcode(X86::CALLpcrel32);
@@ -642,43 +642,43 @@ void X86AsmPrinter::EmitInstruction(const MachineInstr *MI) {
     TmpInst.addOperand(MCOperand::CreateExpr(MCSymbolRefExpr::Create(PICBase,
                                                                  OutContext)));
     OutStreamer.EmitInstruction(TmpInst);
-    
+
     // Emit the label.
     OutStreamer.EmitLabel(PICBase);
-    
+
     // popl $reg
     TmpInst.setOpcode(X86::POP32r);
     TmpInst.getOperand(0) = MCOperand::CreateReg(MI->getOperand(0).getReg());
     OutStreamer.EmitInstruction(TmpInst);
     return;
   }
-      
+
   case X86::ADD32ri: {
     // Lower the MO_GOT_ABSOLUTE_ADDRESS form of ADD32ri.
     if (MI->getOperand(2).getTargetFlags() != X86II::MO_GOT_ABSOLUTE_ADDRESS)
       break;
-    
+
     // Okay, we have something like:
     //  EAX = ADD32ri EAX, MO_GOT_ABSOLUTE_ADDRESS(@MYGLOBAL)
-    
+
     // For this, we want to print something like:
     //   MYGLOBAL + (. - PICBASE)
     // However, we can't generate a ".", so just emit a new label here and refer
     // to it.
     MCSymbol *DotSym = OutContext.CreateTempSymbol();
     OutStreamer.EmitLabel(DotSym);
-    
+
     // Now that we have emitted the label, lower the complex operand expression.
     MCSymbol *OpSym = MCInstLowering.GetSymbolFromOperand(MI->getOperand(2));
-    
+
     const MCExpr *DotExpr = MCSymbolRefExpr::Create(DotSym, OutContext);
     const MCExpr *PICBase =
       MCSymbolRefExpr::Create(MF->getPICBaseSymbol(), OutContext);
     DotExpr = MCBinaryExpr::CreateSub(DotExpr, PICBase, OutContext);
-    
-    DotExpr = MCBinaryExpr::CreateAdd(MCSymbolRefExpr::Create(OpSym,OutContext), 
+
+    DotExpr = MCBinaryExpr::CreateAdd(MCSymbolRefExpr::Create(OpSym,OutContext),
                                       DotExpr, OutContext);
-    
+
     MCInst TmpInst;
     TmpInst.setOpcode(X86::ADD32ri);
     TmpInst.addOperand(MCOperand::CreateReg(MI->getOperand(0).getReg()));
@@ -688,9 +688,8 @@ void X86AsmPrinter::EmitInstruction(const MachineInstr *MI) {
     return;
   }
   }
-  
+
   MCInst TmpInst;
   MCInstLowering.Lower(MI, TmpInst);
   OutStreamer.EmitInstruction(TmpInst);
 }
-
diff --git a/lib/Target/X86/X86RegisterInfo.cpp b/lib/Target/X86/X86RegisterInfo.cpp
index 1faf6d9..06c671b 100644
--- a/lib/Target/X86/X86RegisterInfo.cpp
+++ b/lib/Target/X86/X86RegisterInfo.cpp
@@ -445,11 +445,11 @@ bool X86RegisterInfo::needsStackRealignment(const MachineFunction &MF) const {
   if (0 && requiresRealignment && MFI->hasVarSizedObjects())
     report_fatal_error(
       "Stack realignment in presense of dynamic allocas is not supported");
-    
+
   // If we've requested that we force align the stack do so now.
   if (ForceStackAlign)
     return canRealignStack(MF);
-    
+
   return requiresRealignment && canRealignStack(MF);
 }
 
@@ -524,7 +524,7 @@ eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB,
 
       // Factor out the amount the callee already popped.
       Amount -= CalleeAmt;
-  
+
       if (Amount) {
         unsigned Opc = getADDriOpcode(Is64Bit, Amount);
         New = BuildMI(MF, DL, TII.get(Opc), StackPtr)
diff --git a/lib/Target/X86/X86RegisterInfo.td b/lib/Target/X86/X86RegisterInfo.td
index dc4c042..45bb989 100644
--- a/lib/Target/X86/X86RegisterInfo.td
+++ b/lib/Target/X86/X86RegisterInfo.td
@@ -1,10 +1,10 @@
 //===- X86RegisterInfo.td - Describe the X86 Register File --*- tablegen -*-==//
-// 
+//
 //                     The LLVM Compiler Infrastructure
 //
 // This file is distributed under the University of Illinois Open Source
 // License. See LICENSE.TXT for details.
-// 
+//
 //===----------------------------------------------------------------------===//
 //
 // This file describes the X86 Register file, defining the registers themselves,
@@ -34,8 +34,8 @@ let Namespace = "X86" in {
   // because the register file generator is smart enough to figure out that
   // AL aliases AX if we tell it that AX aliased AL (for example).
 
-  // Dwarf numbering is different for 32-bit and 64-bit, and there are 
-  // variations by target as well. Currently the first entry is for X86-64, 
+  // Dwarf numbering is different for 32-bit and 64-bit, and there are
+  // variations by target as well. Currently the first entry is for X86-64,
   // second - for EH on X86-32/Darwin and third is 'generic' one (X86-32/Linux
   // and debug information on X86-32/Darwin)
 
@@ -81,7 +81,7 @@ let Namespace = "X86" in {
   def SP : RegisterWithSubRegs<"sp", [SPL]>, DwarfRegNum<[7, 5, 4]>;
   }
   def IP : Register<"ip">, DwarfRegNum<[16]>;
-  
+
   // X86-64 only
   let SubRegIndices = [sub_8bit] in {
   def R8W  : RegisterWithSubRegs<"r8w", [R8B]>, DwarfRegNum<[8, -2, -2]>;
@@ -103,8 +103,8 @@ let Namespace = "X86" in {
   def EDI : RegisterWithSubRegs<"edi", [DI]>, DwarfRegNum<[5, 7, 7]>;
   def EBP : RegisterWithSubRegs<"ebp", [BP]>, DwarfRegNum<[6, 4, 5]>;
   def ESP : RegisterWithSubRegs<"esp", [SP]>, DwarfRegNum<[7, 5, 4]>;
-  def EIP : RegisterWithSubRegs<"eip", [IP]>, DwarfRegNum<[16, 8, 8]>;  
-  
+  def EIP : RegisterWithSubRegs<"eip", [IP]>, DwarfRegNum<[16, 8, 8]>;
+
   // X86-64 only
   def R8D  : RegisterWithSubRegs<"r8d", [R8W]>, DwarfRegNum<[8, -2, -2]>;
   def R9D  : RegisterWithSubRegs<"r9d", [R9W]>, DwarfRegNum<[9, -2, -2]>;
@@ -208,7 +208,7 @@ let Namespace = "X86" in {
   def ST4 : Register<"st(4)">, DwarfRegNum<[37, 16, 15]>;
   def ST5 : Register<"st(5)">, DwarfRegNum<[38, 17, 16]>;
   def ST6 : Register<"st(6)">, DwarfRegNum<[39, 18, 17]>;
-  def ST7 : Register<"st(7)">, DwarfRegNum<[40, 19, 18]>; 
+  def ST7 : Register<"st(7)">, DwarfRegNum<[40, 19, 18]>;
 
   // Status flags register
   def EFLAGS : Register<"flags">;
@@ -220,7 +220,7 @@ let Namespace = "X86" in {
   def ES : Register<"es">;
   def FS : Register<"fs">;
   def GS : Register<"gs">;
-  
+
   // Debug registers
   def DR0 : Register<"dr0">;
   def DR1 : Register<"dr1">;
@@ -230,7 +230,7 @@ let Namespace = "X86" in {
   def DR5 : Register<"dr5">;
   def DR6 : Register<"dr6">;
   def DR7 : Register<"dr7">;
-  
+
   // Control registers
   def CR0 : Register<"cr0">;
   def CR1 : Register<"cr1">;
@@ -261,10 +261,10 @@ let Namespace = "X86" in {
 // implicitly defined to be the register allocation order.
 //
 
-// List call-clobbered registers before callee-save registers. RBX, RBP, (and 
+// List call-clobbered registers before callee-save registers. RBX, RBP, (and
 // R12, R13, R14, and R15 for X86-64) are callee-save registers.
 // In 64-mode, there are 12 additional i8 registers, SIL, DIL, BPL, SPL, and
-// R8B, ... R15B. 
+// R8B, ... R15B.
 // Allocate R12 and R13 last, as these require an extra byte when
 // encoded in x86_64 instructions.
 // FIXME: Allow AH, CH, DH, BH to be used as general-purpose registers in
-- 
1.7.1.GIT

-------------- next part --------------
From ed62d827ef015cb8abdb58ae4606ae6b03bff481 Mon Sep 17 00:00:00 2001
From: NAKAMURA Takumi <geek4civic at gmail.com>
Date: Fri, 7 Jan 2011 10:58:30 +0900
Subject: [PATCH 2/9] test/CodeGen/X86: Fix whitespace.

---
 test/CodeGen/X86/2009-06-03-Win64DisableRedZone.ll |    5 ++---
 test/CodeGen/X86/2009-06-03-Win64SpillXMM.ll       |    5 ++---
 test/CodeGen/X86/tailcallstack64.ll                |    1 -
 test/CodeGen/X86/win64_vararg.ll                   |   10 +++++-----
 4 files changed, 9 insertions(+), 12 deletions(-)

diff --git a/test/CodeGen/X86/2009-06-03-Win64DisableRedZone.ll b/test/CodeGen/X86/2009-06-03-Win64DisableRedZone.ll
index c598228..c5d3ac1 100644
--- a/test/CodeGen/X86/2009-06-03-Win64DisableRedZone.ll
+++ b/test/CodeGen/X86/2009-06-03-Win64DisableRedZone.ll
@@ -3,7 +3,6 @@ target triple = "x86_64-pc-mingw64"
 
 define x86_fp80 @a(i64 %x) nounwind readnone {
 entry:
-	%conv = sitofp i64 %x to x86_fp80		; <x86_fp80> [#uses=1]
-	ret x86_fp80 %conv
+        %conv = sitofp i64 %x to x86_fp80               ; <x86_fp80> [#uses=1]
+        ret x86_fp80 %conv
 }
-
diff --git a/test/CodeGen/X86/2009-06-03-Win64SpillXMM.ll b/test/CodeGen/X86/2009-06-03-Win64SpillXMM.ll
index 810a6f4..b722589 100644
--- a/test/CodeGen/X86/2009-06-03-Win64SpillXMM.ll
+++ b/test/CodeGen/X86/2009-06-03-Win64SpillXMM.ll
@@ -6,7 +6,6 @@ target triple = "x86_64-pc-mingw64"
 
 define i32 @a() nounwind {
 entry:
-	tail call void asm sideeffect "", "~{xmm7},~{xmm8},~{dirflag},~{fpsr},~{flags}"() nounwind
-	ret i32 undef
+        tail call void asm sideeffect "", "~{xmm7},~{xmm8},~{dirflag},~{fpsr},~{flags}"() nounwind
+        ret i32 undef
 }
-
diff --git a/test/CodeGen/X86/tailcallstack64.ll b/test/CodeGen/X86/tailcallstack64.ll
index 107bdf9..52b074d 100644
--- a/test/CodeGen/X86/tailcallstack64.ll
+++ b/test/CodeGen/X86/tailcallstack64.ll
@@ -22,4 +22,3 @@ entry:
         %retval = tail call fastcc i32 @tailcallee(i32 %p1, i32 %p2, i32 %p3, i32 %p4, i32 %p5, i32 %p6, i32 %in2,i32 %tmp)
         ret i32 %retval
 }
-
diff --git a/test/CodeGen/X86/win64_vararg.ll b/test/CodeGen/X86/win64_vararg.ll
index 072f36a..71b2fa1 100644
--- a/test/CodeGen/X86/win64_vararg.ll
+++ b/test/CodeGen/X86/win64_vararg.ll
@@ -5,11 +5,11 @@
 ; calculated.
 define void @average_va(i32 %count, ...) nounwind {
 entry:
-; CHECK: subq	$40, %rsp
-; CHECK: movq	%r9, 72(%rsp)
-; CHECK: movq	%r8, 64(%rsp)
-; CHECK: movq	%rdx, 56(%rsp)
-; CHECK: leaq	56(%rsp), %rax
+; CHECK: subq   $40, %rsp
+; CHECK: movq   %r9, 72(%rsp)
+; CHECK: movq   %r8, 64(%rsp)
+; CHECK: movq   %rdx, 56(%rsp)
+; CHECK: leaq   56(%rsp), %rax
 
   %ap = alloca i8*, align 8                       ; <i8**> [#uses=1]
   %ap1 = bitcast i8** %ap to i8*                  ; <i8*> [#uses=1]
-- 
1.7.1.GIT

-------------- next part --------------
From 1bca7c8f8b750651827b9e045c93ff66afed4bdf Mon Sep 17 00:00:00 2001
From: NAKAMURA Takumi <geek4civic at gmail.com>
Date: Mon, 10 Jan 2011 13:19:04 +0900
Subject: [PATCH 3/9] lib/Target/X86/X86ISelLowering.cpp: Introduce a new variable "IsWin64". No functional changes.

---
 lib/Target/X86/X86ISelLowering.cpp |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/lib/Target/X86/X86ISelLowering.cpp b/lib/Target/X86/X86ISelLowering.cpp
index 2268823..3f1bed1 100644
--- a/lib/Target/X86/X86ISelLowering.cpp
+++ b/lib/Target/X86/X86ISelLowering.cpp
@@ -1863,6 +1863,7 @@ X86TargetLowering::LowerCall(SDValue Chain, SDValue Callee,
                              SmallVectorImpl<SDValue> &InVals) const {
   MachineFunction &MF = DAG.getMachineFunction();
   bool Is64Bit        = Subtarget->is64Bit();
+  bool IsWin64        = Subtarget->isTargetWin64();
   bool IsStructRet    = CallIsStructReturn(Outs);
   bool IsSibcall      = false;
 
@@ -1970,7 +1971,7 @@ X86TargetLowering::LowerCall(SDValue Chain, SDValue Callee,
 
     if (VA.isRegLoc()) {
       RegsToPass.push_back(std::make_pair(VA.getLocReg(), Arg));
-      if (isVarArg && Subtarget->isTargetWin64()) {
+      if (isVarArg && IsWin64) {
         // Win64 ABI requires argument XMM reg to be copied to the corresponding
         // shadow reg if callee is a varargs function.
         unsigned ShadowReg = 0;
@@ -2036,7 +2037,7 @@ X86TargetLowering::LowerCall(SDValue Chain, SDValue Callee,
     }
   }
 
-  if (Is64Bit && isVarArg && !Subtarget->isTargetWin64()) {
+  if (Is64Bit && isVarArg && !IsWin64) {
     // From AMD64 ABI document:
     // For calls that may call functions that use varargs or stdargs
     // (prototype-less calls or calls to functions containing ellipsis (...) in
@@ -2211,7 +2212,7 @@ X86TargetLowering::LowerCall(SDValue Chain, SDValue Callee,
     Ops.push_back(DAG.getRegister(X86::EBX, getPointerTy()));
 
   // Add an implicit use of AL for non-Windows x86 64-bit vararg functions.
-  if (Is64Bit && isVarArg && !Subtarget->isTargetWin64())
+  if (Is64Bit && isVarArg && !IsWin64)
     Ops.push_back(DAG.getRegister(X86::AL, MVT::i8));
 
   if (InFlag.getNode())
-- 
1.7.1.GIT

-------------- next part --------------
From 674c32b4785d1bbc146b032735bdc021475cfef0 Mon Sep 17 00:00:00 2001
From: NAKAMURA Takumi <geek4civic at gmail.com>
Date: Mon, 13 Dec 2010 17:59:20 +0900
Subject: [PATCH 4/9] Target/X86: Tweak allocation of the shadow area (aka home) on Win64. It should be enough for the caller to allocate it.

---
 lib/Target/X86/X86FrameLowering.cpp                |    5 ----
 lib/Target/X86/X86FrameLowering.h                  |    3 +-
 lib/Target/X86/X86ISelLowering.cpp                 |   21 ++++++++++++++++++-
 test/CodeGen/X86/2009-06-03-Win64DisableRedZone.ll |    4 +-
 test/CodeGen/X86/2009-06-03-Win64SpillXMM.ll       |    9 +++----
 test/CodeGen/X86/win64_params.ll                   |    4 +-
 test/CodeGen/X86/win64_vararg.ll                   |   10 ++++----
 7 files changed, 33 insertions(+), 23 deletions(-)

diff --git a/lib/Target/X86/X86FrameLowering.cpp b/lib/Target/X86/X86FrameLowering.cpp
index cbf1b59..71fd8d1 100644
--- a/lib/Target/X86/X86FrameLowering.cpp
+++ b/lib/Target/X86/X86FrameLowering.cpp
@@ -400,11 +400,6 @@ void X86FrameLowering::emitPrologue(MachineFunction &MF) const {
     if (HasFP) MinSize += SlotSize;
     StackSize = std::max(MinSize, StackSize > 128 ? StackSize - 128 : 0);
     MFI->setStackSize(StackSize);
-  } else if (IsWin64) {
-    // We need to always allocate 32 bytes as register spill area.
-    // FIXME: We might reuse these 32 bytes for leaf functions.
-    StackSize += 32;
-    MFI->setStackSize(StackSize);
   }
 
   // Insert stack pointer adjustment for later moving of return addr.  Only
diff --git a/lib/Target/X86/X86FrameLowering.h b/lib/Target/X86/X86FrameLowering.h
index c067e64..d71108c 100644
--- a/lib/Target/X86/X86FrameLowering.h
+++ b/lib/Target/X86/X86FrameLowering.h
@@ -28,8 +28,7 @@ public:
   explicit X86FrameLowering(const X86TargetMachine &tm, const X86Subtarget &sti)
     : TargetFrameLowering(StackGrowsDown,
                           sti.getStackAlignment(),
-                          (sti.isTargetWin64() ? -40 :
-                           (sti.is64Bit() ? -8 : -4))),
+                          (sti.is64Bit() ? -8 : -4)),
       TM(tm), STI(sti) {
   }
 
diff --git a/lib/Target/X86/X86ISelLowering.cpp b/lib/Target/X86/X86ISelLowering.cpp
index 3f1bed1..d3213de 100644
--- a/lib/Target/X86/X86ISelLowering.cpp
+++ b/lib/Target/X86/X86ISelLowering.cpp
@@ -1569,6 +1569,12 @@ X86TargetLowering::LowerFormalArguments(SDValue Chain,
   SmallVector<CCValAssign, 16> ArgLocs;
   CCState CCInfo(CallConv, isVarArg, getTargetMachine(),
                  ArgLocs, *DAG.getContext());
+
+  // Allocate shadow area for Win64
+  if (IsWin64) {
+    CCInfo.AllocateStack(32, 8);
+  }
+
   CCInfo.AnalyzeFormalArguments(Ins, CC_X86);
 
   unsigned LastVal = ~0U;
@@ -1803,8 +1809,7 @@ X86TargetLowering::LowerMemOpCallTo(SDValue Chain,
                                     DebugLoc dl, SelectionDAG &DAG,
                                     const CCValAssign &VA,
                                     ISD::ArgFlagsTy Flags) const {
-  const unsigned FirstStackArgOffset = (Subtarget->isTargetWin64() ? 32 : 0);
-  unsigned LocMemOffset = FirstStackArgOffset + VA.getLocMemOffset();
+  unsigned LocMemOffset = VA.getLocMemOffset();
   SDValue PtrOff = DAG.getIntPtrConstant(LocMemOffset);
   PtrOff = DAG.getNode(ISD::ADD, dl, getPointerTy(), StackPtr, PtrOff);
   if (Flags.isByVal())
@@ -1889,6 +1894,12 @@ X86TargetLowering::LowerCall(SDValue Chain, SDValue Callee,
   SmallVector<CCValAssign, 16> ArgLocs;
   CCState CCInfo(CallConv, isVarArg, getTargetMachine(),
                  ArgLocs, *DAG.getContext());
+
+  // Allocate shadow area for Win64
+  if (IsWin64) {
+    CCInfo.AllocateStack(32, 8);
+  }
+
   CCInfo.AnalyzeCallOperands(Outs, CC_X86);
 
   // Get a count of how many bytes are to be pushed on the stack.
@@ -2472,6 +2483,12 @@ X86TargetLowering::IsEligibleForTailCallOptimization(SDValue Callee,
     SmallVector<CCValAssign, 16> ArgLocs;
     CCState CCInfo(CalleeCC, isVarArg, getTargetMachine(),
                    ArgLocs, *DAG.getContext());
+
+    // Allocate shadow area for Win64
+    if (Subtarget->isTargetWin64()) {
+      CCInfo.AllocateStack(32, 8);
+    }
+
     CCInfo.AnalyzeCallOperands(Outs, CC_X86);
     if (CCInfo.getNextStackOffset()) {
       MachineFunction &MF = DAG.getMachineFunction();
diff --git a/test/CodeGen/X86/2009-06-03-Win64DisableRedZone.ll b/test/CodeGen/X86/2009-06-03-Win64DisableRedZone.ll
index c5d3ac1..9d06a9e 100644
--- a/test/CodeGen/X86/2009-06-03-Win64DisableRedZone.ll
+++ b/test/CodeGen/X86/2009-06-03-Win64DisableRedZone.ll
@@ -1,5 +1,5 @@
-; RUN: llc < %s | grep "subq.*\\\$40, \\\%rsp"
-target triple = "x86_64-pc-mingw64"
+; RUN: llc -mtriple=x86_64-pc-mingw64 < %s | FileCheck %s
+; CHECK-NOT: -{{[1-9][0-9]*}}(%rsp)
 
 define x86_fp80 @a(i64 %x) nounwind readnone {
 entry:
diff --git a/test/CodeGen/X86/2009-06-03-Win64SpillXMM.ll b/test/CodeGen/X86/2009-06-03-Win64SpillXMM.ll
index b722589..6e8d9a9 100644
--- a/test/CodeGen/X86/2009-06-03-Win64SpillXMM.ll
+++ b/test/CodeGen/X86/2009-06-03-Win64SpillXMM.ll
@@ -1,8 +1,7 @@
-; RUN: llc < %s -o %t1
-; RUN: grep "subq.*\\\$72, \\\%rsp" %t1
-; RUN: grep "movaps	\\\%xmm8, 32\\\(\\\%rsp\\\)" %t1
-; RUN: grep "movaps	\\\%xmm7, 48\\\(\\\%rsp\\\)" %t1
-target triple = "x86_64-pc-mingw64"
+; RUN: llc -mtriple=x86_64-pc-mingw64 < %s | FileCheck %s
+; CHECK: subq    $40, %rsp
+; CHECK: movaps  %xmm8, (%rsp)
+; CHECK: movaps  %xmm7, 16(%rsp)
 
 define i32 @a() nounwind {
 entry:
diff --git a/test/CodeGen/X86/win64_params.ll b/test/CodeGen/X86/win64_params.ll
index 0b67368..f9d4bf9 100644
--- a/test/CodeGen/X86/win64_params.ll
+++ b/test/CodeGen/X86/win64_params.ll
@@ -4,8 +4,8 @@
 ; on the stack.
 define i32 @f6(i32 %p1, i32 %p2, i32 %p3, i32 %p4, i32 %p5, i32 %p6) nounwind readnone optsize {
 entry:
-; CHECK: movl    80(%rsp), %eax
-; CHECK: addl    72(%rsp), %eax
+; CHECK: movl    48(%rsp), %eax
+; CHECK: addl    40(%rsp), %eax
   %add = add nsw i32 %p6, %p5
   ret i32 %add
 }
diff --git a/test/CodeGen/X86/win64_vararg.ll b/test/CodeGen/X86/win64_vararg.ll
index 71b2fa1..a451318 100644
--- a/test/CodeGen/X86/win64_vararg.ll
+++ b/test/CodeGen/X86/win64_vararg.ll
@@ -5,11 +5,11 @@
 ; calculated.
 define void @average_va(i32 %count, ...) nounwind {
 entry:
-; CHECK: subq   $40, %rsp
-; CHECK: movq   %r9, 72(%rsp)
-; CHECK: movq   %r8, 64(%rsp)
-; CHECK: movq   %rdx, 56(%rsp)
-; CHECK: leaq   56(%rsp), %rax
+; CHECK: pushq
+; CHECK: movq   %r9, 40(%rsp)
+; CHECK: movq   %r8, 32(%rsp)
+; CHECK: movq   %rdx, 24(%rsp)
+; CHECK: leaq   24(%rsp), %rax
 
   %ap = alloca i8*, align 8                       ; <i8**> [#uses=1]
   %ap1 = bitcast i8** %ap to i8*                  ; <i8*> [#uses=1]
-- 
1.7.1.GIT
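
Note on the shadow area in the patch above: the size of the outgoing-argument area that a Win64 caller reserves can be sketched roughly as follows. This is only an illustration, not code from the patch. `win64OutgoingAreaSize` and `win64OutgoingArgOffset` are hypothetical helpers, and the sketch assumes one 8-byte slot per argument, a minimum of four slots (the 32 bytes passed to `CCInfo.AllocateStack(32, 8)`), and rounding up to a 16-byte boundary when the area is considered in isolation.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; they are not part of LLVM.

// Bytes a Win64 caller reserves for a call with NumArgs integer arguments,
// assuming one 8-byte slot per argument, at least four slots (the shadow
// area), and rounding up to a 16-byte boundary.
std::uint64_t win64OutgoingAreaSize(unsigned NumArgs) {
  const std::uint64_t SlotSize = 8;
  const std::uint64_t ShadowSize = 4 * SlotSize; // 32 bytes
  std::uint64_t Size = SlotSize * NumArgs;
  if (Size < ShadowSize)
    Size = ShadowSize;
  const std::uint64_t Rounded = (Size + 15) & ~std::uint64_t(15);
  assert(Rounded >= ShadowSize && Rounded % 16 == 0);
  return Rounded;
}

// Offset from %rsp, just before the call instruction, of the slot for the
// argument with 0-based index ArgIndex. Slots 0..3 form the shadow area and
// are left for the callee to fill.
std::uint64_t win64OutgoingArgOffset(unsigned ArgIndex) {
  return 8 * static_cast<std::uint64_t>(ArgIndex);
}
```

Under these assumptions a call with five arguments needs a 48-byte area, with the fifth argument stored at offset 32.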

-------------- next part --------------
From 6d3c586594c36faa5244c982c7068dac18931a63 Mon Sep 17 00:00:00 2001
From: NAKAMURA Takumi <geek4civic at gmail.com>
Date: Mon, 13 Dec 2010 18:11:31 +0900
Subject: [PATCH 5/9] Target/X86: Tweak alloca and add a testcase for mingw64 and msvcrt on Win64. [PR8778]

---
 lib/Target/X86/X86FrameLowering.cpp        |   19 +++++++--
 lib/Target/X86/X86ISelLowering.cpp         |   57 +++++++++++++++++++--------
 lib/Target/X86/X86InstrControl.td          |   10 +++++
 test/CodeGen/X86/win64_alloca_dynalloca.ll |   56 +++++++++++++++++++++++++++
 test/CodeGen/X86/win_chkstk.ll             |    2 +-
 5 files changed, 121 insertions(+), 23 deletions(-)
 create mode 100644 test/CodeGen/X86/win64_alloca_dynalloca.ll

diff --git a/lib/Target/X86/X86FrameLowering.cpp b/lib/Target/X86/X86FrameLowering.cpp
index 71fd8d1..ce88169 100644
--- a/lib/Target/X86/X86FrameLowering.cpp
+++ b/lib/Target/X86/X86FrameLowering.cpp
@@ -555,14 +555,23 @@ void X86FrameLowering::emitPrologue(MachineFunction &MF) const {
   // responsible for adjusting the stack pointer.  Touching the stack at 4K
   // increments is necessary to ensure that the guard pages used by the OS
   // virtual memory manager are allocated in correct sequence.
-  if (NumBytes >= 4096 && (STI.isTargetCygMing() || STI.isTargetWin32())) {
+  if (NumBytes >= 4096 && Is64Bit && STI.isTargetCygMing()) {
+    // Sanity check that EAX is not livein for this function.  It
+    // should not be, so assert on it.
+    assert(!isEAXLiveIn(MF) && "EAX is livein in the Cygming64 case!");
+
+    BuildMI(MBB, MBBI, DL, TII.get(X86::MOV64ri), X86::RAX)
+      .addImm(NumBytes);
+    BuildMI(MBB, MBBI, DL, TII.get(X86::W64ALLOCA))
+      .addExternalSymbol("___chkstk")
+      .addReg(StackPtr, RegState::Define | RegState::Implicit);
+    // Cygming's ___chkstk adjusts %rsp.
+  } else if (NumBytes >= 4096 && (STI.isTargetCygMing() || STI.isTargetWin32())) {
     // Check whether EAX is livein for this function.
     bool isEAXAlive = isEAXLiveIn(MF);
 
     const char *StackProbeSymbol =
       STI.isTargetWindows() ? "_chkstk" : "_alloca";
-    if (Is64Bit && STI.isTargetCygMing())
-      StackProbeSymbol = "__chkstk";
     unsigned CallOp = Is64Bit ? X86::CALL64pcrel32 : X86::CALLpcrel32;
     if (!isEAXAlive) {
       BuildMI(MBB, MBBI, DL, TII.get(X86::MOV32ri), X86::EAX)
@@ -598,9 +607,9 @@ void X86FrameLowering::emitPrologue(MachineFunction &MF) const {
 
     // Handle the 64-bit Windows ABI case where we need to call __chkstk.
     // Function prologue is responsible for adjusting the stack pointer.
-    BuildMI(MBB, MBBI, DL, TII.get(X86::MOV32ri), X86::EAX)
+    BuildMI(MBB, MBBI, DL, TII.get(X86::MOV64ri), X86::RAX)
       .addImm(NumBytes);
-    BuildMI(MBB, MBBI, DL, TII.get(X86::WINCALL64pcrel32))
+    BuildMI(MBB, MBBI, DL, TII.get(X86::W64ALLOCA))
       .addExternalSymbol("__chkstk")
       .addReg(StackPtr, RegState::Define | RegState::Implicit);
     emitSPUpdate(MBB, MBBI, StackPtr, -(int64_t)NumBytes, Is64Bit,
diff --git a/lib/Target/X86/X86ISelLowering.cpp b/lib/Target/X86/X86ISelLowering.cpp
index d3213de..a012ae2 100644
--- a/lib/Target/X86/X86ISelLowering.cpp
+++ b/lib/Target/X86/X86ISelLowering.cpp
@@ -418,12 +418,9 @@ X86TargetLowering::X86TargetLowering(X86TargetMachine &TM)
 
   setOperationAction(ISD::STACKSAVE,          MVT::Other, Expand);
   setOperationAction(ISD::STACKRESTORE,       MVT::Other, Expand);
-  if (Subtarget->is64Bit())
-    setOperationAction(ISD::DYNAMIC_STACKALLOC, MVT::i64, Expand);
-  if (Subtarget->isTargetCygMing() || Subtarget->isTargetWindows())
-    setOperationAction(ISD::DYNAMIC_STACKALLOC, MVT::i32, Custom);
-  else
-    setOperationAction(ISD::DYNAMIC_STACKALLOC, MVT::i32, Expand);
+  setOperationAction(ISD::DYNAMIC_STACKALLOC,
+                     (Subtarget->is64Bit() ? MVT::i64 : MVT::i32),
+                     (Subtarget->isTargetCOFF() ? Custom : Expand));
 
   if (!UseSoftFloat && X86ScalarSSEf64) {
     // f32 and f64 use SSE.
@@ -7553,8 +7550,9 @@ X86TargetLowering::LowerDYNAMIC_STACKALLOC(SDValue Op,
   SDValue Flag;
 
   EVT SPTy = Subtarget->is64Bit() ? MVT::i64 : MVT::i32;
+  unsigned Reg = (Subtarget->is64Bit() ? X86::RAX : X86::EAX);
 
-  Chain = DAG.getCopyToReg(Chain, dl, X86::EAX, Size, Flag);
+  Chain = DAG.getCopyToReg(Chain, dl, Reg, Size, Flag);
   Flag = Chain.getValue(1);
 
   SDVTList NodeTys = DAG.getVTList(MVT::Other, MVT::Glue);
@@ -10022,19 +10020,44 @@ X86TargetLowering::EmitLoweredWinAlloca(MachineInstr *MI,
 
   // The lowering is pretty easy: we're just emitting the call to _alloca.  The
   // non-trivial part is impdef of ESP.
-  // FIXME: The code should be tweaked as soon as we'll try to do codegen for
-  // mingw-w64.
 
-  const char *StackProbeSymbol =
+  if (Subtarget->isTargetWin64()) {
+    if (Subtarget->isTargetCygMing()) {
+      // ___chkstk(Mingw64):
+      // Clobbers R10, R11, RAX and EFLAGS.
+      // Updates RSP.
+      BuildMI(*BB, MI, DL, TII->get(X86::W64ALLOCA))
+        .addExternalSymbol("___chkstk")
+        .addReg(X86::RAX, RegState::Implicit)
+        .addReg(X86::RSP, RegState::Implicit)
+        .addReg(X86::RAX, RegState::Define | RegState::Implicit)
+        .addReg(X86::RSP, RegState::Define | RegState::Implicit)
+        .addReg(X86::EFLAGS, RegState::Define | RegState::Implicit);
+    } else {
+      // __chkstk(MSVCRT): does not update stack pointer.
+      // Clobbers R10, R11 and EFLAGS.
+      // FIXME: RAX(allocated size) might be reused and not killed.
+      BuildMI(*BB, MI, DL, TII->get(X86::W64ALLOCA))
+        .addExternalSymbol("__chkstk")
+        .addReg(X86::RAX, RegState::Implicit)
+        .addReg(X86::EFLAGS, RegState::Define | RegState::Implicit);
+      // RAX has the offset to be subtracted from RSP.
+      BuildMI(*BB, MI, DL, TII->get(X86::SUB64rr), X86::RSP)
+        .addReg(X86::RSP)
+        .addReg(X86::RAX);
+    }
+  } else {
+    const char *StackProbeSymbol =
       Subtarget->isTargetWindows() ? "_chkstk" : "_alloca";
 
-  BuildMI(*BB, MI, DL, TII->get(X86::CALLpcrel32))
-    .addExternalSymbol(StackProbeSymbol)
-    .addReg(X86::EAX, RegState::Implicit)
-    .addReg(X86::ESP, RegState::Implicit)
-    .addReg(X86::EAX, RegState::Define | RegState::Implicit)
-    .addReg(X86::ESP, RegState::Define | RegState::Implicit)
-    .addReg(X86::EFLAGS, RegState::Define | RegState::Implicit);
+    BuildMI(*BB, MI, DL, TII->get(X86::CALLpcrel32))
+      .addExternalSymbol(StackProbeSymbol)
+      .addReg(X86::EAX, RegState::Implicit)
+      .addReg(X86::ESP, RegState::Implicit)
+      .addReg(X86::EAX, RegState::Define | RegState::Implicit)
+      .addReg(X86::ESP, RegState::Define | RegState::Implicit)
+      .addReg(X86::EFLAGS, RegState::Define | RegState::Implicit);
+  }
 
   MI->eraseFromParent();   // The pseudo instruction is gone now.
   return BB;
diff --git a/lib/Target/X86/X86InstrControl.td b/lib/Target/X86/X86InstrControl.td
index 4d1c5f7..31f2832 100644
--- a/lib/Target/X86/X86InstrControl.td
+++ b/lib/Target/X86/X86InstrControl.td
@@ -263,6 +263,16 @@ let isCall = 1, isCodeGenOnly = 1 in
                            Requires<[IsWin64]>;
   }
 
+let isCall = 1, isCodeGenOnly = 1 in
+  // __chkstk(MSVC):     clobber R10, R11 and EFLAGS.
+  // ___chkstk(Mingw64): clobber R10, R11, RAX and EFLAGS, and update RSP.
+  let Defs = [RAX, R10, R11, RSP, EFLAGS],
+      Uses = [RSP] in {
+    def W64ALLOCA : Ii32PCRel<0xE8, RawFrm,
+                      (outs), (ins i64i32imm_pcrel:$dst, variable_ops),
+                      "call{q}\t$dst", []>,
+                    Requires<[IsWin64]>;
+  }
 
 let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1,
     isCodeGenOnly = 1 in
diff --git a/test/CodeGen/X86/win64_alloca_dynalloca.ll b/test/CodeGen/X86/win64_alloca_dynalloca.ll
new file mode 100644
index 0000000..bb43608
--- /dev/null
+++ b/test/CodeGen/X86/win64_alloca_dynalloca.ll
@@ -0,0 +1,56 @@
+; RUN: llc < %s -mtriple=x86_64-mingw32 | FileCheck %s -check-prefix=M64
+; RUN: llc < %s -mtriple=x86_64-mingw64 | FileCheck %s -check-prefix=M64
+; RUN: llc < %s -mtriple=x86_64-win32   | FileCheck %s -check-prefix=W64
+; PR8777
+; PR8778
+
+define i64 @foo(i64 %n, i64 %x) nounwind {
+entry:
+
+  %buf0 = alloca i8, i64 4096, align 1
+
+; M64: movq  %rsp, %rbp
+; M64:       $4096, %rax
+; M64: callq ___chkstk
+; M64-NOT:   %rsp
+
+; W64: movq  %rsp, %rbp
+; W64:       $4096, %rax
+; W64: callq __chkstk
+; W64: subq  $4096, %rsp
+
+  %buf1 = alloca i8, i64 %n, align 1
+
+; M64: leaq  15(%rcx), %rax
+; M64: andq  $-16, %rax
+; M64: callq ___chkstk
+; M64-NOT:   %rsp
+; M64: movq  %rsp, %rax
+
+; W64: leaq  15(%rcx), %rax
+; W64: andq  $-16, %rax
+; W64: callq __chkstk
+; W64: subq  %rax, %rsp
+; W64: movq  %rsp, %rax
+
+  %r = call i64 @bar(i64 %n, i64 %x, i64 %n, i8* %buf0, i8* %buf1) nounwind
+
+; M64: subq  $48, %rsp
+; M64: movq  %rax, 32(%rsp)
+; M64: leaq  -4096(%rbp), %r9
+; M64: callq bar
+
+; W64: subq  $48, %rsp
+; W64: movq  %rax, 32(%rsp)
+; W64: leaq  -4096(%rbp), %r9
+; W64: callq bar
+
+  ret i64 %r
+
+; M64: movq    %rbp, %rsp
+
+; W64: movq    %rbp, %rsp
+
+}
+
+declare i64 @bar(i64, i64, i64, i8* nocapture, i8* nocapture) nounwind
diff --git a/test/CodeGen/X86/win_chkstk.ll b/test/CodeGen/X86/win_chkstk.ll
index 82ce81d..ae7591d 100644
--- a/test/CodeGen/X86/win_chkstk.ll
+++ b/test/CodeGen/X86/win_chkstk.ll
@@ -16,7 +16,7 @@ entry:
 ; WIN_X32:    calll __chkstk
 ; WIN_X64:    callq __chkstk
 ; MINGW_X32:  calll __alloca
-; MINGW_X64:  callq __chkstk
+; MINGW_X64:  callq ___chkstk
 ; LINUX-NOT:  call __chkstk
   %array4096 = alloca [4096 x i8], align 16       ; <[4096 x i8]*> [#uses=0]
   ret i32 0
-- 
1.7.1.GIT
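
Note on the two stack-probe flavors in the patch above: the difference between the mingw64 and MSVCRT helpers can be modeled with the small sketch below. It is a simplified illustration, not the real lowering (which builds MachineInstrs in `EmitLoweredWinAlloca`). The `Win64Env` enum and `win64DynAllocaSequence` function are hypothetical names, and the instruction strings simply follow the CHECK lines of win64_alloca_dynalloca.ll.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model for illustration only; it is not part of LLVM.
enum class Win64Env { Mingw64, Msvc };

// Instructions expected for a dynamic alloca once the (aligned) allocation
// size has been placed in %rax.
std::vector<std::string> win64DynAllocaSequence(Win64Env Env) {
  std::vector<std::string> Seq;
  if (Env == Win64Env::Mingw64) {
    // ___chkstk probes the pages and also updates %rsp itself.
    Seq.push_back("callq ___chkstk");
  } else {
    // __chkstk only probes, so the caller subtracts the size afterwards.
    Seq.push_back("callq __chkstk");
    Seq.push_back("subq %rax, %rsp");
  }
  assert(!Seq.empty());
  return Seq;
}
```

In this model the only visible difference is the explicit `subq` after the MSVCRT probe, which is what the M64-NOT and W64 checks in the test distinguish.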

-------------- next part --------------
From 7d88296c8b870f9685f833a882a68bbf2fdad864 Mon Sep 17 00:00:00 2001
From: NAKAMURA Takumi <geek4civic at gmail.com>
Date: Wed, 15 Dec 2010 13:07:12 +0900
Subject: [PATCH 6/9] Target/X86: Tweak va_arg for Win64 not to miss taking va_start when number of fixed args > 4.

---
 lib/Target/X86/X86ISelLowering.cpp |    8 +++++---
 test/CodeGen/X86/win64_vararg.ll   |   33 +++++++++++++++++++++++++++++++++
 2 files changed, 38 insertions(+), 3 deletions(-)

diff --git a/lib/Target/X86/X86ISelLowering.cpp b/lib/Target/X86/X86ISelLowering.cpp
index a012ae2..588c3bb 100644
--- a/lib/Target/X86/X86ISelLowering.cpp
+++ b/lib/Target/X86/X86ISelLowering.cpp
@@ -1662,8 +1662,8 @@ X86TargetLowering::LowerFormalArguments(SDValue Chain,
   // If the function takes variable number of arguments, make a frame index for
   // the start of the first vararg value... for expansion of llvm.va_start.
   if (isVarArg) {
-    if (!IsWin64 && (Is64Bit || (CallConv != CallingConv::X86_FastCall &&
-                    CallConv != CallingConv::X86_ThisCall))) {
+    if (Is64Bit || (CallConv != CallingConv::X86_FastCall &&
+                    CallConv != CallingConv::X86_ThisCall)) {
       FuncInfo->setVarArgsFrameIndex(MFI->CreateFixedObject(1, StackSize,true));
     }
     if (Is64Bit) {
@@ -1715,7 +1715,9 @@ X86TargetLowering::LowerFormalArguments(SDValue Chain,
         int HomeOffset = TFI.getOffsetOfLocalArea() + 8;
         FuncInfo->setRegSaveFrameIndex(
           MFI->CreateFixedObject(1, NumIntRegs * 8 + HomeOffset, false));
-        FuncInfo->setVarArgsFrameIndex(FuncInfo->getRegSaveFrameIndex());
+        // FIXME: This is a dirty hack, but it works.
+        if (NumIntRegs < 4)
+          FuncInfo->setVarArgsFrameIndex(FuncInfo->getRegSaveFrameIndex());
       } else {
         // For X86-64, if there are vararg parameters that are passed via
         // registers, then we must store them to their spots on the stack so they
diff --git a/test/CodeGen/X86/win64_vararg.ll b/test/CodeGen/X86/win64_vararg.ll
index a451318..efe8bca 100644
--- a/test/CodeGen/X86/win64_vararg.ll
+++ b/test/CodeGen/X86/win64_vararg.ll
@@ -18,3 +18,36 @@ entry:
 }
 
 declare void @llvm.va_start(i8*) nounwind
+
+; CHECK: f5:
+; CHECK: pushq
+; CHECK: leaq 56(%rsp),
+define i8* @f5(i64 %a0, i64 %a1, i64 %a2, i64 %a3, i64 %a4, ...) nounwind {
+entry:
+  %ap = alloca i8*, align 8
+  %ap1 = bitcast i8** %ap to i8*
+  call void @llvm.va_start(i8* %ap1)
+  ret i8* %ap1
+}
+
+; CHECK: f4:
+; CHECK: pushq
+; CHECK: leaq 48(%rsp),
+define i8* @f4(i64 %a0, i64 %a1, i64 %a2, i64 %a3, ...) nounwind {
+entry:
+  %ap = alloca i8*, align 8
+  %ap1 = bitcast i8** %ap to i8*
+  call void @llvm.va_start(i8* %ap1)
+  ret i8* %ap1
+}
+
+; CHECK: f3:
+; CHECK: pushq
+; CHECK: leaq 40(%rsp),
+define i8* @f3(i64 %a0, i64 %a1, i64 %a2, ...) nounwind {
+entry:
+  %ap = alloca i8*, align 8
+  %ap1 = bitcast i8** %ap to i8*
+  call void @llvm.va_start(i8* %ap1)
+  ret i8* %ap1
+}
-- 
1.7.1.GIT
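
Note on the va_start offsets checked in the patch above: they appear to follow one simple rule, sketched below. This is an illustration under stated assumptions, not code from the patch. `win64VaStartOffset` is a hypothetical helper, every fixed argument is assumed to occupy one 8-byte slot (home area first, then the caller's stack), and `PrologueBytes` is whatever the prologue has subtracted from %rsp (8 for the single `pushq` the tests expect).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only; it is not part of LLVM.
//
// Offset from %rsp of the first variadic argument of a Win64 function that
// takes NumFixedArgs fixed integer arguments, after the prologue has moved
// %rsp down by PrologueBytes. The same formula covers both cases: the first
// variadic slot lies inside the home area (fewer than four fixed arguments)
// or beyond it (four or more).
std::int64_t win64VaStartOffset(unsigned NumFixedArgs,
                                std::int64_t PrologueBytes) {
  assert(PrologueBytes >= 0);
  const std::int64_t ReturnAddressSize = 8; // pushed by the call instruction
  const std::int64_t SlotSize = 8;          // one slot per argument
  return PrologueBytes + ReturnAddressSize +
         SlotSize * static_cast<std::int64_t>(NumFixedArgs);
}
```

With an 8-byte prologue this gives 40, 48 and 56 for three, four and five fixed arguments, matching the `leaq` offsets checked for f3, f4 and f5, and 24 for the single fixed argument of average_va.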

-------------- next part --------------
From f1b50f241f8346d7888caacd50c9767f5afe6d4a Mon Sep 17 00:00:00 2001
From: NAKAMURA Takumi <geek4civic at gmail.com>
Date: Tue, 28 Dec 2010 19:05:25 +0900
Subject: [PATCH 7/9] X86FrameInfo.cpp, X86RegisterInfo.cpp: Re-indent. No functional changes.

---
 lib/Target/X86/X86FrameLowering.cpp |   35 ++++++++++++++++++++---------------
 lib/Target/X86/X86RegisterInfo.cpp  |    3 ++-
 2 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/lib/Target/X86/X86FrameLowering.cpp b/lib/Target/X86/X86FrameLowering.cpp
index ce88169..d636105 100644
--- a/lib/Target/X86/X86FrameLowering.cpp
+++ b/lib/Target/X86/X86FrameLowering.cpp
@@ -759,6 +759,12 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
   }
 
   // We're returning from function via eh_return.
+  bool isRel = (RetOpcode == X86::TCRETURNdi ||
+                RetOpcode == X86::TCRETURNdi64);
+  bool isMem = (RetOpcode == X86::TCRETURNmi ||
+                RetOpcode == X86::TCRETURNmi64);
+  bool isReg = (RetOpcode == X86::TCRETURNri ||
+                RetOpcode == X86::TCRETURNri64);
   if (RetOpcode == X86::EH_RETURN || RetOpcode == X86::EH_RETURN64) {
     MBBI = prior(MBB.end());
     MachineOperand &DestAddr  = MBBI->getOperand(0);
@@ -766,11 +772,7 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
     BuildMI(MBB, MBBI, DL,
             TII.get(Is64Bit ? X86::MOV64rr : X86::MOV32rr),
             StackPtr).addReg(DestAddr.getReg());
-  } else if (RetOpcode == X86::TCRETURNri || RetOpcode == X86::TCRETURNdi ||
-             RetOpcode == X86::TCRETURNmi ||
-             RetOpcode == X86::TCRETURNri64 || RetOpcode == X86::TCRETURNdi64 ||
-             RetOpcode == X86::TCRETURNmi64) {
-    bool isMem = RetOpcode == X86::TCRETURNmi || RetOpcode == X86::TCRETURNmi64;
+  } else if (isReg || isRel || isMem) {
     // Tail call return: adjust the stack pointer and jump to callee.
     MBBI = prior(MBB.end());
     MachineOperand &JumpTarget = MBBI->getOperand(0);
@@ -794,10 +796,11 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
     }
 
     // Jump to label or value in register.
-    if (RetOpcode == X86::TCRETURNdi || RetOpcode == X86::TCRETURNdi64) {
+    if (isRel) {
       MachineInstrBuilder MIB =
-        BuildMI(MBB, MBBI, DL, TII.get((RetOpcode == X86::TCRETURNdi)
-                                       ? X86::TAILJMPd : X86::TAILJMPd64));
+        BuildMI(MBB, MBBI, DL,
+                TII.get(RetOpcode == X86::TCRETURNdi ? X86::TAILJMPd
+                        : X86::TAILJMPd64));
       if (JumpTarget.isGlobal())
         MIB.addGlobalAddress(JumpTarget.getGlobal(), JumpTarget.getOffset(),
                              JumpTarget.getTargetFlags());
@@ -806,18 +809,20 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
         MIB.addExternalSymbol(JumpTarget.getSymbolName(),
                               JumpTarget.getTargetFlags());
       }
-    } else if (RetOpcode == X86::TCRETURNmi || RetOpcode == X86::TCRETURNmi64) {
+    } else if (isMem) {
       MachineInstrBuilder MIB =
-        BuildMI(MBB, MBBI, DL, TII.get((RetOpcode == X86::TCRETURNmi)
-                                       ? X86::TAILJMPm : X86::TAILJMPm64));
+        BuildMI(MBB, MBBI, DL,
+                TII.get(RetOpcode == X86::TCRETURNmi ? X86::TAILJMPm
+                        : X86::TAILJMPm64));
       for (unsigned i = 0; i != 5; ++i)
         MIB.addOperand(MBBI->getOperand(i));
-    } else if (RetOpcode == X86::TCRETURNri64) {
-      BuildMI(MBB, MBBI, DL, TII.get(X86::TAILJMPr64)).
+    } else if (isReg) {
+      BuildMI(MBB, MBBI, DL,
+              TII.get(RetOpcode == X86::TCRETURNri64 ? X86::TAILJMPr64
+                      : X86::TAILJMPr)).
         addReg(JumpTarget.getReg(), RegState::Kill);
     } else {
-      BuildMI(MBB, MBBI, DL, TII.get(X86::TAILJMPr)).
-        addReg(JumpTarget.getReg(), RegState::Kill);
+      llvm_unreachable("Unexpected TCRETURN opcode!");
     }
 
     MachineInstr *NewMI = prior(MBBI);
diff --git a/lib/Target/X86/X86RegisterInfo.cpp b/lib/Target/X86/X86RegisterInfo.cpp
index 06c671b..9260f1d 100644
--- a/lib/Target/X86/X86RegisterInfo.cpp
+++ b/lib/Target/X86/X86RegisterInfo.cpp
@@ -576,7 +576,8 @@ X86RegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
   unsigned BasePtr;
 
   unsigned Opc = MI.getOpcode();
-  bool AfterFPPop = Opc == X86::TAILJMPm64 || Opc == X86::TAILJMPm;
+  bool AfterFPPop = (Opc == X86::TAILJMPm64 ||
+                     Opc == X86::TAILJMPm);
   if (needsStackRealignment(MF))
     BasePtr = (FrameIndex < 0 ? FramePtr : StackPtr);
   else if (AfterFPPop)
-- 
1.7.1.GIT

-------------- next part --------------
From 27857943a0cbd919b1f60ac8eb562b6a01bb8ef9 Mon Sep 17 00:00:00 2001
From: NAKAMURA Takumi <geek4civic at gmail.com>
Date: Thu, 6 Jan 2011 09:09:52 +0900
Subject: [PATCH 8/9] TableGen/EDEmitter.cpp: Add TCW64.

---
 utils/TableGen/EDEmitter.cpp |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/utils/TableGen/EDEmitter.cpp b/utils/TableGen/EDEmitter.cpp
index 7ee1019..b3e442d 100644
--- a/utils/TableGen/EDEmitter.cpp
+++ b/utils/TableGen/EDEmitter.cpp
@@ -264,6 +264,7 @@ static int X86TypeFromOpName(LiteralConstantEmitter *type,
   REG("RFP32");
   REG("GR64");
   REG("GR64_TC");
+  REG("GR64_TCW64");
   REG("FR64");
   REG("VR64");
   REG("RFP64");
@@ -297,6 +298,7 @@ static int X86TypeFromOpName(LiteralConstantEmitter *type,
   MEM("opaque48mem");
   MEM("i64mem");
   MEM("i64mem_TC");
+  MEM("i64mem_TCW64");
   MEM("f64mem");
   MEM("sdmem");
   MEM("f80mem");
-- 
1.7.1.GIT

-------------- next part --------------
From ff6e72615a2a6bf8549c76550b618d6aafdaeefd Mon Sep 17 00:00:00 2001
From: NAKAMURA Takumi <geek4civic at gmail.com>
Date: Thu, 6 Jan 2011 06:59:35 +0900
Subject: [PATCH 9/9] Target/X86: Tweak win64's tailcall.

---
 lib/Target/X86/X86FrameLowering.cpp |   12 ++++++++++++
 lib/Target/X86/X86InstrCompiler.td  |   24 ++++++++++++++++++++----
 lib/Target/X86/X86InstrControl.td   |   33 +++++++++++++++++++++++++++++++++
 lib/Target/X86/X86InstrInfo.cpp     |    2 ++
 lib/Target/X86/X86InstrInfo.td      |    6 ++++++
 lib/Target/X86/X86MCInstLower.cpp   |    4 ++++
 lib/Target/X86/X86RegisterInfo.cpp  |    1 +
 lib/Target/X86/X86RegisterInfo.td   |    3 +++
 test/CodeGen/X86/tailcallstack64.ll |   16 ++++++++++------
 9 files changed, 91 insertions(+), 10 deletions(-)

diff --git a/lib/Target/X86/X86FrameLowering.cpp b/lib/Target/X86/X86FrameLowering.cpp
index d636105..712eded 100644
--- a/lib/Target/X86/X86FrameLowering.cpp
+++ b/lib/Target/X86/X86FrameLowering.cpp
@@ -108,6 +108,9 @@ static unsigned findDeadCallerSavedReg(MachineBasicBlock &MBB,
   case X86::TCRETURNdi64:
   case X86::TCRETURNri64:
   case X86::TCRETURNmi64:
+  case X86::TCRETURNdiW64:
+  case X86::TCRETURNriW64:
+  case X86::TCRETURNmiW64:
   case X86::EH_RETURN:
   case X86::EH_RETURN64: {
     SmallSet<unsigned, 8> Uses;
@@ -670,6 +673,9 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
   case X86::TCRETURNdi64:
   case X86::TCRETURNri64:
   case X86::TCRETURNmi64:
+  case X86::TCRETURNdiW64:
+  case X86::TCRETURNriW64:
+  case X86::TCRETURNmiW64:
   case X86::EH_RETURN:
   case X86::EH_RETURN64:
     break;  // These are ok
@@ -760,10 +766,13 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
 
   // We're returning from function via eh_return.
   bool isRel = (RetOpcode == X86::TCRETURNdi ||
+                RetOpcode == X86::TCRETURNdiW64 ||
                 RetOpcode == X86::TCRETURNdi64);
   bool isMem = (RetOpcode == X86::TCRETURNmi ||
+                RetOpcode == X86::TCRETURNmiW64 ||
                 RetOpcode == X86::TCRETURNmi64);
   bool isReg = (RetOpcode == X86::TCRETURNri ||
+                RetOpcode == X86::TCRETURNriW64 ||
                 RetOpcode == X86::TCRETURNri64);
   if (RetOpcode == X86::EH_RETURN || RetOpcode == X86::EH_RETURN64) {
     MBBI = prior(MBB.end());
@@ -800,6 +809,7 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
       MachineInstrBuilder MIB =
         BuildMI(MBB, MBBI, DL,
                 TII.get(RetOpcode == X86::TCRETURNdi ? X86::TAILJMPd
+                        : RetOpcode == X86::TCRETURNdiW64 ? X86::TAILJMPdW64
                         : X86::TAILJMPd64));
       if (JumpTarget.isGlobal())
         MIB.addGlobalAddress(JumpTarget.getGlobal(), JumpTarget.getOffset(),
@@ -813,12 +823,14 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
       MachineInstrBuilder MIB =
         BuildMI(MBB, MBBI, DL,
                 TII.get(RetOpcode == X86::TCRETURNmi ? X86::TAILJMPm
+                        : RetOpcode == X86::TCRETURNmiW64 ? X86::TAILJMPmW64
                         : X86::TAILJMPm64));
       for (unsigned i = 0; i != 5; ++i)
         MIB.addOperand(MBBI->getOperand(i));
     } else if (isReg) {
       BuildMI(MBB, MBBI, DL,
               TII.get(RetOpcode == X86::TCRETURNri64 ? X86::TAILJMPr64
+                      : RetOpcode == X86::TCRETURNriW64 ? X86::TAILJMPrW64
                       : X86::TAILJMPr)).
         addReg(JumpTarget.getReg(), RegState::Kill);
     } else {
diff --git a/lib/Target/X86/X86InstrCompiler.td b/lib/Target/X86/X86InstrCompiler.td
index d2c5763..5d55fa6 100644
--- a/lib/Target/X86/X86InstrCompiler.td
+++ b/lib/Target/X86/X86InstrCompiler.td
@@ -868,19 +868,35 @@ def : Pat<(X86tcret (i32 texternalsym:$dst), imm:$off),
 
 def : Pat<(X86tcret GR64_TC:$dst, imm:$off),
           (TCRETURNri64 GR64_TC:$dst, imm:$off)>,
-          Requires<[In64BitMode]>;
+          Requires<[In64BitMode, NotWin64]>;
 
 def : Pat<(X86tcret (load addr:$dst), imm:$off),
           (TCRETURNmi64 addr:$dst, imm:$off)>,
-          Requires<[In64BitMode]>;
+          Requires<[In64BitMode, NotWin64]>;
 
 def : Pat<(X86tcret (i64 tglobaladdr:$dst), imm:$off),
           (TCRETURNdi64 tglobaladdr:$dst, imm:$off)>,
-          Requires<[In64BitMode]>;
+          Requires<[In64BitMode, NotWin64]>;
 
 def : Pat<(X86tcret (i64 texternalsym:$dst), imm:$off),
           (TCRETURNdi64 texternalsym:$dst, imm:$off)>,
-          Requires<[In64BitMode]>;
+          Requires<[In64BitMode, NotWin64]>;
+
+def : Pat<(X86tcret GR64_TCW64:$dst, imm:$off),
+          (TCRETURNriW64 GR64_TCW64:$dst, imm:$off)>,
+          Requires<[IsWin64]>;
+
+def : Pat<(X86tcret (load addr:$dst), imm:$off),
+          (TCRETURNmiW64 addr:$dst, imm:$off)>,
+          Requires<[IsWin64]>;
+
+def : Pat<(X86tcret (i64 tglobaladdr:$dst), imm:$off),
+          (TCRETURNdiW64 tglobaladdr:$dst, imm:$off)>,
+          Requires<[IsWin64]>;
+
+def : Pat<(X86tcret (i64 texternalsym:$dst), imm:$off),
+          (TCRETURNdiW64 texternalsym:$dst, imm:$off)>,
+          Requires<[IsWin64]>;
 
 // Normal calls, with various flavors of addresses.
 def : Pat<(X86call (i32 tglobaladdr:$dst)),
diff --git a/lib/Target/X86/X86InstrControl.td b/lib/Target/X86/X86InstrControl.td
index 31f2832..c4939a8 100644
--- a/lib/Target/X86/X86InstrControl.td
+++ b/lib/Target/X86/X86InstrControl.td
@@ -301,3 +301,36 @@ let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1,
   def TAILJMPm64 : I<0xFF, MRM4m, (outs), (ins i64mem_TC:$dst, variable_ops),
                      "jmp{q}\t{*}$dst  # TAILCALL", []>;
 }
+
+let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1,
+    isCodeGenOnly = 1 in
+  let Defs = [RAX, RCX, RDX, R8, R9, R10, R11,
+              FP0, FP1, FP2, FP3, FP4, FP5, FP6, ST0, ST1,
+              MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,
+              XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, EFLAGS],
+      Uses = [RSP] in {
+  def TCRETURNdiW64 : PseudoI<(outs),
+                      (ins i64i32imm_pcrel:$dst, i32imm:$offset, variable_ops),
+                      []>,
+                     Requires<[IsWin64]>;
+  def TCRETURNriW64 : PseudoI<(outs),
+                      (ins GR64_TCW64:$dst, i32imm:$offset, variable_ops), []>,
+                     Requires<[IsWin64]>;
+  let mayLoad = 1 in
+  def TCRETURNmiW64 : PseudoI<(outs),
+                       (ins i64mem_TCW64:$dst, i32imm:$offset, variable_ops), []>,
+                     Requires<[IsWin64]>;
+
+  def TAILJMPdW64 : Ii32PCRel<0xE9, RawFrm, (outs),
+                                      (ins i64i32imm_pcrel:$dst, variable_ops),
+                   "jmp\t$dst  # TAILCALL", []>,
+                     Requires<[IsWin64]>;
+  def TAILJMPrW64 : I<0xFF, MRM4r, (outs), (ins GR64_TCW64:$dst, variable_ops),
+                     "jmp{q}\t{*}$dst  # TAILCALL", []>,
+                     Requires<[IsWin64]>;
+
+  let mayLoad = 1 in
+  def TAILJMPmW64 : I<0xFF, MRM4m, (outs), (ins i64mem_TCW64:$dst, variable_ops),
+                     "jmp{q}\t{*}$dst  # TAILCALL", []>,
+                     Requires<[IsWin64]>;
+}
diff --git a/lib/Target/X86/X86InstrInfo.cpp b/lib/Target/X86/X86InstrInfo.cpp
index 63dcd14..c3c3b8c 100644
--- a/lib/Target/X86/X86InstrInfo.cpp
+++ b/lib/Target/X86/X86InstrInfo.cpp
@@ -321,6 +321,7 @@ X86InstrInfo::X86InstrInfo(X86TargetMachine &tm)
     { X86::SETSr,       X86::SETSm, 0, 0 },
     { X86::TAILJMPr,    X86::TAILJMPm, 1, 0 },
     { X86::TAILJMPr64,  X86::TAILJMPm64, 1, 0 },
+    { X86::TAILJMPrW64, X86::TAILJMPmW64, 1, 0 },
     { X86::TEST16ri,    X86::TEST16mi, 1, 0 },
     { X86::TEST32ri,    X86::TEST32mi, 1, 0 },
     { X86::TEST64ri32,  X86::TEST64mi32, 1, 0 },
@@ -2025,6 +2026,7 @@ static unsigned getLoadStoreRegOpcode(unsigned Reg,
   case X86::GR64_NOREX_NOSPRegClassID:
   case X86::GR64_NOSPRegClassID:
   case X86::GR64_TCRegClassID:
+  case X86::GR64_TCW64RegClassID:
     return load ? X86::MOV64rm : X86::MOV64mr;
   case X86::GR32RegClassID:
   case X86::GR32_ABCDRegClassID:
diff --git a/lib/Target/X86/X86InstrInfo.td b/lib/Target/X86/X86InstrInfo.td
index 4748f13..78847a8 100644
--- a/lib/Target/X86/X86InstrInfo.td
+++ b/lib/Target/X86/X86InstrInfo.td
@@ -291,6 +291,12 @@ def i64mem_TC : Operand<i64> {
   let ParserMatchClass = X86MemAsmOperand;
 }
 
+def i64mem_TCW64 : Operand<i64> {
+  let PrintMethod = "printi64mem";
+  let MIOperandInfo = (ops GR64_TCW64, i8imm, GR64_TCW64, i32imm, i8imm);
+  let ParserMatchClass = X86MemAsmOperand;
+}
+
 let ParserMatchClass = X86AbsMemAsmOperand,
     PrintMethod = "print_pcrel_imm" in {
 def i32imm_pcrel : Operand<i32>;
diff --git a/lib/Target/X86/X86MCInstLower.cpp b/lib/Target/X86/X86MCInstLower.cpp
index 4159af1..0b9679e 100644
--- a/lib/Target/X86/X86MCInstLower.cpp
+++ b/lib/Target/X86/X86MCInstLower.cpp
@@ -399,6 +399,7 @@ ReSimplify:
   // register inputs modeled as normal uses instead of implicit uses.  As such,
   // truncate off all but the first operand (the callee).  FIXME: Change isel.
   case X86::TAILJMPr64:
+  case X86::TAILJMPrW64:
   case X86::CALL64r:
   case X86::CALL64pcrel32:
   case X86::WINCALL64r:
@@ -421,12 +422,14 @@ ReSimplify:
   // TAILJMPd, TAILJMPd64 - Lower to the correct jump instructions.
   case X86::TAILJMPr:
   case X86::TAILJMPd:
+  case X86::TAILJMPdW64:
   case X86::TAILJMPd64: {
     unsigned Opcode;
     switch (OutMI.getOpcode()) {
     default: assert(0 && "Invalid opcode");
     case X86::TAILJMPr: Opcode = X86::JMP32r; break;
     case X86::TAILJMPd:
+    case X86::TAILJMPdW64:
     case X86::TAILJMPd64: Opcode = X86::JMP_1; break;
     }
 
@@ -618,6 +621,7 @@ void X86AsmPrinter::EmitInstruction(const MachineInstr *MI) {
   case X86::TAILJMPr:
   case X86::TAILJMPd:
   case X86::TAILJMPd64:
+  case X86::TAILJMPdW64:
     // Lower these as normal, but add some comments.
     OutStreamer.AddComment("TAILCALL");
     break;
diff --git a/lib/Target/X86/X86RegisterInfo.cpp b/lib/Target/X86/X86RegisterInfo.cpp
index 9260f1d..e1eaa4b 100644
--- a/lib/Target/X86/X86RegisterInfo.cpp
+++ b/lib/Target/X86/X86RegisterInfo.cpp
@@ -577,6 +577,7 @@ X86RegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
 
   unsigned Opc = MI.getOpcode();
   bool AfterFPPop = (Opc == X86::TAILJMPm64 ||
+                     Opc == X86::TAILJMPmW64 ||
                      Opc == X86::TAILJMPm);
   if (needsStackRealignment(MF))
     BasePtr = (FrameIndex < 0 ? FramePtr : StackPtr);
diff --git a/lib/Target/X86/X86RegisterInfo.td b/lib/Target/X86/X86RegisterInfo.td
index 45bb989..dd35d62 100644
--- a/lib/Target/X86/X86RegisterInfo.td
+++ b/lib/Target/X86/X86RegisterInfo.td
@@ -496,6 +496,9 @@ def GR64_TC   : RegisterClass<"X86", [i64], 64, [RAX, RCX, RDX, RSI, RDI,
                        (GR32_TC sub_32bit)];
 }
 
+def GR64_TCW64   : RegisterClass<"X86", [i64], 64, [RAX, R11, R10,
+                                                    R9, R8, RDX, RCX]>;
+
 // GR8_NOREX - GR8 registers which do not require a REX prefix.
 def GR8_NOREX : RegisterClass<"X86", [i8], 8,
                               [AL, CL, DL, AH, CH, DH, BL, BH]> {
diff --git a/test/CodeGen/X86/tailcallstack64.ll b/test/CodeGen/X86/tailcallstack64.ll
index 52b074d..0c732d5 100644
--- a/test/CodeGen/X86/tailcallstack64.ll
+++ b/test/CodeGen/X86/tailcallstack64.ll
@@ -1,16 +1,20 @@
-; RUN: llc < %s -tailcallopt -march=x86-64 -post-RA-scheduler=true | FileCheck %s
+; RUN: llc < %s -tailcallopt -mtriple=x86_64-linux -post-RA-scheduler=true | FileCheck %s
+; RUN: llc < %s -tailcallopt -mtriple=x86_64-win32 -post-RA-scheduler=true | FileCheck %s
+
+; FIXME: Redundant unused stack allocation could be eliminated.
+; CHECK: subq  ${{24|88}}, %rsp
 
 ; Check that lowered arguments on the stack do not overwrite each other.
 ; Add %in1 %p1 to a different temporary register (%eax).
-; CHECK: movl  32(%rsp), %eax
+; CHECK: movl  [[A1:32|144]](%rsp), %eax
 ; Move param %in1 to temp register (%r10d).
-; CHECK: movl  40(%rsp), %r10d
+; CHECK: movl  [[A2:40|152]](%rsp), %r10d
 ; Add %in1 %p1 to a different temporary register (%eax).
-; CHECK: addl %edi, %eax
+; CHECK: addl {{%edi|%ecx}}, %eax
 ; Move param %in2 to stack.
-; CHECK: movl  %r10d, 32(%rsp)
+; CHECK: movl  %r10d, [[A1]](%rsp)
 ; Move result of addition to stack.
-; CHECK: movl  %eax, 40(%rsp)
+; CHECK: movl  %eax, [[A2]](%rsp)
 ; Eventually, do a TAILCALL
 ; CHECK: TAILCALL
 
-- 
1.7.1.GIT