[llvm-commits] [Review request] Tweaking Win64 Codegen
NAKAMURA Takumi
geek4civic at gmail.com
Mon Jan 10 23:44:05 PST 2011
Hello, guys!
With my local patches, I can build a Win64 clang (self-hosting it with w64-clang).
In the course of this work, I found several issues in LLVM:
- stack allocation
- shadow area
- allocating stack beyond a page boundary
- variable-sized allocas, alloca(n) (see the sketch after this list)
- varargs (caller and callee)
- tailcall
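As a quick sketch of those last two stack items (my own minimal example, not taken from the patches; f and consume are hypothetical names):

  // Win64: a frame larger than one 4K page, or a variable-sized
  // alloca, requires the stack to be touched page by page (stack
  // probing via chkstk) so the OS guard pages are hit in sequence.
  #include <malloc.h>   // _alloca on Windows
  extern void consume(char *fixed, char *dynamic);
  void f(unsigned n) {
    char big[8192];                    // spans more than one page
    char *dyn = (char *)_alloca(n);    // variable alloca(n)
    consume(big, dyn);
  }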
The essential patches are attached:
They work, though I am unsure about their style.
Please let me know if you have any feedback.
Next, I will propose the w64-clang patches and updates to 57 CodeGen/X86 tests.
Thank you, ...Takumi
* 0001-Target-X86-Fix-whitespace.patch.txt
* 0002-test-CodeGen-X86-Fix-whitespace.patch.txt
Cosmetic changes.
* 0003-lib-Target-X86-X86ISelLowering.cpp-Introduce-a-n.patch.txt
No functional changes; preparation for 0004.
It introduces a new variable "IsWin64" to replace repeated Subtarget->isTargetWin64() calls.
* 0004-Target-X86-Tweak-allocating-shadow-area-aka-home.patch.txt
Let the caller always provide at least a 4 x i64 (32-byte) allocation for its callees.
In leaf functions, the shadow area is no longer allocated, to save stack usage.
FIXME: I wonder whether CCState should carry knowledge of a minimum stack allocation.
It also resolves PR8922, though the emitted code is not optimal.
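The idea, in one hunk: reserve the 32-byte shadow area in the CCState before argument analysis, so stack arguments are assigned offsets past it (this is what 0004 does in LowerFormalArguments, LowerCall, and IsEligibleForTailCallOptimization):

  // Win64: the first 4 x i64 slots are the callee's shadow (home)
  // area; reserve them before assigning argument locations.
  if (IsWin64)
    CCInfo.AllocateStack(32, 8);   // 32 bytes, 8-byte aligned
  CCInfo.AnalyzeCallOperands(Outs, CC_X86);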
* 0005-Target-X86-Tweak-alloca-and-add-a-testcase-for-m.patch.txt
[PR8777][PR8778][PR8919]
Introduce W64ALLOCA and emit it for w64.
It reverts PR8919 and r122934.
I assume the mingw64 distro and the FSF sources.
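For reference, the Win64 large-frame prologue after 0005 emits roughly this sequence (condensed from the patch; the allocation size travels in RAX):

  // Frames >= 4096 bytes must probe each page via chkstk.
  BuildMI(MBB, MBBI, DL, TII.get(X86::MOV64ri), X86::RAX)
    .addImm(NumBytes);
  BuildMI(MBB, MBBI, DL, TII.get(X86::W64ALLOCA))
    .addExternalSymbol("__chkstk")
    .addReg(StackPtr, RegState::Define | RegState::Implicit);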
* 0006-Target-X86-Tweak-va_arg-for-Win64-not-to-miss-ta.patch.txt
When there are more than 4 fixed arguments, the va pointer (ap) would point at the wrong slot.
It works, but I suspect the implementation is dirty.
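A reduction of the failing shape (my own example; sum_rest is a hypothetical name): with more than four fixed parameters, the variadic arguments start in the caller's stack area rather than in the register-save slots, and va_start must point just past the last fixed slot.

  #include <cstdarg>
  // Win64: a..d arrive in RCX/RDX/R8/R9, e arrives on the stack, and
  // the variadic args follow e; va_start(ap, e) must land right
  // after e's stack slot.
  int sum_rest(int a, int b, int c, int d, int e, ...) {
    va_list ap;
    va_start(ap, e);
    int first = va_arg(ap, int);
    va_end(ap);
    return first;
  }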
* 0007-X86FrameInfo.cpp-X86RegisterInfo.cpp-Re-indent.-.patch.txt
Cosmetic changes so that 0009 applies cleanly. No functional changes.
* 0008-TableGen-EDEmitter.cpp-Add-TCW64.patch.txt
Teach TableGen to recognize the new register class GR_TCW64.
* 0009-Target-X86-Tweak-win64-s-tailcall.patch.txt
[PR8743] Introduce the Win64 tailcall machinery.
FIXME: the eligibility rules for Win64 tail calls could probably be relaxed.
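To illustrate what 0009 targets (my own example; tailee and caller are hypothetical names): a direct call in tail position that should lower through the new TCRETURN pseudos to a plain jmp on Win64.

  extern "C" int tailee(int x);
  // When the tail-call eligibility checks pass, this should codegen
  // as "jmp tailee" instead of "call tailee; ret".
  extern "C" int caller(int x) {
    return tailee(x + 1);
  }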
-------------- next part --------------
From 949c925274e0e8a40238553c638335fd330c0c96 Mon Sep 17 00:00:00 2001
From: NAKAMURA Takumi <geek4civic at gmail.com>
Date: Thu, 9 Dec 2010 20:15:03 +0900
Subject: [PATCH 1/9] Target/X86: Fix whitespace.
---
lib/Target/X86/X86FrameLowering.cpp | 2 +-
lib/Target/X86/X86ISelLowering.cpp | 94 +++++++++++++++++-----------------
lib/Target/X86/X86InstrCompiler.td | 17 +++---
lib/Target/X86/X86InstrControl.td | 39 +++++++-------
lib/Target/X86/X86InstrInfo.cpp | 84 +++++++++++++++---------------
lib/Target/X86/X86InstrInfo.td | 3 +-
lib/Target/X86/X86MCInstLower.cpp | 65 ++++++++++++------------
lib/Target/X86/X86RegisterInfo.cpp | 6 +-
lib/Target/X86/X86RegisterInfo.td | 24 +++++-----
9 files changed, 165 insertions(+), 169 deletions(-)
diff --git a/lib/Target/X86/X86FrameLowering.cpp b/lib/Target/X86/X86FrameLowering.cpp
index 7c7b4f3..cbf1b59 100644
--- a/lib/Target/X86/X86FrameLowering.cpp
+++ b/lib/Target/X86/X86FrameLowering.cpp
@@ -321,7 +321,7 @@ void X86FrameLowering::emitCalleeSavedFrameMoves(MachineFunction &MF,
// move" for this extra "PUSH", the linker will lose track of the fact that
// the frame pointer should have the value of the first "PUSH" when it's
// trying to unwind.
- //
+ //
// FIXME: This looks inelegant. It's possibly correct, but it's covering up
// another bug. I.e., one where we generate a prolog like this:
//
diff --git a/lib/Target/X86/X86ISelLowering.cpp b/lib/Target/X86/X86ISelLowering.cpp
index 1a4bb97..2268823 100644
--- a/lib/Target/X86/X86ISelLowering.cpp
+++ b/lib/Target/X86/X86ISelLowering.cpp
@@ -69,7 +69,7 @@ static TargetLoweringObjectFile *createTLOF(X86TargetMachine &TM) {
return new X8664_MachoTargetObjectFile();
return new TargetLoweringObjectFileMachO();
}
-
+
if (TM.getSubtarget<X86Subtarget>().isTargetELF() ){
if (is64Bit)
return new X8664_ELFTargetObjectFile(TM);
@@ -256,7 +256,7 @@ X86TargetLowering::X86TargetLowering(X86TargetMachine &TM)
setOperationAction(ISD::UDIV, VT, Expand);
setOperationAction(ISD::SREM, VT, Expand);
setOperationAction(ISD::UREM, VT, Expand);
-
+
// Add/Sub overflow ops with MVT::Glues are lowered to EFLAGS dependences.
setOperationAction(ISD::ADDC, VT, Custom);
setOperationAction(ISD::ADDE, VT, Custom);
@@ -369,7 +369,7 @@ X86TargetLowering::X86TargetLowering(X86TargetMachine &TM)
setOperationAction(ISD::ATOMIC_CMP_SWAP, VT, Custom);
setOperationAction(ISD::ATOMIC_LOAD_SUB, VT, Custom);
}
-
+
if (!Subtarget->is64Bit()) {
setOperationAction(ISD::ATOMIC_LOAD_ADD, MVT::i64, Custom);
setOperationAction(ISD::ATOMIC_LOAD_SUB, MVT::i64, Custom);
@@ -931,7 +931,7 @@ X86TargetLowering::X86TargetLowering(X86TargetMachine &TM)
// We want to custom lower some of our intrinsics.
setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::Other, Custom);
-
+
// Only custom-lower 64-bit SADDO and friends on 64-bit because we don't
// handle type legalization for these operations here.
//
@@ -948,7 +948,7 @@ X86TargetLowering::X86TargetLowering(X86TargetMachine &TM)
setOperationAction(ISD::SMULO, VT, Custom);
setOperationAction(ISD::UMULO, VT, Custom);
}
-
+
// There are no 8-bit 3-address imul/mul instructions
setOperationAction(ISD::SMULO, MVT::i8, Expand);
setOperationAction(ISD::UMULO, MVT::i8, Expand);
@@ -6198,7 +6198,7 @@ X86TargetLowering::LowerGlobalTLSAddress(SDValue Op, SelectionDAG &DAG) const {
// TLSCALL will be codegen'ed as call. Inform MFI that function has calls.
MachineFrameInfo *MFI = DAG.getMachineFunction().getFrameInfo();
MFI->setAdjustsStack(true);
-
+
// And our return value (tls address) is in the standard call return value
// location.
unsigned Reg = Subtarget->is64Bit() ? X86::RAX : X86::EAX;
@@ -7047,7 +7047,7 @@ SDValue X86TargetLowering::LowerSETCC(SDValue Op, SelectionDAG &DAG) const {
(cast<ConstantSDNode>(Op1)->getZExtValue() == 1 ||
cast<ConstantSDNode>(Op1)->isNullValue()) &&
(CC == ISD::SETEQ || CC == ISD::SETNE)) {
-
+
// If the input is a setcc, then reuse the input setcc or use a new one with
// the inverted condition.
if (Op0.getOpcode() == X86ISD::SETCC) {
@@ -7055,7 +7055,7 @@ SDValue X86TargetLowering::LowerSETCC(SDValue Op, SelectionDAG &DAG) const {
bool Invert = (CC == ISD::SETNE) ^
cast<ConstantSDNode>(Op1)->isNullValue();
if (!Invert) return Op0;
-
+
CCode = X86::GetOppositeBranchCondition(CCode);
return DAG.getNode(X86ISD::SETCC, dl, MVT::i8,
DAG.getConstant(CCode, MVT::i8), Op0.getOperand(1));
@@ -7206,7 +7206,7 @@ static bool isX86LogicalCmp(SDValue Op) {
if (Op.getResNo() == 2 && Opc == X86ISD::UMUL)
return true;
-
+
return false;
}
@@ -7242,24 +7242,24 @@ SDValue X86TargetLowering::LowerSELECT(SDValue Op, SelectionDAG &DAG) const {
Cond.getOperand(1).getOpcode() == X86ISD::CMP &&
isZero(Cond.getOperand(1).getOperand(1))) {
SDValue Cmp = Cond.getOperand(1);
-
+
unsigned CondCode =cast<ConstantSDNode>(Cond.getOperand(0))->getZExtValue();
-
- if ((isAllOnes(Op1) || isAllOnes(Op2)) &&
+
+ if ((isAllOnes(Op1) || isAllOnes(Op2)) &&
(CondCode == X86::COND_E || CondCode == X86::COND_NE)) {
SDValue Y = isAllOnes(Op2) ? Op1 : Op2;
SDValue CmpOp0 = Cmp.getOperand(0);
Cmp = DAG.getNode(X86ISD::CMP, DL, MVT::i32,
CmpOp0, DAG.getConstant(1, CmpOp0.getValueType()));
-
+
SDValue Res = // Res = 0 or -1.
DAG.getNode(X86ISD::SETCC_CARRY, DL, Op.getValueType(),
DAG.getConstant(X86::COND_B, MVT::i8), Cmp);
-
+
if (isAllOnes(Op1) != (CondCode == X86::COND_E))
Res = DAG.getNOT(DL, Res, Res.getValueType());
-
+
ConstantSDNode *N2C = dyn_cast<ConstantSDNode>(Op2);
if (N2C == 0 || !N2C->isNullValue())
Res = DAG.getNode(ISD::OR, DL, Res.getValueType(), Res, Y);
@@ -8443,7 +8443,7 @@ SDValue X86TargetLowering::LowerSHL(SDValue Op, SelectionDAG &DAG) const {
Op = DAG.getNode(ISD::ADD, dl, VT, Op, Op);
// return pblendv(r, r+r, a);
- R = DAG.getNode(X86ISD::PBLENDVB, dl, VT,
+ R = DAG.getNode(X86ISD::PBLENDVB, dl, VT,
R, DAG.getNode(ISD::ADD, dl, VT, R, R), Op);
return R;
}
@@ -8503,12 +8503,12 @@ SDValue X86TargetLowering::LowerXALUO(SDValue Op, SelectionDAG &DAG) const {
SDVTList VTs = DAG.getVTList(N->getValueType(0), N->getValueType(0),
MVT::i32);
SDValue Sum = DAG.getNode(X86ISD::UMUL, DL, VTs, LHS, RHS);
-
+
SDValue SetCC =
DAG.getNode(X86ISD::SETCC, DL, MVT::i8,
DAG.getConstant(X86::COND_O, MVT::i32),
SDValue(Sum.getNode(), 2));
-
+
DAG.ReplaceAllUsesOfValueWith(SDValue(N, 1), SetCC);
return Sum;
}
@@ -8663,9 +8663,9 @@ static SDValue LowerADDC_ADDE_SUBC_SUBE(SDValue Op, SelectionDAG &DAG) {
// Let legalize expand this if it isn't a legal type yet.
if (!DAG.getTargetLoweringInfo().isTypeLegal(VT))
return SDValue();
-
+
SDVTList VTs = DAG.getVTList(VT, MVT::i32);
-
+
unsigned Opc;
bool ExtraOp = false;
switch (Op.getOpcode()) {
@@ -8675,7 +8675,7 @@ static SDValue LowerADDC_ADDE_SUBC_SUBE(SDValue Op, SelectionDAG &DAG) {
case ISD::SUBC: Opc = X86ISD::SUB; break;
case ISD::SUBE: Opc = X86ISD::SBB; ExtraOp = true; break;
}
-
+
if (!ExtraOp)
return DAG.getNode(Opc, Op->getDebugLoc(), VTs, Op.getOperand(0),
Op.getOperand(1));
@@ -9555,14 +9555,14 @@ MachineBasicBlock *
X86TargetLowering::EmitMonitor(MachineInstr *MI, MachineBasicBlock *BB) const {
DebugLoc dl = MI->getDebugLoc();
const TargetInstrInfo *TII = getTargetMachine().getInstrInfo();
-
+
// Address into RAX/EAX, other two args into ECX, EDX.
unsigned MemOpc = Subtarget->is64Bit() ? X86::LEA64r : X86::LEA32r;
unsigned MemReg = Subtarget->is64Bit() ? X86::RAX : X86::EAX;
MachineInstrBuilder MIB = BuildMI(*BB, MI, dl, TII->get(MemOpc), MemReg);
for (int i = 0; i < X86::AddrNumOperands; ++i)
MIB.addOperand(MI->getOperand(i));
-
+
unsigned ValOps = X86::AddrNumOperands;
BuildMI(*BB, MI, dl, TII->get(TargetOpcode::COPY), X86::ECX)
.addReg(MI->getOperand(ValOps).getReg());
@@ -9571,7 +9571,7 @@ X86TargetLowering::EmitMonitor(MachineInstr *MI, MachineBasicBlock *BB) const {
// The instruction doesn't actually take any operands though.
BuildMI(*BB, MI, dl, TII->get(X86::MONITORrrr));
-
+
MI->eraseFromParent(); // The pseudo is gone now.
return BB;
}
@@ -9580,16 +9580,16 @@ MachineBasicBlock *
X86TargetLowering::EmitMwait(MachineInstr *MI, MachineBasicBlock *BB) const {
DebugLoc dl = MI->getDebugLoc();
const TargetInstrInfo *TII = getTargetMachine().getInstrInfo();
-
+
// First arg in ECX, the second in EAX.
BuildMI(*BB, MI, dl, TII->get(TargetOpcode::COPY), X86::ECX)
.addReg(MI->getOperand(0).getReg());
BuildMI(*BB, MI, dl, TII->get(TargetOpcode::COPY), X86::EAX)
.addReg(MI->getOperand(1).getReg());
-
+
// The instruction doesn't actually take any operands though.
BuildMI(*BB, MI, dl, TII->get(X86::MWAITrr));
-
+
MI->eraseFromParent(); // The pseudo is gone now.
return BB;
}
@@ -10195,7 +10195,7 @@ X86TargetLowering::EmitInstrWithCustomInserter(MachineInstr *MI,
// Thread synchronization.
case X86::MONITOR:
- return EmitMonitor(MI, BB);
+ return EmitMonitor(MI, BB);
case X86::MWAIT:
return EmitMwait(MI, BB);
@@ -11116,19 +11116,19 @@ static SDValue PerformAndCombine(SDNode *N, SelectionDAG &DAG,
const X86Subtarget *Subtarget) {
if (DCI.isBeforeLegalizeOps())
return SDValue();
-
+
// Want to form PANDN nodes, in the hopes of then easily combining them with
// OR and AND nodes to form PBLEND/PSIGN.
EVT VT = N->getValueType(0);
if (VT != MVT::v2i64)
return SDValue();
-
+
SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);
DebugLoc DL = N->getDebugLoc();
-
+
// Check LHS for vnot
- if (N0.getOpcode() == ISD::XOR &&
+ if (N0.getOpcode() == ISD::XOR &&
ISD::isBuildVectorAllOnes(N0.getOperand(1).getNode()))
return DAG.getNode(X86ISD::PANDN, DL, VT, N0.getOperand(0), N1);
@@ -11136,7 +11136,7 @@ static SDValue PerformAndCombine(SDNode *N, SelectionDAG &DAG,
if (N1.getOpcode() == ISD::XOR &&
ISD::isBuildVectorAllOnes(N1.getOperand(1).getNode()))
return DAG.getNode(X86ISD::PANDN, DL, VT, N1.getOperand(0), N0);
-
+
return SDValue();
}
@@ -11152,7 +11152,7 @@ static SDValue PerformOrCombine(SDNode *N, SelectionDAG &DAG,
SDValue N0 = N->getOperand(0);
SDValue N1 = N->getOperand(1);
-
+
// look for psign/blend
if (Subtarget->hasSSSE3()) {
if (VT == MVT::v2i64) {
@@ -11168,17 +11168,17 @@ static SDValue PerformOrCombine(SDNode *N, SelectionDAG &DAG,
Y = N0.getOperand(1);
if (N0.getOperand(1) == Mask)
Y = N0.getOperand(0);
-
+
// Check to see if the mask appeared in both the AND and PANDN and
if (!Y.getNode())
return SDValue();
-
+
// Validate that X, Y, and Mask are BIT_CONVERTS, and see through them.
if (Mask.getOpcode() != ISD::BITCAST ||
X.getOpcode() != ISD::BITCAST ||
Y.getOpcode() != ISD::BITCAST)
return SDValue();
-
+
// Look through mask bitcast.
Mask = Mask.getOperand(0);
EVT MaskVT = Mask.getValueType();
@@ -11187,7 +11187,7 @@ static SDValue PerformOrCombine(SDNode *N, SelectionDAG &DAG,
// will be an intrinsic.
if (Mask.getOpcode() != ISD::INTRINSIC_WO_CHAIN)
return SDValue();
-
+
// FIXME: what to do for bytes, since there is a psignb/pblendvb, but
// there is no psrai.b
switch (cast<ConstantSDNode>(Mask.getOperand(0))->getZExtValue()) {
@@ -11196,14 +11196,14 @@ static SDValue PerformOrCombine(SDNode *N, SelectionDAG &DAG,
break;
default: return SDValue();
}
-
+
// Check that the SRA is all signbits.
SDValue SraC = Mask.getOperand(2);
unsigned SraAmt = cast<ConstantSDNode>(SraC)->getZExtValue();
unsigned EltBits = MaskVT.getVectorElementType().getSizeInBits();
if ((SraAmt + 1) != EltBits)
return SDValue();
-
+
DebugLoc DL = N->getDebugLoc();
// Now we know we at least have a plendvb with the mask val. See if
@@ -11229,7 +11229,7 @@ static SDValue PerformOrCombine(SDNode *N, SelectionDAG &DAG,
// PBLENDVB only available on SSE 4.1
if (!Subtarget->hasSSE41())
return SDValue();
-
+
X = DAG.getNode(ISD::BITCAST, DL, MVT::v16i8, X);
Y = DAG.getNode(ISD::BITCAST, DL, MVT::v16i8, Y);
Mask = DAG.getNode(ISD::BITCAST, DL, MVT::v16i8, Mask);
@@ -11238,7 +11238,7 @@ static SDValue PerformOrCombine(SDNode *N, SelectionDAG &DAG,
}
}
}
-
+
// fold (or (x << c) | (y >> (64 - c))) ==> (shld64 x, y, c)
if (N0.getOpcode() == ISD::SRL && N1.getOpcode() == ISD::SHL)
std::swap(N0, N1);
@@ -11290,7 +11290,7 @@ static SDValue PerformOrCombine(SDNode *N, SelectionDAG &DAG,
DAG.getNode(ISD::TRUNCATE, DL,
MVT::i8, ShAmt0));
}
-
+
return SDValue();
}
@@ -11500,7 +11500,7 @@ static SDValue PerformSETCCCombine(SDNode *N, SelectionDAG &DAG) {
unsigned X86CC = N->getConstantOperandVal(0);
SDValue EFLAG = N->getOperand(1);
DebugLoc DL = N->getDebugLoc();
-
+
// Materialize "setb reg" as "sbb reg,reg", since it can be extended without
// a zext and produces an all-ones bit which is more useful than 0/1 in some
// cases.
@@ -11509,10 +11509,10 @@ static SDValue PerformSETCCCombine(SDNode *N, SelectionDAG &DAG) {
DAG.getNode(X86ISD::SETCC_CARRY, DL, MVT::i8,
DAG.getConstant(X86CC, MVT::i8), EFLAG),
DAG.getConstant(1, MVT::i8));
-
+
return SDValue();
}
-
+
// Optimize RES, EFLAGS = X86ISD::ADC LHS, RHS, EFLAGS
static SDValue PerformADCCombine(SDNode *N, SelectionDAG &DAG,
X86TargetLowering::DAGCombinerInfo &DCI) {
@@ -11544,7 +11544,7 @@ static SDValue PerformADCCombine(SDNode *N, SelectionDAG &DAG,
// (sub (setne X, 0), Y) -> adc -1, Y
static SDValue OptimizeConditonalInDecrement(SDNode *N, SelectionDAG &DAG) {
DebugLoc DL = N->getDebugLoc();
-
+
// Look through ZExts.
SDValue Ext = N->getOperand(N->getOpcode() == ISD::SUB ? 1 : 0);
if (Ext.getOpcode() != ISD::ZERO_EXTEND || !Ext.hasOneUse())
diff --git a/lib/Target/X86/X86InstrCompiler.td b/lib/Target/X86/X86InstrCompiler.td
index da5e05a..d2c5763 100644
--- a/lib/Target/X86/X86InstrCompiler.td
+++ b/lib/Target/X86/X86InstrCompiler.td
@@ -849,38 +849,38 @@ def : Pat<(X86call (i64 texternalsym:$dst)),
// tailcall stuff
def : Pat<(X86tcret GR32_TC:$dst, imm:$off),
(TCRETURNri GR32_TC:$dst, imm:$off)>,
- Requires<[In32BitMode]>;
+ Requires<[In32BitMode]>;
// FIXME: This is disabled for 32-bit PIC mode because the global base
// register which is part of the address mode may be assigned a
// callee-saved register.
def : Pat<(X86tcret (load addr:$dst), imm:$off),
(TCRETURNmi addr:$dst, imm:$off)>,
- Requires<[In32BitMode, IsNotPIC]>;
+ Requires<[In32BitMode, IsNotPIC]>;
def : Pat<(X86tcret (i32 tglobaladdr:$dst), imm:$off),
(TCRETURNdi texternalsym:$dst, imm:$off)>,
- Requires<[In32BitMode]>;
+ Requires<[In32BitMode]>;
def : Pat<(X86tcret (i32 texternalsym:$dst), imm:$off),
(TCRETURNdi texternalsym:$dst, imm:$off)>,
- Requires<[In32BitMode]>;
+ Requires<[In32BitMode]>;
def : Pat<(X86tcret GR64_TC:$dst, imm:$off),
(TCRETURNri64 GR64_TC:$dst, imm:$off)>,
- Requires<[In64BitMode]>;
+ Requires<[In64BitMode]>;
def : Pat<(X86tcret (load addr:$dst), imm:$off),
(TCRETURNmi64 addr:$dst, imm:$off)>,
- Requires<[In64BitMode]>;
+ Requires<[In64BitMode]>;
def : Pat<(X86tcret (i64 tglobaladdr:$dst), imm:$off),
(TCRETURNdi64 tglobaladdr:$dst, imm:$off)>,
- Requires<[In64BitMode]>;
+ Requires<[In64BitMode]>;
def : Pat<(X86tcret (i64 texternalsym:$dst), imm:$off),
(TCRETURNdi64 texternalsym:$dst, imm:$off)>,
- Requires<[In64BitMode]>;
+ Requires<[In64BitMode]>;
// Normal calls, with various flavors of addresses.
def : Pat<(X86call (i32 tglobaladdr:$dst)),
@@ -1661,4 +1661,3 @@ def : Pat<(and GR64:$src1, i64immSExt8:$src2),
(AND64ri8 GR64:$src1, i64immSExt8:$src2)>;
def : Pat<(and GR64:$src1, i64immSExt32:$src2),
(AND64ri32 GR64:$src1, i64immSExt32:$src2)>;
-
diff --git a/lib/Target/X86/X86InstrControl.td b/lib/Target/X86/X86InstrControl.td
index 62ab53e..4d1c5f7 100644
--- a/lib/Target/X86/X86InstrControl.td
+++ b/lib/Target/X86/X86InstrControl.td
@@ -1,10 +1,10 @@
//===- X86InstrControl.td - Control Flow Instructions ------*- tablegen -*-===//
-//
+//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
-//
+//
//===----------------------------------------------------------------------===//
//
// This file describes the X86 jump, return, call, and related instructions.
@@ -43,7 +43,7 @@ let isBarrier = 1, isBranch = 1, isTerminator = 1 in {
"jmp\t$dst", [(br bb:$dst)]>;
def JMP_1 : Ii8PCRel<0xEB, RawFrm, (outs), (ins brtarget8:$dst),
"jmp\t$dst", []>;
- def JMP64pcrel32 : I<0xE9, RawFrm, (outs), (ins brtarget:$dst),
+ def JMP64pcrel32 : I<0xE9, RawFrm, (outs), (ins brtarget:$dst),
"jmp{q}\t$dst", []>;
}
@@ -108,16 +108,16 @@ let isBranch = 1, isTerminator = 1, isBarrier = 1, isIndirectBranch = 1 in {
def JMP64m : I<0xFF, MRM4m, (outs), (ins i64mem:$dst), "jmp{q}\t{*}$dst",
[(brind (loadi64 addr:$dst))]>, Requires<[In64BitMode]>;
- def FARJMP16i : Iseg16<0xEA, RawFrmImm16, (outs),
+ def FARJMP16i : Iseg16<0xEA, RawFrmImm16, (outs),
(ins i16imm:$off, i16imm:$seg),
"ljmp{w}\t{$seg, $off|$off, $seg}", []>, OpSize;
def FARJMP32i : Iseg32<0xEA, RawFrmImm16, (outs),
(ins i32imm:$off, i16imm:$seg),
- "ljmp{l}\t{$seg, $off|$off, $seg}", []>;
+ "ljmp{l}\t{$seg, $off|$off, $seg}", []>;
def FARJMP64 : RI<0xFF, MRM5m, (outs), (ins opaque80mem:$dst),
"ljmp{q}\t{*}$dst", []>;
- def FARJMP16m : I<0xFF, MRM5m, (outs), (ins opaque32mem:$dst),
+ def FARJMP16m : I<0xFF, MRM5m, (outs), (ins opaque32mem:$dst),
"ljmp{w}\t{*}$dst", []>, OpSize;
def FARJMP32m : I<0xFF, MRM5m, (outs), (ins opaque48mem:$dst),
"ljmp{l}\t{*}$dst", []>;
@@ -152,14 +152,14 @@ let isCall = 1 in
def CALL32m : I<0xFF, MRM2m, (outs), (ins i32mem:$dst, variable_ops),
"call{l}\t{*}$dst", [(X86call (loadi32 addr:$dst))]>,
Requires<[In32BitMode]>;
-
- def FARCALL16i : Iseg16<0x9A, RawFrmImm16, (outs),
+
+ def FARCALL16i : Iseg16<0x9A, RawFrmImm16, (outs),
(ins i16imm:$off, i16imm:$seg),
"lcall{w}\t{$seg, $off|$off, $seg}", []>, OpSize;
def FARCALL32i : Iseg32<0x9A, RawFrmImm16, (outs),
(ins i32imm:$off, i16imm:$seg),
"lcall{l}\t{$seg, $off|$off, $seg}", []>;
-
+
def FARCALL16m : I<0xFF, MRM3m, (outs), (ins opaque32mem:$dst),
"lcall{w}\t{*}$dst", []>, OpSize;
def FARCALL32m : I<0xFF, MRM3m, (outs), (ins opaque48mem:$dst),
@@ -182,12 +182,12 @@ let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1,
XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7,
XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS],
Uses = [ESP] in {
- def TCRETURNdi : PseudoI<(outs),
+ def TCRETURNdi : PseudoI<(outs),
(ins i32imm_pcrel:$dst, i32imm:$offset, variable_ops), []>;
- def TCRETURNri : PseudoI<(outs),
+ def TCRETURNri : PseudoI<(outs),
(ins GR32_TC:$dst, i32imm:$offset, variable_ops), []>;
let mayLoad = 1 in
- def TCRETURNmi : PseudoI<(outs),
+ def TCRETURNmi : PseudoI<(outs),
(ins i32mem_TC:$dst, i32imm:$offset, variable_ops), []>;
// FIXME: The should be pseudo instructions that are lowered when going to
@@ -196,7 +196,7 @@ let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1,
(ins i32imm_pcrel:$dst, variable_ops),
"jmp\t$dst # TAILCALL",
[]>;
- def TAILJMPr : I<0xFF, MRM4r, (outs), (ins GR32_TC:$dst, variable_ops),
+ def TAILJMPr : I<0xFF, MRM4r, (outs), (ins GR32_TC:$dst, variable_ops),
"", []>; // FIXME: Remove encoding when JIT is dead.
let mayLoad = 1 in
def TAILJMPm : I<0xFF, MRM4m, (outs), (ins i32mem_TC:$dst, variable_ops),
@@ -218,7 +218,7 @@ let isCall = 1 in
XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, XMM6, XMM7,
XMM8, XMM9, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, EFLAGS],
Uses = [RSP] in {
-
+
// NOTE: this pattern doesn't match "X86call imm", because we do not know
// that the offset between an arbitrary immediate and the call will fit in
// the 32-bit pcrel field that we have.
@@ -232,12 +232,12 @@ let isCall = 1 in
def CALL64m : I<0xFF, MRM2m, (outs), (ins i64mem:$dst, variable_ops),
"call{q}\t{*}$dst", [(X86call (loadi64 addr:$dst))]>,
Requires<[In64BitMode, NotWin64]>;
-
+
def FARCALL64 : RI<0xFF, MRM3m, (outs), (ins opaque80mem:$dst),
"lcall{q}\t{*}$dst", []>;
}
- // FIXME: We need to teach codegen about single list of call-clobbered
+ // FIXME: We need to teach codegen about single list of call-clobbered
// registers.
let isCall = 1, isCodeGenOnly = 1 in
// All calls clobber the non-callee saved registers. RSP is marked as
@@ -256,10 +256,10 @@ let isCall = 1, isCodeGenOnly = 1 in
def WINCALL64r : I<0xFF, MRM2r, (outs), (ins GR64:$dst, variable_ops),
"call{q}\t{*}$dst",
[(X86call GR64:$dst)]>, Requires<[IsWin64]>;
- def WINCALL64m : I<0xFF, MRM2m, (outs),
+ def WINCALL64m : I<0xFF, MRM2m, (outs),
(ins i64mem:$dst,variable_ops),
"call{q}\t{*}$dst",
- [(X86call (loadi64 addr:$dst))]>,
+ [(X86call (loadi64 addr:$dst))]>,
Requires<[IsWin64]>;
}
@@ -278,7 +278,7 @@ let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1,
def TCRETURNri64 : PseudoI<(outs),
(ins GR64_TC:$dst, i32imm:$offset, variable_ops), []>;
let mayLoad = 1 in
- def TCRETURNmi64 : PseudoI<(outs),
+ def TCRETURNmi64 : PseudoI<(outs),
(ins i64mem_TC:$dst, i32imm:$offset, variable_ops), []>;
def TAILJMPd64 : Ii32PCRel<0xE9, RawFrm, (outs),
@@ -291,4 +291,3 @@ let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1,
def TAILJMPm64 : I<0xFF, MRM4m, (outs), (ins i64mem_TC:$dst, variable_ops),
"jmp{q}\t{*}$dst # TAILCALL", []>;
}
-
diff --git a/lib/Target/X86/X86InstrInfo.cpp b/lib/Target/X86/X86InstrInfo.cpp
index 73654d3..63dcd14 100644
--- a/lib/Target/X86/X86InstrInfo.cpp
+++ b/lib/Target/X86/X86InstrInfo.cpp
@@ -58,7 +58,7 @@ X86InstrInfo::X86InstrInfo(X86TargetMachine &tm)
TB_NOT_REVERSABLE = 1U << 31,
TB_FLAGS = TB_NOT_REVERSABLE
};
-
+
static const unsigned OpTbl2Addr[][2] = {
{ X86::ADC32ri, X86::ADC32mi },
{ X86::ADC32ri8, X86::ADC32mi8 },
@@ -231,16 +231,16 @@ X86InstrInfo::X86InstrInfo(X86TargetMachine &tm)
unsigned MemOp = OpTbl2Addr[i][1] & ~TB_FLAGS;
assert(!RegOp2MemOpTable2Addr.count(RegOp) && "Duplicated entries?");
RegOp2MemOpTable2Addr[RegOp] = std::make_pair(MemOp, 0U);
-
+
// If this is not a reversable operation (because there is a many->one)
// mapping, don't insert the reverse of the operation into MemOp2RegOpTable.
if (OpTbl2Addr[i][1] & TB_NOT_REVERSABLE)
continue;
-
+
// Index 0, folded load and store, no alignment requirement.
unsigned AuxInfo = 0 | (1 << 4) | (1 << 5);
-
- assert(!MemOp2RegOpTable.count(MemOp) &&
+
+ assert(!MemOp2RegOpTable.count(MemOp) &&
"Duplicated entries in unfolding maps?");
MemOp2RegOpTable[MemOp] = std::make_pair(RegOp, AuxInfo);
}
@@ -334,12 +334,12 @@ X86InstrInfo::X86InstrInfo(X86TargetMachine &tm)
unsigned Align = OpTbl0[i][3];
assert(!RegOp2MemOpTable0.count(RegOp) && "Duplicated entries?");
RegOp2MemOpTable0[RegOp] = std::make_pair(MemOp, Align);
-
+
// If this is not a reversable operation (because there is a many->one)
// mapping, don't insert the reverse of the operation into MemOp2RegOpTable.
if (OpTbl0[i][1] & TB_NOT_REVERSABLE)
continue;
-
+
// Index 0, folded load or store.
unsigned AuxInfo = 0 | (FoldedLoad << 4) | ((FoldedLoad^1) << 5);
assert(!MemOp2RegOpTable.count(MemOp) && "Duplicated entries?");
@@ -461,12 +461,12 @@ X86InstrInfo::X86InstrInfo(X86TargetMachine &tm)
unsigned Align = OpTbl1[i][2];
assert(!RegOp2MemOpTable1.count(RegOp) && "Duplicate entries");
RegOp2MemOpTable1[RegOp] = std::make_pair(MemOp, Align);
-
+
// If this is not a reversable operation (because there is a many->one)
// mapping, don't insert the reverse of the operation into MemOp2RegOpTable.
if (OpTbl1[i][1] & TB_NOT_REVERSABLE)
continue;
-
+
// Index 1, folded load
unsigned AuxInfo = 1 | (1 << 4);
assert(!MemOp2RegOpTable.count(MemOp) && "Duplicate entries");
@@ -678,15 +678,15 @@ X86InstrInfo::X86InstrInfo(X86TargetMachine &tm)
unsigned RegOp = OpTbl2[i][0];
unsigned MemOp = OpTbl2[i][1] & ~TB_FLAGS;
unsigned Align = OpTbl2[i][2];
-
+
assert(!RegOp2MemOpTable2.count(RegOp) && "Duplicate entry!");
RegOp2MemOpTable2[RegOp] = std::make_pair(MemOp, Align);
-
+
// If this is not a reversable operation (because there is a many->one)
// mapping, don't insert the reverse of the operation into MemOp2RegOpTable.
if (OpTbl2[i][1] & TB_NOT_REVERSABLE)
continue;
-
+
// Index 2, folded load
unsigned AuxInfo = 2 | (1 << 4);
assert(!MemOp2RegOpTable.count(MemOp) &&
@@ -808,7 +808,7 @@ static bool isFrameStoreOpcode(int Opcode) {
return false;
}
-unsigned X86InstrInfo::isLoadFromStackSlot(const MachineInstr *MI,
+unsigned X86InstrInfo::isLoadFromStackSlot(const MachineInstr *MI,
int &FrameIndex) const {
if (isFrameLoadOpcode(MI->getOpcode()))
if (MI->getOperand(0).getSubReg() == 0 && isFrameOperand(MI, 1, FrameIndex))
@@ -816,7 +816,7 @@ unsigned X86InstrInfo::isLoadFromStackSlot(const MachineInstr *MI,
return 0;
}
-unsigned X86InstrInfo::isLoadFromStackSlotPostFE(const MachineInstr *MI,
+unsigned X86InstrInfo::isLoadFromStackSlotPostFE(const MachineInstr *MI,
int &FrameIndex) const {
if (isFrameLoadOpcode(MI->getOpcode())) {
unsigned Reg;
@@ -946,10 +946,10 @@ X86InstrInfo::isReallyTriviallyReMaterializable(const MachineInstr *MI,
isPICBase = true;
}
return isPICBase;
- }
+ }
return false;
}
-
+
case X86::LEA32r:
case X86::LEA64r: {
if (MI->getOperand(2).isImm() &&
@@ -1124,9 +1124,9 @@ X86InstrInfo::convertToThreeAddressWithLEA(unsigned MIOpc,
MachineRegisterInfo &RegInfo = MFI->getParent()->getRegInfo();
unsigned leaInReg = RegInfo.createVirtualRegister(&X86::GR32_NOSPRegClass);
unsigned leaOutReg = RegInfo.createVirtualRegister(&X86::GR32RegClass);
-
+
// Build and insert into an implicit UNDEF value. This is OK because
- // well be shifting and then extracting the lower 16-bits.
+ // well be shifting and then extracting the lower 16-bits.
// This has the potential to cause partial register stall. e.g.
// movw (%rbp,%rcx,2), %dx
// leal -65(%rdx), %esi
@@ -1162,7 +1162,7 @@ X86InstrInfo::convertToThreeAddressWithLEA(unsigned MIOpc,
case X86::ADD16ri8:
case X86::ADD16ri_DB:
case X86::ADD16ri8_DB:
- addRegOffset(MIB, leaInReg, true, MI->getOperand(2).getImm());
+ addRegOffset(MIB, leaInReg, true, MI->getOperand(2).getImm());
break;
case X86::ADD16rr:
case X86::ADD16rr_DB: {
@@ -1177,7 +1177,7 @@ X86InstrInfo::convertToThreeAddressWithLEA(unsigned MIOpc,
} else {
leaInReg2 = RegInfo.createVirtualRegister(&X86::GR32_NOSPRegClass);
// Build and insert into an implicit UNDEF value. This is OK because
- // well be shifting and then extracting the lower 16-bits.
+ // well be shifting and then extracting the lower 16-bits.
BuildMI(*MFI, MIB, MI->getDebugLoc(), get(X86::IMPLICIT_DEF), leaInReg2);
InsMI2 =
BuildMI(*MFI, MIB, MI->getDebugLoc(), get(TargetOpcode::COPY))
@@ -1244,7 +1244,7 @@ X86InstrInfo::convertToThreeAddress(MachineFunction::iterator &MFI,
case X86::SHUFPSrri: {
assert(MI->getNumOperands() == 4 && "Unknown shufps instruction!");
if (!TM.getSubtarget<X86Subtarget>().hasSSE2()) return 0;
-
+
unsigned B = MI->getOperand(1).getReg();
unsigned C = MI->getOperand(2).getReg();
if (B != C) return 0;
@@ -1392,7 +1392,7 @@ X86InstrInfo::convertToThreeAddress(MachineFunction::iterator &MFI,
RC = X86::GR32_NOSPRegisterClass;
}
-
+
unsigned Src2 = MI->getOperand(2).getReg();
bool isKill2 = MI->getOperand(2).isKill();
@@ -1471,7 +1471,7 @@ X86InstrInfo::convertToThreeAddress(MachineFunction::iterator &MFI,
LV->replaceKillInstruction(Dest, MI, NewMI);
}
- MFI->insert(MBBI, NewMI); // Insert the new inst
+ MFI->insert(MBBI, NewMI); // Insert the new inst
return NewMI;
}
@@ -1692,7 +1692,7 @@ X86::CondCode X86::GetOppositeBranchCondition(X86::CondCode CC) {
bool X86InstrInfo::isUnpredicatedTerminator(const MachineInstr *MI) const {
const TargetInstrDesc &TID = MI->getDesc();
if (!TID.isTerminator()) return false;
-
+
// Conditional branch is a special case.
if (TID.isBranch() && !TID.isBarrier())
return true;
@@ -1701,7 +1701,7 @@ bool X86InstrInfo::isUnpredicatedTerminator(const MachineInstr *MI) const {
return !isPredicated(MI);
}
-bool X86InstrInfo::AnalyzeBranch(MachineBasicBlock &MBB,
+bool X86InstrInfo::AnalyzeBranch(MachineBasicBlock &MBB,
MachineBasicBlock *&TBB,
MachineBasicBlock *&FBB,
SmallVectorImpl<MachineOperand> &Cond,
@@ -1862,7 +1862,7 @@ unsigned X86InstrInfo::RemoveBranch(MachineBasicBlock &MBB) const {
I = MBB.end();
++Count;
}
-
+
return Count;
}
@@ -2177,7 +2177,7 @@ static MachineInstr *FuseTwoAddrInst(MachineFunction &MF, unsigned Opcode,
MIB.addOperand(MOs[i]);
if (NumAddrOps < 4) // FrameIndex only
addOffset(MIB, 0);
-
+
// Loop over the rest of the ri operands, converting them over.
unsigned NumOps = MI->getDesc().getNumOperands()-2;
for (unsigned i = 0; i != NumOps; ++i) {
@@ -2198,7 +2198,7 @@ static MachineInstr *FuseInst(MachineFunction &MF,
MachineInstr *NewMI = MF.CreateMachineInstr(TII.get(Opcode),
MI->getDebugLoc(), true);
MachineInstrBuilder MIB(NewMI);
-
+
for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
MachineOperand &MO = MI->getOperand(i);
if (i == OpNo) {
@@ -2247,7 +2247,7 @@ X86InstrInfo::foldMemoryOperandImpl(MachineFunction &MF,
if (isTwoAddr && NumOps >= 2 && i < 2 &&
MI->getOperand(0).isReg() &&
MI->getOperand(1).isReg() &&
- MI->getOperand(0).getReg() == MI->getOperand(1).getReg()) {
+ MI->getOperand(0).getReg() == MI->getOperand(1).getReg()) {
OpcodeTablePtr = &RegOp2MemOpTable2Addr;
isTwoAddrFold = true;
} else if (i == 0) { // If operand 0
@@ -2261,14 +2261,14 @@ X86InstrInfo::foldMemoryOperandImpl(MachineFunction &MF,
NewMI = MakeM0Inst(*this, X86::MOV8mi, MOs, MI);
if (NewMI)
return NewMI;
-
+
OpcodeTablePtr = &RegOp2MemOpTable0;
} else if (i == 1) {
OpcodeTablePtr = &RegOp2MemOpTable1;
} else if (i == 2) {
OpcodeTablePtr = &RegOp2MemOpTable2;
}
-
+
// If table selected...
if (OpcodeTablePtr) {
// Find the Opcode to fuse
@@ -2316,8 +2316,8 @@ X86InstrInfo::foldMemoryOperandImpl(MachineFunction &MF,
return NewMI;
}
}
-
- // No fusion
+
+ // No fusion
if (PrintFailedFusing && !MI->isCopy())
dbgs() << "We failed to fuse operand " << i << " in " << *MI;
return NULL;
@@ -2328,7 +2328,7 @@ MachineInstr* X86InstrInfo::foldMemoryOperandImpl(MachineFunction &MF,
MachineInstr *MI,
const SmallVectorImpl<unsigned> &Ops,
int FrameIndex) const {
- // Check switch flag
+ // Check switch flag
if (NoFusing) return NULL;
if (!MF.getFunction()->hasFnAttr(Attribute::OptimizeForSize))
@@ -2380,7 +2380,7 @@ MachineInstr* X86InstrInfo::foldMemoryOperandImpl(MachineFunction &MF,
MachineInstr *MI,
const SmallVectorImpl<unsigned> &Ops,
MachineInstr *LoadMI) const {
- // Check switch flag
+ // Check switch flag
if (NoFusing) return NULL;
if (!MF.getFunction()->hasFnAttr(Attribute::OptimizeForSize))
@@ -2523,13 +2523,13 @@ MachineInstr* X86InstrInfo::foldMemoryOperandImpl(MachineFunction &MF,
bool X86InstrInfo::canFoldMemoryOperand(const MachineInstr *MI,
const SmallVectorImpl<unsigned> &Ops) const {
- // Check switch flag
+ // Check switch flag
if (NoFusing) return 0;
if (Ops.size() == 2 && Ops[0] == 0 && Ops[1] == 1) {
switch (MI->getOpcode()) {
default: return false;
- case X86::TEST8rr:
+ case X86::TEST8rr:
case X86::TEST16rr:
case X86::TEST32rr:
case X86::TEST64rr:
@@ -2550,7 +2550,7 @@ bool X86InstrInfo::canFoldMemoryOperand(const MachineInstr *MI,
// instruction is different than folding it other places. It requires
// replacing the *two* registers with the memory location.
const DenseMap<unsigned, std::pair<unsigned,unsigned> > *OpcodeTablePtr = 0;
- if (isTwoAddr && NumOps >= 2 && OpNum < 2) {
+ if (isTwoAddr && NumOps >= 2 && OpNum < 2) {
OpcodeTablePtr = &RegOp2MemOpTable2Addr;
} else if (OpNum == 0) { // If operand 0
switch (Opc) {
@@ -2566,7 +2566,7 @@ bool X86InstrInfo::canFoldMemoryOperand(const MachineInstr *MI,
} else if (OpNum == 2) {
OpcodeTablePtr = &RegOp2MemOpTable2;
}
-
+
if (OpcodeTablePtr && OpcodeTablePtr->count(Opc))
return true;
return TargetInstrInfoImpl::canFoldMemoryOperand(MI, Ops);
@@ -2636,7 +2636,7 @@ bool X86InstrInfo::unfoldMemoryOperand(MachineFunction &MF, MachineInstr *MI,
// Emit the data processing instruction.
MachineInstr *DataMI = MF.CreateMachineInstr(TID, MI->getDebugLoc(), true);
MachineInstrBuilder MIB(DataMI);
-
+
if (FoldedStore)
MIB.addReg(Reg, RegState::Define);
for (unsigned i = 0, e = BeforeOps.size(); i != e; ++i)
@@ -3156,11 +3156,11 @@ namespace {
PC = RegInfo.createVirtualRegister(X86::GR32RegisterClass);
else
PC = GlobalBaseReg;
-
+
// Operand of MovePCtoStack is completely ignored by asm printer. It's
// only used in JIT code emission as displacement to pc.
BuildMI(FirstMBB, MBBI, DL, TII->get(X86::MOVPC32r), PC).addImm(0);
-
+
// If we're using vanilla 'GOT' PIC style, we should use relative addressing
// not to pc, but to _GLOBAL_OFFSET_TABLE_ external.
if (TM->getSubtarget<X86Subtarget>().isPICStyleGOT()) {
diff --git a/lib/Target/X86/X86InstrInfo.td b/lib/Target/X86/X86InstrInfo.td
index f9c0a7b..4748f13 100644
--- a/lib/Target/X86/X86InstrInfo.td
+++ b/lib/Target/X86/X86InstrInfo.td
@@ -36,7 +36,7 @@ def SDTBinaryArithWithFlags : SDTypeProfile<2, 2,
SDTCisSameAs<0, 3>,
SDTCisInt<0>, SDTCisVT<1, i32>]>;
-// SDTBinaryArithWithFlagsInOut - RES1, EFLAGS = op LHS, RHS, EFLAGS
+// SDTBinaryArithWithFlagsInOut - RES1, EFLAGS = op LHS, RHS, EFLAGS
def SDTBinaryArithWithFlagsInOut : SDTypeProfile<2, 3,
[SDTCisSameAs<0, 2>,
SDTCisSameAs<0, 3>,
@@ -1612,4 +1612,3 @@ def : InstAlias<"xchgb $mem, $val", (XCHG8rm GR8 :$val, i8mem :$mem)>;
def : InstAlias<"xchgw $mem, $val", (XCHG16rm GR16:$val, i16mem:$mem)>;
def : InstAlias<"xchgl $mem, $val", (XCHG32rm GR32:$val, i32mem:$mem)>;
def : InstAlias<"xchgq $mem, $val", (XCHG64rm GR64:$val, i64mem:$mem)>;
-
diff --git a/lib/Target/X86/X86MCInstLower.cpp b/lib/Target/X86/X86MCInstLower.cpp
index cbe6db2..4159af1 100644
--- a/lib/Target/X86/X86MCInstLower.cpp
+++ b/lib/Target/X86/X86MCInstLower.cpp
@@ -46,12 +46,12 @@ GetSymbolFromOperand(const MachineOperand &MO) const {
assert((MO.isGlobal() || MO.isSymbol()) && "Isn't a symbol reference");
SmallString<128> Name;
-
+
if (!MO.isGlobal()) {
assert(MO.isSymbol());
Name += MAI.getGlobalPrefix();
Name += MO.getSymbolName();
- } else {
+ } else {
const GlobalValue *GV = MO.getGlobal();
bool isImplicitlyPrivate = false;
if (MO.getTargetFlags() == X86II::MO_DARWIN_STUB ||
@@ -59,7 +59,7 @@ GetSymbolFromOperand(const MachineOperand &MO) const {
MO.getTargetFlags() == X86II::MO_DARWIN_NONLAZY_PIC_BASE ||
MO.getTargetFlags() == X86II::MO_DARWIN_HIDDEN_NONLAZY_PIC_BASE)
isImplicitlyPrivate = true;
-
+
Mang->getNameWithPrefix(Name, GV, isImplicitlyPrivate);
}
@@ -110,7 +110,7 @@ GetSymbolFromOperand(const MachineOperand &MO) const {
getMachOMMI().getFnStubEntry(Sym);
if (StubSym.getPointer())
return Sym;
-
+
if (MO.isGlobal()) {
StubSym =
MachineModuleInfoImpl::
@@ -135,7 +135,7 @@ MCOperand X86MCInstLower::LowerSymbolOperand(const MachineOperand &MO,
// lot of extra uniquing.
const MCExpr *Expr = 0;
MCSymbolRefExpr::VariantKind RefKind = MCSymbolRefExpr::VK_None;
-
+
switch (MO.getTargetFlags()) {
default: llvm_unreachable("Unknown target flag on GV operand");
case X86II::MO_NO_FLAG: // No flag.
@@ -144,7 +144,7 @@ MCOperand X86MCInstLower::LowerSymbolOperand(const MachineOperand &MO,
case X86II::MO_DLLIMPORT:
case X86II::MO_DARWIN_STUB:
break;
-
+
case X86II::MO_TLVP: RefKind = MCSymbolRefExpr::VK_TLVP; break;
case X86II::MO_TLVP_PIC_BASE:
Expr = MCSymbolRefExpr::Create(Sym, MCSymbolRefExpr::VK_TLVP, Ctx);
@@ -168,7 +168,7 @@ MCOperand X86MCInstLower::LowerSymbolOperand(const MachineOperand &MO,
case X86II::MO_DARWIN_HIDDEN_NONLAZY_PIC_BASE:
Expr = MCSymbolRefExpr::Create(Sym, Ctx);
// Subtract the pic base.
- Expr = MCBinaryExpr::CreateSub(Expr,
+ Expr = MCBinaryExpr::CreateSub(Expr,
MCSymbolRefExpr::Create(MF.getPICBaseSymbol(), Ctx),
Ctx);
if (MO.isJTI() && MAI.hasSetDirective()) {
@@ -182,10 +182,10 @@ MCOperand X86MCInstLower::LowerSymbolOperand(const MachineOperand &MO,
}
break;
}
-
+
if (Expr == 0)
Expr = MCSymbolRefExpr::Create(Sym, RefKind, Ctx);
-
+
if (!MO.isJTI() && MO.getOffset())
Expr = MCBinaryExpr::CreateAdd(Expr,
MCConstantExpr::Create(MO.getOffset(), Ctx),
@@ -206,10 +206,10 @@ static void lower_lea64_32mem(MCInst *MI, unsigned OpNo) {
// Convert registers in the addr mode according to subreg64.
for (unsigned i = 0; i != 4; ++i) {
if (!MI->getOperand(OpNo+i).isReg()) continue;
-
+
unsigned Reg = MI->getOperand(OpNo+i).getReg();
if (Reg == 0) continue;
-
+
MI->getOperand(OpNo+i).setReg(getX86SubSuperRegister(Reg, MVT::i64));
}
}
@@ -274,7 +274,7 @@ static void SimplifyShortMoveForm(X86AsmPrinter &Printer, MCInst &Inst,
return;
// Check whether this is an absolute address.
- // FIXME: We know TLVP symbol refs aren't, but there should be a better way
+ // FIXME: We know TLVP symbol refs aren't, but there should be a better way
// to do this here.
bool Absolute = true;
if (Inst.getOperand(AddrOp).isExpr()) {
@@ -283,7 +283,7 @@ static void SimplifyShortMoveForm(X86AsmPrinter &Printer, MCInst &Inst,
if (SRE->getKind() == MCSymbolRefExpr::VK_TLVP)
Absolute = false;
}
-
+
if (Absolute &&
(Inst.getOperand(AddrBase + 0).getReg() != 0 ||
Inst.getOperand(AddrBase + 2).getReg() != 0 ||
@@ -300,10 +300,10 @@ static void SimplifyShortMoveForm(X86AsmPrinter &Printer, MCInst &Inst,
void X86MCInstLower::Lower(const MachineInstr *MI, MCInst &OutMI) const {
OutMI.setOpcode(MI->getOpcode());
-
+
for (unsigned i = 0, e = MI->getNumOperands(); i != e; ++i) {
const MachineOperand &MO = MI->getOperand(i);
-
+
MCOperand MCOp;
switch (MO.getType()) {
default:
@@ -336,10 +336,10 @@ void X86MCInstLower::Lower(const MachineInstr *MI, MCInst &OutMI) const {
AsmPrinter.GetBlockAddressSymbol(MO.getBlockAddress()));
break;
}
-
+
OutMI.addOperand(MCOp);
}
-
+
// Handle a few special cases to eliminate operand modifiers.
ReSimplify:
switch (OutMI.getOpcode()) {
@@ -429,7 +429,7 @@ ReSimplify:
case X86::TAILJMPd:
case X86::TAILJMPd64: Opcode = X86::JMP_1; break;
}
-
+
MCOperand Saved = OutMI.getOperand(0);
OutMI = MCInst();
OutMI.setOpcode(Opcode);
@@ -449,7 +449,7 @@ ReSimplify:
case X86::ADD16ri8_DB: OutMI.setOpcode(X86::OR16ri8); goto ReSimplify;
case X86::ADD32ri8_DB: OutMI.setOpcode(X86::OR32ri8); goto ReSimplify;
case X86::ADD64ri8_DB: OutMI.setOpcode(X86::OR64ri8); goto ReSimplify;
-
+
// The assembler backend wants to see branches in their small form and relax
// them to their large form. The JIT can only handle the large form because
// it does not do relaxation. For now, translate the large form to the
@@ -605,7 +605,7 @@ void X86AsmPrinter::EmitInstruction(const MachineInstr *MI) {
if (OutStreamer.hasRawTextSupport())
OutStreamer.EmitRawText(StringRef("\t#MEMBARRIER"));
return;
-
+
case X86::EH_RETURN:
case X86::EH_RETURN64: {
@@ -633,7 +633,7 @@ void X86AsmPrinter::EmitInstruction(const MachineInstr *MI) {
// call "L1$pb"
// "L1$pb":
// popl %esi
-
+
// Emit the call.
MCSymbol *PICBase = MF->getPICBaseSymbol();
TmpInst.setOpcode(X86::CALLpcrel32);
@@ -642,43 +642,43 @@ void X86AsmPrinter::EmitInstruction(const MachineInstr *MI) {
TmpInst.addOperand(MCOperand::CreateExpr(MCSymbolRefExpr::Create(PICBase,
OutContext)));
OutStreamer.EmitInstruction(TmpInst);
-
+
// Emit the label.
OutStreamer.EmitLabel(PICBase);
-
+
// popl $reg
TmpInst.setOpcode(X86::POP32r);
TmpInst.getOperand(0) = MCOperand::CreateReg(MI->getOperand(0).getReg());
OutStreamer.EmitInstruction(TmpInst);
return;
}
-
+
case X86::ADD32ri: {
// Lower the MO_GOT_ABSOLUTE_ADDRESS form of ADD32ri.
if (MI->getOperand(2).getTargetFlags() != X86II::MO_GOT_ABSOLUTE_ADDRESS)
break;
-
+
// Okay, we have something like:
// EAX = ADD32ri EAX, MO_GOT_ABSOLUTE_ADDRESS(@MYGLOBAL)
-
+
// For this, we want to print something like:
// MYGLOBAL + (. - PICBASE)
// However, we can't generate a ".", so just emit a new label here and refer
// to it.
MCSymbol *DotSym = OutContext.CreateTempSymbol();
OutStreamer.EmitLabel(DotSym);
-
+
// Now that we have emitted the label, lower the complex operand expression.
MCSymbol *OpSym = MCInstLowering.GetSymbolFromOperand(MI->getOperand(2));
-
+
const MCExpr *DotExpr = MCSymbolRefExpr::Create(DotSym, OutContext);
const MCExpr *PICBase =
MCSymbolRefExpr::Create(MF->getPICBaseSymbol(), OutContext);
DotExpr = MCBinaryExpr::CreateSub(DotExpr, PICBase, OutContext);
-
- DotExpr = MCBinaryExpr::CreateAdd(MCSymbolRefExpr::Create(OpSym,OutContext),
+
+ DotExpr = MCBinaryExpr::CreateAdd(MCSymbolRefExpr::Create(OpSym,OutContext),
DotExpr, OutContext);
-
+
MCInst TmpInst;
TmpInst.setOpcode(X86::ADD32ri);
TmpInst.addOperand(MCOperand::CreateReg(MI->getOperand(0).getReg()));
@@ -688,9 +688,8 @@ void X86AsmPrinter::EmitInstruction(const MachineInstr *MI) {
return;
}
}
-
+
MCInst TmpInst;
MCInstLowering.Lower(MI, TmpInst);
OutStreamer.EmitInstruction(TmpInst);
}
-
diff --git a/lib/Target/X86/X86RegisterInfo.cpp b/lib/Target/X86/X86RegisterInfo.cpp
index 1faf6d9..06c671b 100644
--- a/lib/Target/X86/X86RegisterInfo.cpp
+++ b/lib/Target/X86/X86RegisterInfo.cpp
@@ -445,11 +445,11 @@ bool X86RegisterInfo::needsStackRealignment(const MachineFunction &MF) const {
if (0 && requiresRealignment && MFI->hasVarSizedObjects())
report_fatal_error(
"Stack realignment in presense of dynamic allocas is not supported");
-
+
// If we've requested that we force align the stack do so now.
if (ForceStackAlign)
return canRealignStack(MF);
-
+
return requiresRealignment && canRealignStack(MF);
}
@@ -524,7 +524,7 @@ eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB,
// Factor out the amount the callee already popped.
Amount -= CalleeAmt;
-
+
if (Amount) {
unsigned Opc = getADDriOpcode(Is64Bit, Amount);
New = BuildMI(MF, DL, TII.get(Opc), StackPtr)
diff --git a/lib/Target/X86/X86RegisterInfo.td b/lib/Target/X86/X86RegisterInfo.td
index dc4c042..45bb989 100644
--- a/lib/Target/X86/X86RegisterInfo.td
+++ b/lib/Target/X86/X86RegisterInfo.td
@@ -1,10 +1,10 @@
//===- X86RegisterInfo.td - Describe the X86 Register File --*- tablegen -*-==//
-//
+//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
-//
+//
//===----------------------------------------------------------------------===//
//
// This file describes the X86 Register file, defining the registers themselves,
@@ -34,8 +34,8 @@ let Namespace = "X86" in {
// because the register file generator is smart enough to figure out that
// AL aliases AX if we tell it that AX aliased AL (for example).
- // Dwarf numbering is different for 32-bit and 64-bit, and there are
- // variations by target as well. Currently the first entry is for X86-64,
+ // Dwarf numbering is different for 32-bit and 64-bit, and there are
+ // variations by target as well. Currently the first entry is for X86-64,
// second - for EH on X86-32/Darwin and third is 'generic' one (X86-32/Linux
// and debug information on X86-32/Darwin)
@@ -81,7 +81,7 @@ let Namespace = "X86" in {
def SP : RegisterWithSubRegs<"sp", [SPL]>, DwarfRegNum<[7, 5, 4]>;
}
def IP : Register<"ip">, DwarfRegNum<[16]>;
-
+
// X86-64 only
let SubRegIndices = [sub_8bit] in {
def R8W : RegisterWithSubRegs<"r8w", [R8B]>, DwarfRegNum<[8, -2, -2]>;
@@ -103,8 +103,8 @@ let Namespace = "X86" in {
def EDI : RegisterWithSubRegs<"edi", [DI]>, DwarfRegNum<[5, 7, 7]>;
def EBP : RegisterWithSubRegs<"ebp", [BP]>, DwarfRegNum<[6, 4, 5]>;
def ESP : RegisterWithSubRegs<"esp", [SP]>, DwarfRegNum<[7, 5, 4]>;
- def EIP : RegisterWithSubRegs<"eip", [IP]>, DwarfRegNum<[16, 8, 8]>;
-
+ def EIP : RegisterWithSubRegs<"eip", [IP]>, DwarfRegNum<[16, 8, 8]>;
+
// X86-64 only
def R8D : RegisterWithSubRegs<"r8d", [R8W]>, DwarfRegNum<[8, -2, -2]>;
def R9D : RegisterWithSubRegs<"r9d", [R9W]>, DwarfRegNum<[9, -2, -2]>;
@@ -208,7 +208,7 @@ let Namespace = "X86" in {
def ST4 : Register<"st(4)">, DwarfRegNum<[37, 16, 15]>;
def ST5 : Register<"st(5)">, DwarfRegNum<[38, 17, 16]>;
def ST6 : Register<"st(6)">, DwarfRegNum<[39, 18, 17]>;
- def ST7 : Register<"st(7)">, DwarfRegNum<[40, 19, 18]>;
+ def ST7 : Register<"st(7)">, DwarfRegNum<[40, 19, 18]>;
// Status flags register
def EFLAGS : Register<"flags">;
@@ -220,7 +220,7 @@ let Namespace = "X86" in {
def ES : Register<"es">;
def FS : Register<"fs">;
def GS : Register<"gs">;
-
+
// Debug registers
def DR0 : Register<"dr0">;
def DR1 : Register<"dr1">;
@@ -230,7 +230,7 @@ let Namespace = "X86" in {
def DR5 : Register<"dr5">;
def DR6 : Register<"dr6">;
def DR7 : Register<"dr7">;
-
+
// Control registers
def CR0 : Register<"cr0">;
def CR1 : Register<"cr1">;
@@ -261,10 +261,10 @@ let Namespace = "X86" in {
// implicitly defined to be the register allocation order.
//
-// List call-clobbered registers before callee-save registers. RBX, RBP, (and
+// List call-clobbered registers before callee-save registers. RBX, RBP, (and
// R12, R13, R14, and R15 for X86-64) are callee-save registers.
// In 64-mode, there are 12 additional i8 registers, SIL, DIL, BPL, SPL, and
-// R8B, ... R15B.
+// R8B, ... R15B.
// Allocate R12 and R13 last, as these require an extra byte when
// encoded in x86_64 instructions.
// FIXME: Allow AH, CH, DH, BH to be used as general-purpose registers in
--
1.7.1.GIT
-------------- next part --------------
From ed62d827ef015cb8abdb58ae4606ae6b03bff481 Mon Sep 17 00:00:00 2001
From: NAKAMURA Takumi <geek4civic at gmail.com>
Date: Fri, 7 Jan 2011 10:58:30 +0900
Subject: [PATCH 2/9] test/CodeGen/X86: Fix whitespace.
---
test/CodeGen/X86/2009-06-03-Win64DisableRedZone.ll | 5 ++---
test/CodeGen/X86/2009-06-03-Win64SpillXMM.ll | 5 ++---
test/CodeGen/X86/tailcallstack64.ll | 1 -
test/CodeGen/X86/win64_vararg.ll | 10 +++++-----
4 files changed, 9 insertions(+), 12 deletions(-)
diff --git a/test/CodeGen/X86/2009-06-03-Win64DisableRedZone.ll b/test/CodeGen/X86/2009-06-03-Win64DisableRedZone.ll
index c598228..c5d3ac1 100644
--- a/test/CodeGen/X86/2009-06-03-Win64DisableRedZone.ll
+++ b/test/CodeGen/X86/2009-06-03-Win64DisableRedZone.ll
@@ -3,7 +3,6 @@ target triple = "x86_64-pc-mingw64"
define x86_fp80 @a(i64 %x) nounwind readnone {
entry:
- %conv = sitofp i64 %x to x86_fp80 ; <x86_fp80> [#uses=1]
- ret x86_fp80 %conv
+ %conv = sitofp i64 %x to x86_fp80 ; <x86_fp80> [#uses=1]
+ ret x86_fp80 %conv
}
-
diff --git a/test/CodeGen/X86/2009-06-03-Win64SpillXMM.ll b/test/CodeGen/X86/2009-06-03-Win64SpillXMM.ll
index 810a6f4..b722589 100644
--- a/test/CodeGen/X86/2009-06-03-Win64SpillXMM.ll
+++ b/test/CodeGen/X86/2009-06-03-Win64SpillXMM.ll
@@ -6,7 +6,6 @@ target triple = "x86_64-pc-mingw64"
define i32 @a() nounwind {
entry:
- tail call void asm sideeffect "", "~{xmm7},~{xmm8},~{dirflag},~{fpsr},~{flags}"() nounwind
- ret i32 undef
+ tail call void asm sideeffect "", "~{xmm7},~{xmm8},~{dirflag},~{fpsr},~{flags}"() nounwind
+ ret i32 undef
}
-
diff --git a/test/CodeGen/X86/tailcallstack64.ll b/test/CodeGen/X86/tailcallstack64.ll
index 107bdf9..52b074d 100644
--- a/test/CodeGen/X86/tailcallstack64.ll
+++ b/test/CodeGen/X86/tailcallstack64.ll
@@ -22,4 +22,3 @@ entry:
%retval = tail call fastcc i32 @tailcallee(i32 %p1, i32 %p2, i32 %p3, i32 %p4, i32 %p5, i32 %p6, i32 %in2,i32 %tmp)
ret i32 %retval
}
-
diff --git a/test/CodeGen/X86/win64_vararg.ll b/test/CodeGen/X86/win64_vararg.ll
index 072f36a..71b2fa1 100644
--- a/test/CodeGen/X86/win64_vararg.ll
+++ b/test/CodeGen/X86/win64_vararg.ll
@@ -5,11 +5,11 @@
; calculated.
define void @average_va(i32 %count, ...) nounwind {
entry:
-; CHECK: subq $40, %rsp
-; CHECK: movq %r9, 72(%rsp)
-; CHECK: movq %r8, 64(%rsp)
-; CHECK: movq %rdx, 56(%rsp)
-; CHECK: leaq 56(%rsp), %rax
+; CHECK: subq $40, %rsp
+; CHECK: movq %r9, 72(%rsp)
+; CHECK: movq %r8, 64(%rsp)
+; CHECK: movq %rdx, 56(%rsp)
+; CHECK: leaq 56(%rsp), %rax
%ap = alloca i8*, align 8 ; <i8**> [#uses=1]
%ap1 = bitcast i8** %ap to i8* ; <i8*> [#uses=1]
--
1.7.1.GIT
-------------- next part --------------
From 1bca7c8f8b750651827b9e045c93ff66afed4bdf Mon Sep 17 00:00:00 2001
From: NAKAMURA Takumi <geek4civic at gmail.com>
Date: Mon, 10 Jan 2011 13:19:04 +0900
Subject: [PATCH 3/9] lib/Target/X86/X86ISelLowering.cpp: Introduce a new variable "IsWin64". No functional changes.
---
lib/Target/X86/X86ISelLowering.cpp | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/lib/Target/X86/X86ISelLowering.cpp b/lib/Target/X86/X86ISelLowering.cpp
index 2268823..3f1bed1 100644
--- a/lib/Target/X86/X86ISelLowering.cpp
+++ b/lib/Target/X86/X86ISelLowering.cpp
@@ -1863,6 +1863,7 @@ X86TargetLowering::LowerCall(SDValue Chain, SDValue Callee,
SmallVectorImpl<SDValue> &InVals) const {
MachineFunction &MF = DAG.getMachineFunction();
bool Is64Bit = Subtarget->is64Bit();
+ bool IsWin64 = Subtarget->isTargetWin64();
bool IsStructRet = CallIsStructReturn(Outs);
bool IsSibcall = false;
@@ -1970,7 +1971,7 @@ X86TargetLowering::LowerCall(SDValue Chain, SDValue Callee,
if (VA.isRegLoc()) {
RegsToPass.push_back(std::make_pair(VA.getLocReg(), Arg));
- if (isVarArg && Subtarget->isTargetWin64()) {
+ if (isVarArg && IsWin64) {
// Win64 ABI requires argument XMM reg to be copied to the corresponding
// shadow reg if callee is a varargs function.
unsigned ShadowReg = 0;
@@ -2036,7 +2037,7 @@ X86TargetLowering::LowerCall(SDValue Chain, SDValue Callee,
}
}
- if (Is64Bit && isVarArg && !Subtarget->isTargetWin64()) {
+ if (Is64Bit && isVarArg && !IsWin64) {
// From AMD64 ABI document:
// For calls that may call functions that use varargs or stdargs
// (prototype-less calls or calls to functions containing ellipsis (...) in
@@ -2211,7 +2212,7 @@ X86TargetLowering::LowerCall(SDValue Chain, SDValue Callee,
Ops.push_back(DAG.getRegister(X86::EBX, getPointerTy()));
// Add an implicit use of AL for non-Windows x86 64-bit vararg functions.
- if (Is64Bit && isVarArg && !Subtarget->isTargetWin64())
+ if (Is64Bit && isVarArg && !IsWin64)
Ops.push_back(DAG.getRegister(X86::AL, MVT::i8));
if (InFlag.getNode())
--
1.7.1.GIT
-------------- next part --------------
From 674c32b4785d1bbc146b032735bdc021475cfef0 Mon Sep 17 00:00:00 2001
From: NAKAMURA Takumi <geek4civic at gmail.com>
Date: Mon, 13 Dec 2010 17:59:20 +0900
Subject: [PATCH 4/9] Target/X86: Tweak allocation of the shadow area (aka home area) on Win64. It is enough for the caller to allocate it.
---
lib/Target/X86/X86FrameLowering.cpp | 5 ----
lib/Target/X86/X86FrameLowering.h | 3 +-
lib/Target/X86/X86ISelLowering.cpp | 21 ++++++++++++++++++-
test/CodeGen/X86/2009-06-03-Win64DisableRedZone.ll | 4 +-
test/CodeGen/X86/2009-06-03-Win64SpillXMM.ll | 9 +++----
test/CodeGen/X86/win64_params.ll | 4 +-
test/CodeGen/X86/win64_vararg.ll | 10 ++++----
7 files changed, 33 insertions(+), 23 deletions(-)
diff --git a/lib/Target/X86/X86FrameLowering.cpp b/lib/Target/X86/X86FrameLowering.cpp
index cbf1b59..71fd8d1 100644
--- a/lib/Target/X86/X86FrameLowering.cpp
+++ b/lib/Target/X86/X86FrameLowering.cpp
@@ -400,11 +400,6 @@ void X86FrameLowering::emitPrologue(MachineFunction &MF) const {
if (HasFP) MinSize += SlotSize;
StackSize = std::max(MinSize, StackSize > 128 ? StackSize - 128 : 0);
MFI->setStackSize(StackSize);
- } else if (IsWin64) {
- // We need to always allocate 32 bytes as register spill area.
- // FIXME: We might reuse these 32 bytes for leaf functions.
- StackSize += 32;
- MFI->setStackSize(StackSize);
}
// Insert stack pointer adjustment for later moving of return addr. Only
diff --git a/lib/Target/X86/X86FrameLowering.h b/lib/Target/X86/X86FrameLowering.h
index c067e64..d71108c 100644
--- a/lib/Target/X86/X86FrameLowering.h
+++ b/lib/Target/X86/X86FrameLowering.h
@@ -28,8 +28,7 @@ public:
explicit X86FrameLowering(const X86TargetMachine &tm, const X86Subtarget &sti)
: TargetFrameLowering(StackGrowsDown,
sti.getStackAlignment(),
- (sti.isTargetWin64() ? -40 :
- (sti.is64Bit() ? -8 : -4))),
+ (sti.is64Bit() ? -8 : -4)),
TM(tm), STI(sti) {
}
diff --git a/lib/Target/X86/X86ISelLowering.cpp b/lib/Target/X86/X86ISelLowering.cpp
index 3f1bed1..d3213de 100644
--- a/lib/Target/X86/X86ISelLowering.cpp
+++ b/lib/Target/X86/X86ISelLowering.cpp
@@ -1569,6 +1569,12 @@ X86TargetLowering::LowerFormalArguments(SDValue Chain,
SmallVector<CCValAssign, 16> ArgLocs;
CCState CCInfo(CallConv, isVarArg, getTargetMachine(),
ArgLocs, *DAG.getContext());
+
+ // Allocate shadow area for Win64
+ if (IsWin64) {
+ CCInfo.AllocateStack(32, 8);
+ }
+
CCInfo.AnalyzeFormalArguments(Ins, CC_X86);
unsigned LastVal = ~0U;
@@ -1803,8 +1809,7 @@ X86TargetLowering::LowerMemOpCallTo(SDValue Chain,
DebugLoc dl, SelectionDAG &DAG,
const CCValAssign &VA,
ISD::ArgFlagsTy Flags) const {
- const unsigned FirstStackArgOffset = (Subtarget->isTargetWin64() ? 32 : 0);
- unsigned LocMemOffset = FirstStackArgOffset + VA.getLocMemOffset();
+ unsigned LocMemOffset = VA.getLocMemOffset();
SDValue PtrOff = DAG.getIntPtrConstant(LocMemOffset);
PtrOff = DAG.getNode(ISD::ADD, dl, getPointerTy(), StackPtr, PtrOff);
if (Flags.isByVal())
@@ -1889,6 +1894,12 @@ X86TargetLowering::LowerCall(SDValue Chain, SDValue Callee,
SmallVector<CCValAssign, 16> ArgLocs;
CCState CCInfo(CallConv, isVarArg, getTargetMachine(),
ArgLocs, *DAG.getContext());
+
+ // Allocate shadow area for Win64
+ if (IsWin64) {
+ CCInfo.AllocateStack(32, 8);
+ }
+
CCInfo.AnalyzeCallOperands(Outs, CC_X86);
// Get a count of how many bytes are to be pushed on the stack.
@@ -2472,6 +2483,12 @@ X86TargetLowering::IsEligibleForTailCallOptimization(SDValue Callee,
SmallVector<CCValAssign, 16> ArgLocs;
CCState CCInfo(CalleeCC, isVarArg, getTargetMachine(),
ArgLocs, *DAG.getContext());
+
+ // Allocate shadow area for Win64
+ if (Subtarget->isTargetWin64()) {
+ CCInfo.AllocateStack(32, 8);
+ }
+
CCInfo.AnalyzeCallOperands(Outs, CC_X86);
if (CCInfo.getNextStackOffset()) {
MachineFunction &MF = DAG.getMachineFunction();
diff --git a/test/CodeGen/X86/2009-06-03-Win64DisableRedZone.ll b/test/CodeGen/X86/2009-06-03-Win64DisableRedZone.ll
index c5d3ac1..9d06a9e 100644
--- a/test/CodeGen/X86/2009-06-03-Win64DisableRedZone.ll
+++ b/test/CodeGen/X86/2009-06-03-Win64DisableRedZone.ll
@@ -1,5 +1,5 @@
-; RUN: llc < %s | grep "subq.*\\\$40, \\\%rsp"
-target triple = "x86_64-pc-mingw64"
+; RUN: llc -mtriple=x86_64-pc-mingw64 < %s | FileCheck %s
+; CHECK-NOT: -{{[1-9][0-9]*}}(%rsp)
define x86_fp80 @a(i64 %x) nounwind readnone {
entry:
diff --git a/test/CodeGen/X86/2009-06-03-Win64SpillXMM.ll b/test/CodeGen/X86/2009-06-03-Win64SpillXMM.ll
index b722589..6e8d9a9 100644
--- a/test/CodeGen/X86/2009-06-03-Win64SpillXMM.ll
+++ b/test/CodeGen/X86/2009-06-03-Win64SpillXMM.ll
@@ -1,8 +1,7 @@
-; RUN: llc < %s -o %t1
-; RUN: grep "subq.*\\\$72, \\\%rsp" %t1
-; RUN: grep "movaps \\\%xmm8, 32\\\(\\\%rsp\\\)" %t1
-; RUN: grep "movaps \\\%xmm7, 48\\\(\\\%rsp\\\)" %t1
-target triple = "x86_64-pc-mingw64"
+; RUN: llc -mtriple=x86_64-pc-mingw64 < %s | FileCheck %s
+; CHECK: subq $40, %rsp
+; CHECK: movaps %xmm8, (%rsp)
+; CHECK: movaps %xmm7, 16(%rsp)
define i32 @a() nounwind {
entry:
diff --git a/test/CodeGen/X86/win64_params.ll b/test/CodeGen/X86/win64_params.ll
index 0b67368..f9d4bf9 100644
--- a/test/CodeGen/X86/win64_params.ll
+++ b/test/CodeGen/X86/win64_params.ll
@@ -4,8 +4,8 @@
; on the stack.
define i32 @f6(i32 %p1, i32 %p2, i32 %p3, i32 %p4, i32 %p5, i32 %p6) nounwind readnone optsize {
entry:
-; CHECK: movl 80(%rsp), %eax
-; CHECK: addl 72(%rsp), %eax
+; CHECK: movl 48(%rsp), %eax
+; CHECK: addl 40(%rsp), %eax
%add = add nsw i32 %p6, %p5
ret i32 %add
}
diff --git a/test/CodeGen/X86/win64_vararg.ll b/test/CodeGen/X86/win64_vararg.ll
index 71b2fa1..a451318 100644
--- a/test/CodeGen/X86/win64_vararg.ll
+++ b/test/CodeGen/X86/win64_vararg.ll
@@ -5,11 +5,11 @@
; calculated.
define void @average_va(i32 %count, ...) nounwind {
entry:
-; CHECK: subq $40, %rsp
-; CHECK: movq %r9, 72(%rsp)
-; CHECK: movq %r8, 64(%rsp)
-; CHECK: movq %rdx, 56(%rsp)
-; CHECK: leaq 56(%rsp), %rax
+; CHECK: pushq
+; CHECK: movq %r9, 40(%rsp)
+; CHECK: movq %r8, 32(%rsp)
+; CHECK: movq %rdx, 24(%rsp)
+; CHECK: leaq 24(%rsp), %rax
%ap = alloca i8*, align 8 ; <i8**> [#uses=1]
%ap1 = bitcast i8** %ap to i8* ; <i8*> [#uses=1]
--
1.7.1.GIT
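
For reference, here is how the shadow-area pre-allocation shifts the offsets. A minimal sketch, with CCStateMock standing in for LLVM's CCState (only AllocateStack's shape is mirrored, the class itself is made up): once the 32 bytes are reserved before AnalyzeCallOperands/AnalyzeFormalArguments runs, every stack-passed argument is assigned an offset past the shadow area, which is why the hard-coded FirstStackArgOffset fixup in LowerMemOpCallTo can go away. Adding 8 for the pushed return address gives the new win64_params.ll offsets: p5 at 40(%rsp), p6 at 48(%rsp).

#include <cstdio>

struct CCStateMock {
  unsigned StackOffset;
  CCStateMock() : StackOffset(0) {}
  unsigned AllocateStack(unsigned Size, unsigned Align) {
    // Round up to the requested alignment, hand out the slot, bump the top.
    StackOffset = (StackOffset + Align - 1) & ~(Align - 1);
    unsigned Offset = StackOffset;
    StackOffset += Size;
    return Offset;
  }
};

int main() {
  CCStateMock CCInfo;
  CCInfo.AllocateStack(32, 8);               // shadow area for RCX/RDX/R8/R9
  unsigned P5 = CCInfo.AllocateStack(8, 8);  // 5th integer argument
  unsigned P6 = CCInfo.AllocateStack(8, 8);  // 6th integer argument
  std::printf("p5 at %u, p6 at %u (caller-relative)\n", P5, P6); // 32, 40
  return 0;
}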
-------------- next part --------------
From 6d3c586594c36faa5244c982c7068dac18931a63 Mon Sep 17 00:00:00 2001
From: NAKAMURA Takumi <geek4civic at gmail.com>
Date: Mon, 13 Dec 2010 18:11:31 +0900
Subject: [PATCH 5/9] Target/X86: Tweak alloca and add a testcase for mingw64 and msvcrt on Win64. [PR8778]
---
lib/Target/X86/X86FrameLowering.cpp | 19 +++++++--
lib/Target/X86/X86ISelLowering.cpp | 57 +++++++++++++++++++--------
lib/Target/X86/X86InstrControl.td | 10 +++++
test/CodeGen/X86/win64_alloca_dynalloca.ll | 56 +++++++++++++++++++++++++++
test/CodeGen/X86/win_chkstk.ll | 2 +-
5 files changed, 121 insertions(+), 23 deletions(-)
create mode 100644 test/CodeGen/X86/win64_alloca_dynalloca.ll
diff --git a/lib/Target/X86/X86FrameLowering.cpp b/lib/Target/X86/X86FrameLowering.cpp
index 71fd8d1..ce88169 100644
--- a/lib/Target/X86/X86FrameLowering.cpp
+++ b/lib/Target/X86/X86FrameLowering.cpp
@@ -555,14 +555,23 @@ void X86FrameLowering::emitPrologue(MachineFunction &MF) const {
// responsible for adjusting the stack pointer. Touching the stack at 4K
// increments is necessary to ensure that the guard pages used by the OS
// virtual memory manager are allocated in correct sequence.
- if (NumBytes >= 4096 && (STI.isTargetCygMing() || STI.isTargetWin32())) {
+ if (NumBytes >= 4096 && Is64Bit && STI.isTargetCygMing()) {
+ // Sanity check that EAX is not livein for this function. It
+ // should not be, so assert if it is.
+ assert(!isEAXLiveIn(MF) && "EAX is livein in the Cygming64 case!");
+
+ BuildMI(MBB, MBBI, DL, TII.get(X86::MOV64ri), X86::RAX)
+ .addImm(NumBytes);
+ BuildMI(MBB, MBBI, DL, TII.get(X86::W64ALLOCA))
+ .addExternalSymbol("___chkstk")
+ .addReg(StackPtr, RegState::Define | RegState::Implicit);
+ // Cygming's ___chkstk adjusts %rsp.
+ } else if (NumBytes >= 4096 && (STI.isTargetCygMing() || STI.isTargetWin32())) {
// Check whether EAX is livein for this function.
bool isEAXAlive = isEAXLiveIn(MF);
const char *StackProbeSymbol =
STI.isTargetWindows() ? "_chkstk" : "_alloca";
- if (Is64Bit && STI.isTargetCygMing())
- StackProbeSymbol = "__chkstk";
unsigned CallOp = Is64Bit ? X86::CALL64pcrel32 : X86::CALLpcrel32;
if (!isEAXAlive) {
BuildMI(MBB, MBBI, DL, TII.get(X86::MOV32ri), X86::EAX)
@@ -598,9 +607,9 @@ void X86FrameLowering::emitPrologue(MachineFunction &MF) const {
// Handle the 64-bit Windows ABI case where we need to call __chkstk.
// Function prologue is responsible for adjusting the stack pointer.
- BuildMI(MBB, MBBI, DL, TII.get(X86::MOV32ri), X86::EAX)
+ BuildMI(MBB, MBBI, DL, TII.get(X86::MOV64ri), X86::RAX)
.addImm(NumBytes);
- BuildMI(MBB, MBBI, DL, TII.get(X86::WINCALL64pcrel32))
+ BuildMI(MBB, MBBI, DL, TII.get(X86::W64ALLOCA))
.addExternalSymbol("__chkstk")
.addReg(StackPtr, RegState::Define | RegState::Implicit);
emitSPUpdate(MBB, MBBI, StackPtr, -(int64_t)NumBytes, Is64Bit,
diff --git a/lib/Target/X86/X86ISelLowering.cpp b/lib/Target/X86/X86ISelLowering.cpp
index d3213de..a012ae2 100644
--- a/lib/Target/X86/X86ISelLowering.cpp
+++ b/lib/Target/X86/X86ISelLowering.cpp
@@ -418,12 +418,9 @@ X86TargetLowering::X86TargetLowering(X86TargetMachine &TM)
setOperationAction(ISD::STACKSAVE, MVT::Other, Expand);
setOperationAction(ISD::STACKRESTORE, MVT::Other, Expand);
- if (Subtarget->is64Bit())
- setOperationAction(ISD::DYNAMIC_STACKALLOC, MVT::i64, Expand);
- if (Subtarget->isTargetCygMing() || Subtarget->isTargetWindows())
- setOperationAction(ISD::DYNAMIC_STACKALLOC, MVT::i32, Custom);
- else
- setOperationAction(ISD::DYNAMIC_STACKALLOC, MVT::i32, Expand);
+ setOperationAction(ISD::DYNAMIC_STACKALLOC,
+ (Subtarget->is64Bit() ? MVT::i64 : MVT::i32),
+ (Subtarget->isTargetCOFF() ? Custom : Expand));
if (!UseSoftFloat && X86ScalarSSEf64) {
// f32 and f64 use SSE.
@@ -7553,8 +7550,9 @@ X86TargetLowering::LowerDYNAMIC_STACKALLOC(SDValue Op,
SDValue Flag;
EVT SPTy = Subtarget->is64Bit() ? MVT::i64 : MVT::i32;
+ unsigned Reg = (Subtarget->is64Bit() ? X86::RAX : X86::EAX);
- Chain = DAG.getCopyToReg(Chain, dl, X86::EAX, Size, Flag);
+ Chain = DAG.getCopyToReg(Chain, dl, Reg, Size, Flag);
Flag = Chain.getValue(1);
SDVTList NodeTys = DAG.getVTList(MVT::Other, MVT::Glue);
@@ -10022,19 +10020,44 @@ X86TargetLowering::EmitLoweredWinAlloca(MachineInstr *MI,
// The lowering is pretty easy: we're just emitting the call to _alloca. The
// non-trivial part is impdef of ESP.
- // FIXME: The code should be tweaked as soon as we'll try to do codegen for
- // mingw-w64.
- const char *StackProbeSymbol =
+ if (Subtarget->isTargetWin64()) {
+ if (Subtarget->isTargetCygMing()) {
+ // ___chkstk(Mingw64):
+ // Clobbers R10, R11, RAX and EFLAGS.
+ // Updates RSP.
+ BuildMI(*BB, MI, DL, TII->get(X86::W64ALLOCA))
+ .addExternalSymbol("___chkstk")
+ .addReg(X86::RAX, RegState::Implicit)
+ .addReg(X86::RSP, RegState::Implicit)
+ .addReg(X86::RAX, RegState::Define | RegState::Implicit)
+ .addReg(X86::RSP, RegState::Define | RegState::Implicit)
+ .addReg(X86::EFLAGS, RegState::Define | RegState::Implicit);
+ } else {
+ // __chkstk(MSVCRT): does not update stack pointer.
+ // Clobbers R10, R11 and EFLAGS.
+ // FIXME: RAX (the allocated size) might be reused and not killed.
+ BuildMI(*BB, MI, DL, TII->get(X86::W64ALLOCA))
+ .addExternalSymbol("__chkstk")
+ .addReg(X86::RAX, RegState::Implicit)
+ .addReg(X86::EFLAGS, RegState::Define | RegState::Implicit);
+ // RAX holds the amount to be subtracted from RSP.
+ BuildMI(*BB, MI, DL, TII->get(X86::SUB64rr), X86::RSP)
+ .addReg(X86::RSP)
+ .addReg(X86::RAX);
+ }
+ } else {
+ const char *StackProbeSymbol =
Subtarget->isTargetWindows() ? "_chkstk" : "_alloca";
- BuildMI(*BB, MI, DL, TII->get(X86::CALLpcrel32))
- .addExternalSymbol(StackProbeSymbol)
- .addReg(X86::EAX, RegState::Implicit)
- .addReg(X86::ESP, RegState::Implicit)
- .addReg(X86::EAX, RegState::Define | RegState::Implicit)
- .addReg(X86::ESP, RegState::Define | RegState::Implicit)
- .addReg(X86::EFLAGS, RegState::Define | RegState::Implicit);
+ BuildMI(*BB, MI, DL, TII->get(X86::CALLpcrel32))
+ .addExternalSymbol(StackProbeSymbol)
+ .addReg(X86::EAX, RegState::Implicit)
+ .addReg(X86::ESP, RegState::Implicit)
+ .addReg(X86::EAX, RegState::Define | RegState::Implicit)
+ .addReg(X86::ESP, RegState::Define | RegState::Implicit)
+ .addReg(X86::EFLAGS, RegState::Define | RegState::Implicit);
+ }
MI->eraseFromParent(); // The pseudo instruction is gone now.
return BB;
diff --git a/lib/Target/X86/X86InstrControl.td b/lib/Target/X86/X86InstrControl.td
index 4d1c5f7..31f2832 100644
--- a/lib/Target/X86/X86InstrControl.td
+++ b/lib/Target/X86/X86InstrControl.td
@@ -263,6 +263,16 @@ let isCall = 1, isCodeGenOnly = 1 in
Requires<[IsWin64]>;
}
+let isCall = 1, isCodeGenOnly = 1 in
+ // __chkstk(MSVC): clobbers R10, R11 and EFLAGS.
+ // ___chkstk(Mingw64): clobbers R10, R11, RAX and EFLAGS, and updates RSP.
+ let Defs = [RAX, R10, R11, RSP, EFLAGS],
+ Uses = [RSP] in {
+ def W64ALLOCA : Ii32PCRel<0xE8, RawFrm,
+ (outs), (ins i64i32imm_pcrel:$dst, variable_ops),
+ "call{q}\t$dst", []>,
+ Requires<[IsWin64]>;
+ }
let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1,
isCodeGenOnly = 1 in
diff --git a/test/CodeGen/X86/win64_alloca_dynalloca.ll b/test/CodeGen/X86/win64_alloca_dynalloca.ll
new file mode 100644
index 0000000..bb43608
--- /dev/null
+++ b/test/CodeGen/X86/win64_alloca_dynalloca.ll
@@ -0,0 +1,56 @@
+; RUN: llc < %s -mtriple=x86_64-mingw32 | FileCheck %s -check-prefix=M64
+; RUN: llc < %s -mtriple=x86_64-mingw64 | FileCheck %s -check-prefix=M64
+; RUN: llc < %s -mtriple=x86_64-win32 | FileCheck %s -check-prefix=W64
+; PR8777
+; PR8778
+
+define i64 @foo(i64 %n, i64 %x) nounwind {
+entry:
+
+ %buf0 = alloca i8, i64 4096, align 1
+
+; M64: movq %rsp, %rbp
+; M64: $4096, %rax
+; M64: callq ___chkstk
+; M64-NOT: %rsp
+
+; W64: movq %rsp, %rbp
+; W64: $4096, %rax
+; W64: callq __chkstk
+; W64: subq $4096, %rsp
+
+ %buf1 = alloca i8, i64 %n, align 1
+
+; M64: leaq 15(%rcx), %rax
+; M64: andq $-16, %rax
+; M64: callq ___chkstk
+; M64-NOT: %rsp
+; M64: movq %rsp, %rax
+
+; W64: leaq 15(%rcx), %rax
+; W64: andq $-16, %rax
+; W64: callq __chkstk
+; W64: subq %rax, %rsp
+; W64: movq %rsp, %rax
+
+ %r = call i64 @bar(i64 %n, i64 %x, i64 %n, i8* %buf0, i8* %buf1) nounwind
+
+; M64: subq $48, %rsp
+; M64: movq %rax, 32(%rsp)
+; M64: leaq -4096(%rbp), %r9
+; M64: callq bar
+
+; W64: subq $48, %rsp
+; W64: movq %rax, 32(%rsp)
+; W64: leaq -4096(%rbp), %r9
+; W64: callq bar
+
+ ret i64 %r
+
+; M64: movq %rbp, %rsp
+
+; W64: movq %rbp, %rsp
+
+}
+
+declare i64 @bar(i64, i64, i64, i8* nocapture, i8* nocapture) nounwind
diff --git a/test/CodeGen/X86/win_chkstk.ll b/test/CodeGen/X86/win_chkstk.ll
index 82ce81d..ae7591d 100644
--- a/test/CodeGen/X86/win_chkstk.ll
+++ b/test/CodeGen/X86/win_chkstk.ll
@@ -16,7 +16,7 @@ entry:
; WIN_X32: calll __chkstk
; WIN_X64: callq __chkstk
; MINGW_X32: calll __alloca
-; MINGW_X64: callq __chkstk
+; MINGW_X64: callq ___chkstk
; LINUX-NOT: call __chkstk
%array4096 = alloca [4096 x i8], align 16 ; <[4096 x i8]*> [#uses=0]
ret i32 0
--
1.7.1.GIT
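
To summarize the probe semantics above in one place, a sketch (Env, Probe and selectProbe are made-up names for illustration; the real lowering emits W64ALLOCA / CALLpcrel32 MachineInstrs) of which symbol each environment gets and whether the probe itself moves RSP. The MSVCRT case is why the lowering emits SUB64rr right after the W64ALLOCA:

#include <cstdio>

enum Env { MinGW64, MSVCRT64, Win32 };

struct Probe { const char *Symbol; bool ProbeAdjustsSP; };

static Probe selectProbe(Env E) {
  switch (E) {
  case MinGW64:  return { "___chkstk", true  }; // probes and updates RSP itself
  case MSVCRT64: return { "__chkstk",  false }; // probes only; emit subq %rax, %rsp after
  case Win32:    return { "_chkstk",   false }; // 32-bit flavor, EAX-based
  }
  return { "", false };
}

int main() {
  Probe P = selectProbe(MSVCRT64);
  std::printf("call %s; probe adjusts sp: %d\n", P.Symbol, P.ProbeAdjustsSP);
  return 0;
}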
-------------- next part --------------
From 7d88296c8b870f9685f833a882a68bbf2fdad864 Mon Sep 17 00:00:00 2001
From: NAKAMURA Takumi <geek4civic at gmail.com>
Date: Wed, 15 Dec 2010 13:07:12 +0900
Subject: [PATCH 6/9] Target/X86: Tweak va_arg for Win64 not to miss taking va_start when number of fixed args > 4.
---
lib/Target/X86/X86ISelLowering.cpp | 8 +++++---
test/CodeGen/X86/win64_vararg.ll | 33 +++++++++++++++++++++++++++++++++
2 files changed, 38 insertions(+), 3 deletions(-)
diff --git a/lib/Target/X86/X86ISelLowering.cpp b/lib/Target/X86/X86ISelLowering.cpp
index a012ae2..588c3bb 100644
--- a/lib/Target/X86/X86ISelLowering.cpp
+++ b/lib/Target/X86/X86ISelLowering.cpp
@@ -1662,8 +1662,8 @@ X86TargetLowering::LowerFormalArguments(SDValue Chain,
// If the function takes variable number of arguments, make a frame index for
// the start of the first vararg value... for expansion of llvm.va_start.
if (isVarArg) {
- if (!IsWin64 && (Is64Bit || (CallConv != CallingConv::X86_FastCall &&
- CallConv != CallingConv::X86_ThisCall))) {
+ if (Is64Bit || (CallConv != CallingConv::X86_FastCall &&
+ CallConv != CallingConv::X86_ThisCall)) {
FuncInfo->setVarArgsFrameIndex(MFI->CreateFixedObject(1, StackSize,true));
}
if (Is64Bit) {
@@ -1715,7 +1715,9 @@ X86TargetLowering::LowerFormalArguments(SDValue Chain,
int HomeOffset = TFI.getOffsetOfLocalArea() + 8;
FuncInfo->setRegSaveFrameIndex(
MFI->CreateFixedObject(1, NumIntRegs * 8 + HomeOffset, false));
- FuncInfo->setVarArgsFrameIndex(FuncInfo->getRegSaveFrameIndex());
+ // FIXME: This is a dirty hack, but it works.
+ if (NumIntRegs < 4)
+ FuncInfo->setVarArgsFrameIndex(FuncInfo->getRegSaveFrameIndex());
} else {
// For X86-64, if there are vararg parameters that are passed via
// registers, then we must store them to their spots on the stack so they
diff --git a/test/CodeGen/X86/win64_vararg.ll b/test/CodeGen/X86/win64_vararg.ll
index a451318..efe8bca 100644
--- a/test/CodeGen/X86/win64_vararg.ll
+++ b/test/CodeGen/X86/win64_vararg.ll
@@ -18,3 +18,36 @@ entry:
}
declare void @llvm.va_start(i8*) nounwind
+
+; CHECK: f5:
+; CHECK: pushq
+; CHECK: leaq 56(%rsp),
+define i8* @f5(i64 %a0, i64 %a1, i64 %a2, i64 %a3, i64 %a4, ...) nounwind {
+entry:
+ %ap = alloca i8*, align 8
+ %ap1 = bitcast i8** %ap to i8*
+ call void @llvm.va_start(i8* %ap1)
+ ret i8* %ap1
+}
+
+; CHECK: f4:
+; CHECK: pushq
+; CHECK: leaq 48(%rsp),
+define i8* @f4(i64 %a0, i64 %a1, i64 %a2, i64 %a3, ...) nounwind {
+entry:
+ %ap = alloca i8*, align 8
+ %ap1 = bitcast i8** %ap to i8*
+ call void @llvm.va_start(i8* %ap1)
+ ret i8* %ap1
+}
+
+; CHECK: f3:
+; CHECK: pushq
+; CHECK: leaq 40(%rsp),
+define i8* @f3(i64 %a0, i64 %a1, i64 %a2, ...) nounwind {
+entry:
+ %ap = alloca i8*, align 8
+ %ap1 = bitcast i8** %ap to i8*
+ call void @llvm.va_start(i8* %ap1)
+ ret i8* %ap1
+}
--
1.7.1.GIT
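
The bug is easiest to see from C. A minimal repro matching the new f5 test, assuming a Win64 triple such as x86_64-pc-mingw64 (pick_first_vararg is a hypothetical name): with five fixed integer arguments, all four home registers are consumed, so the old code's VarArgsFrameIndex, which pointed into the register save area, was wrong; va_start must instead use the frame index created at StackSize.

#include <cstdarg>
#include <cstdio>

// Five fixed integer args take RCX/RDX/R8/R9 plus one stack slot on Win64,
// so va_start has to point at the sixth slot in the caller's frame.
static long long pick_first_vararg(long long a0, long long a1, long long a2,
                                   long long a3, long long a4, ...) {
  va_list ap;
  va_start(ap, a4);
  long long v = va_arg(ap, long long);
  va_end(ap);
  return v;
}

int main() {
  std::printf("%lld\n", pick_first_vararg(0, 1, 2, 3, 4, 42LL)); // expect 42
  return 0;
}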
-------------- next part --------------
From f1b50f241f8346d7888caacd50c9767f5afe6d4a Mon Sep 17 00:00:00 2001
From: NAKAMURA Takumi <geek4civic at gmail.com>
Date: Tue, 28 Dec 2010 19:05:25 +0900
Subject: [PATCH 7/9] X86FrameInfo.cpp, X86RegisterInfo.cpp: Re-indent. No functional changes.
---
lib/Target/X86/X86FrameLowering.cpp | 35 ++++++++++++++++++++---------------
lib/Target/X86/X86RegisterInfo.cpp | 3 ++-
2 files changed, 22 insertions(+), 16 deletions(-)
diff --git a/lib/Target/X86/X86FrameLowering.cpp b/lib/Target/X86/X86FrameLowering.cpp
index ce88169..d636105 100644
--- a/lib/Target/X86/X86FrameLowering.cpp
+++ b/lib/Target/X86/X86FrameLowering.cpp
@@ -759,6 +759,12 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
}
// We're returning from function via eh_return.
+ bool isRel = (RetOpcode == X86::TCRETURNdi ||
+ RetOpcode == X86::TCRETURNdi64);
+ bool isMem = (RetOpcode == X86::TCRETURNmi ||
+ RetOpcode == X86::TCRETURNmi64);
+ bool isReg = (RetOpcode == X86::TCRETURNri ||
+ RetOpcode == X86::TCRETURNri64);
if (RetOpcode == X86::EH_RETURN || RetOpcode == X86::EH_RETURN64) {
MBBI = prior(MBB.end());
MachineOperand &DestAddr = MBBI->getOperand(0);
@@ -766,11 +772,7 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
BuildMI(MBB, MBBI, DL,
TII.get(Is64Bit ? X86::MOV64rr : X86::MOV32rr),
StackPtr).addReg(DestAddr.getReg());
- } else if (RetOpcode == X86::TCRETURNri || RetOpcode == X86::TCRETURNdi ||
- RetOpcode == X86::TCRETURNmi ||
- RetOpcode == X86::TCRETURNri64 || RetOpcode == X86::TCRETURNdi64 ||
- RetOpcode == X86::TCRETURNmi64) {
- bool isMem = RetOpcode == X86::TCRETURNmi || RetOpcode == X86::TCRETURNmi64;
+ } else if (isReg || isRel || isMem) {
// Tail call return: adjust the stack pointer and jump to callee.
MBBI = prior(MBB.end());
MachineOperand &JumpTarget = MBBI->getOperand(0);
@@ -794,10 +796,11 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
}
// Jump to label or value in register.
- if (RetOpcode == X86::TCRETURNdi || RetOpcode == X86::TCRETURNdi64) {
+ if (isRel) {
MachineInstrBuilder MIB =
- BuildMI(MBB, MBBI, DL, TII.get((RetOpcode == X86::TCRETURNdi)
- ? X86::TAILJMPd : X86::TAILJMPd64));
+ BuildMI(MBB, MBBI, DL,
+ TII.get(RetOpcode == X86::TCRETURNdi ? X86::TAILJMPd
+ : X86::TAILJMPd64));
if (JumpTarget.isGlobal())
MIB.addGlobalAddress(JumpTarget.getGlobal(), JumpTarget.getOffset(),
JumpTarget.getTargetFlags());
@@ -806,18 +809,20 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
MIB.addExternalSymbol(JumpTarget.getSymbolName(),
JumpTarget.getTargetFlags());
}
- } else if (RetOpcode == X86::TCRETURNmi || RetOpcode == X86::TCRETURNmi64) {
+ } else if (isMem) {
MachineInstrBuilder MIB =
- BuildMI(MBB, MBBI, DL, TII.get((RetOpcode == X86::TCRETURNmi)
- ? X86::TAILJMPm : X86::TAILJMPm64));
+ BuildMI(MBB, MBBI, DL,
+ TII.get(RetOpcode == X86::TCRETURNmi ? X86::TAILJMPm
+ : X86::TAILJMPm64));
for (unsigned i = 0; i != 5; ++i)
MIB.addOperand(MBBI->getOperand(i));
- } else if (RetOpcode == X86::TCRETURNri64) {
- BuildMI(MBB, MBBI, DL, TII.get(X86::TAILJMPr64)).
+ } else if (isReg) {
+ BuildMI(MBB, MBBI, DL,
+ TII.get(RetOpcode == X86::TCRETURNri64 ? X86::TAILJMPr64
+ : X86::TAILJMPr)).
addReg(JumpTarget.getReg(), RegState::Kill);
} else {
- BuildMI(MBB, MBBI, DL, TII.get(X86::TAILJMPr)).
- addReg(JumpTarget.getReg(), RegState::Kill);
+ llvm_unreachable("What could I select for TCRETURN?");
}
MachineInstr *NewMI = prior(MBBI);
diff --git a/lib/Target/X86/X86RegisterInfo.cpp b/lib/Target/X86/X86RegisterInfo.cpp
index 06c671b..9260f1d 100644
--- a/lib/Target/X86/X86RegisterInfo.cpp
+++ b/lib/Target/X86/X86RegisterInfo.cpp
@@ -576,7 +576,8 @@ X86RegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
unsigned BasePtr;
unsigned Opc = MI.getOpcode();
- bool AfterFPPop = Opc == X86::TAILJMPm64 || Opc == X86::TAILJMPm;
+ bool AfterFPPop = (Opc == X86::TAILJMPm64 ||
+ Opc == X86::TAILJMPm);
if (needsStackRealignment(MF))
BasePtr = (FrameIndex < 0 ? FramePtr : StackPtr);
else if (AfterFPPop)
--
1.7.1.GIT
-------------- next part --------------
From 27857943a0cbd919b1f60ac8eb562b6a01bb8ef9 Mon Sep 17 00:00:00 2001
From: NAKAMURA Takumi <geek4civic at gmail.com>
Date: Thu, 6 Jan 2011 09:09:52 +0900
Subject: [PATCH 8/9] TableGen/EDEmitter.cpp: Add TCW64.
---
utils/TableGen/EDEmitter.cpp | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)
diff --git a/utils/TableGen/EDEmitter.cpp b/utils/TableGen/EDEmitter.cpp
index 7ee1019..b3e442d 100644
--- a/utils/TableGen/EDEmitter.cpp
+++ b/utils/TableGen/EDEmitter.cpp
@@ -264,6 +264,7 @@ static int X86TypeFromOpName(LiteralConstantEmitter *type,
REG("RFP32");
REG("GR64");
REG("GR64_TC");
+ REG("GR64_TCW64");
REG("FR64");
REG("VR64");
REG("RFP64");
@@ -297,6 +298,7 @@ static int X86TypeFromOpName(LiteralConstantEmitter *type,
MEM("opaque48mem");
MEM("i64mem");
MEM("i64mem_TC");
+ MEM("i64mem_TCW64");
MEM("f64mem");
MEM("sdmem");
MEM("f80mem");
--
1.7.1.GIT
-------------- next part --------------
From ff6e72615a2a6bf8549c76550b618d6aafdaeefd Mon Sep 17 00:00:00 2001
From: NAKAMURA Takumi <geek4civic at gmail.com>
Date: Thu, 6 Jan 2011 06:59:35 +0900
Subject: [PATCH 9/9] Target/X86: Tweak win64's tailcall.
---
lib/Target/X86/X86FrameLowering.cpp | 12 ++++++++++++
lib/Target/X86/X86InstrCompiler.td | 24 ++++++++++++++++++++----
lib/Target/X86/X86InstrControl.td | 33 +++++++++++++++++++++++++++++++++
lib/Target/X86/X86InstrInfo.cpp | 2 ++
lib/Target/X86/X86InstrInfo.td | 6 ++++++
lib/Target/X86/X86MCInstLower.cpp | 4 ++++
lib/Target/X86/X86RegisterInfo.cpp | 1 +
lib/Target/X86/X86RegisterInfo.td | 3 +++
test/CodeGen/X86/tailcallstack64.ll | 16 ++++++++++------
9 files changed, 91 insertions(+), 10 deletions(-)
diff --git a/lib/Target/X86/X86FrameLowering.cpp b/lib/Target/X86/X86FrameLowering.cpp
index d636105..712eded 100644
--- a/lib/Target/X86/X86FrameLowering.cpp
+++ b/lib/Target/X86/X86FrameLowering.cpp
@@ -108,6 +108,9 @@ static unsigned findDeadCallerSavedReg(MachineBasicBlock &MBB,
case X86::TCRETURNdi64:
case X86::TCRETURNri64:
case X86::TCRETURNmi64:
+ case X86::TCRETURNdiW64:
+ case X86::TCRETURNriW64:
+ case X86::TCRETURNmiW64:
case X86::EH_RETURN:
case X86::EH_RETURN64: {
SmallSet<unsigned, 8> Uses;
@@ -670,6 +673,9 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
case X86::TCRETURNdi64:
case X86::TCRETURNri64:
case X86::TCRETURNmi64:
+ case X86::TCRETURNdiW64:
+ case X86::TCRETURNriW64:
+ case X86::TCRETURNmiW64:
case X86::EH_RETURN:
case X86::EH_RETURN64:
break; // These are ok
@@ -760,10 +766,13 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
// We're returning from function via eh_return.
bool isRel = (RetOpcode == X86::TCRETURNdi ||
+ RetOpcode == X86::TCRETURNdiW64 ||
RetOpcode == X86::TCRETURNdi64);
bool isMem = (RetOpcode == X86::TCRETURNmi ||
+ RetOpcode == X86::TCRETURNmiW64 ||
RetOpcode == X86::TCRETURNmi64);
bool isReg = (RetOpcode == X86::TCRETURNri ||
+ RetOpcode == X86::TCRETURNriW64 ||
RetOpcode == X86::TCRETURNri64);
if (RetOpcode == X86::EH_RETURN || RetOpcode == X86::EH_RETURN64) {
MBBI = prior(MBB.end());
@@ -800,6 +809,7 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
MachineInstrBuilder MIB =
BuildMI(MBB, MBBI, DL,
TII.get(RetOpcode == X86::TCRETURNdi ? X86::TAILJMPd
+ : RetOpcode == X86::TCRETURNdiW64 ? X86::TAILJMPdW64
: X86::TAILJMPd64));
if (JumpTarget.isGlobal())
MIB.addGlobalAddress(JumpTarget.getGlobal(), JumpTarget.getOffset(),
@@ -813,12 +823,14 @@ void X86FrameLowering::emitEpilogue(MachineFunction &MF,
MachineInstrBuilder MIB =
BuildMI(MBB, MBBI, DL,
TII.get(RetOpcode == X86::TCRETURNmi ? X86::TAILJMPm
+ : RetOpcode == X86::TCRETURNmiW64 ? X86::TAILJMPmW64
: X86::TAILJMPm64));
for (unsigned i = 0; i != 5; ++i)
MIB.addOperand(MBBI->getOperand(i));
} else if (isReg) {
BuildMI(MBB, MBBI, DL,
TII.get(RetOpcode == X86::TCRETURNri64 ? X86::TAILJMPr64
+ : RetOpcode == X86::TCRETURNriW64 ? X86::TAILJMPrW64
: X86::TAILJMPr)).
addReg(JumpTarget.getReg(), RegState::Kill);
} else {
diff --git a/lib/Target/X86/X86InstrCompiler.td b/lib/Target/X86/X86InstrCompiler.td
index d2c5763..5d55fa6 100644
--- a/lib/Target/X86/X86InstrCompiler.td
+++ b/lib/Target/X86/X86InstrCompiler.td
@@ -868,19 +868,35 @@ def : Pat<(X86tcret (i32 texternalsym:$dst), imm:$off),
def : Pat<(X86tcret GR64_TC:$dst, imm:$off),
(TCRETURNri64 GR64_TC:$dst, imm:$off)>,
- Requires<[In64BitMode]>;
+ Requires<[In64BitMode, NotWin64]>;
def : Pat<(X86tcret (load addr:$dst), imm:$off),
(TCRETURNmi64 addr:$dst, imm:$off)>,
- Requires<[In64BitMode]>;
+ Requires<[In64BitMode, NotWin64]>;
def : Pat<(X86tcret (i64 tglobaladdr:$dst), imm:$off),
(TCRETURNdi64 tglobaladdr:$dst, imm:$off)>,
- Requires<[In64BitMode]>;
+ Requires<[In64BitMode, NotWin64]>;
def : Pat<(X86tcret (i64 texternalsym:$dst), imm:$off),
(TCRETURNdi64 texternalsym:$dst, imm:$off)>,
- Requires<[In64BitMode]>;
+ Requires<[In64BitMode, NotWin64]>;
+
+def : Pat<(X86tcret GR64_TCW64:$dst, imm:$off),
+ (TCRETURNriW64 GR64_TCW64:$dst, imm:$off)>,
+ Requires<[IsWin64]>;
+
+def : Pat<(X86tcret (load addr:$dst), imm:$off),
+ (TCRETURNmiW64 addr:$dst, imm:$off)>,
+ Requires<[IsWin64]>;
+
+def : Pat<(X86tcret (i64 tglobaladdr:$dst), imm:$off),
+ (TCRETURNdiW64 tglobaladdr:$dst, imm:$off)>,
+ Requires<[IsWin64]>;
+
+def : Pat<(X86tcret (i64 texternalsym:$dst), imm:$off),
+ (TCRETURNdiW64 texternalsym:$dst, imm:$off)>,
+ Requires<[IsWin64]>;
// Normal calls, with various flavors of addresses.
def : Pat<(X86call (i32 tglobaladdr:$dst)),
diff --git a/lib/Target/X86/X86InstrControl.td b/lib/Target/X86/X86InstrControl.td
index 31f2832..c4939a8 100644
--- a/lib/Target/X86/X86InstrControl.td
+++ b/lib/Target/X86/X86InstrControl.td
@@ -301,3 +301,36 @@ let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1,
def TAILJMPm64 : I<0xFF, MRM4m, (outs), (ins i64mem_TC:$dst, variable_ops),
"jmp{q}\t{*}$dst # TAILCALL", []>;
}
+
+let isCall = 1, isTerminator = 1, isReturn = 1, isBarrier = 1,
+ isCodeGenOnly = 1 in
+ let Defs = [RAX, RCX, RDX, R8, R9, R10, R11,
+ FP0, FP1, FP2, FP3, FP4, FP5, FP6, ST0, ST1,
+ MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7,
+ XMM0, XMM1, XMM2, XMM3, XMM4, XMM5, EFLAGS],
+ Uses = [RSP] in {
+ def TCRETURNdiW64 : PseudoI<(outs),
+ (ins i64i32imm_pcrel:$dst, i32imm:$offset, variable_ops),
+ []>,
+ Requires<[IsWin64]>;
+ def TCRETURNriW64 : PseudoI<(outs),
+ (ins GR64_TCW64:$dst, i32imm:$offset, variable_ops), []>,
+ Requires<[IsWin64]>;
+ let mayLoad = 1 in
+ def TCRETURNmiW64 : PseudoI<(outs),
+ (ins i64mem_TCW64:$dst, i32imm:$offset, variable_ops), []>,
+ Requires<[IsWin64]>;
+
+ def TAILJMPdW64 : Ii32PCRel<0xE9, RawFrm, (outs),
+ (ins i64i32imm_pcrel:$dst, variable_ops),
+ "jmp\t$dst # TAILCALL", []>,
+ Requires<[IsWin64]>;
+ def TAILJMPrW64 : I<0xFF, MRM4r, (outs), (ins GR64_TCW64:$dst, variable_ops),
+ "jmp{q}\t{*}$dst # TAILCALL", []>,
+ Requires<[IsWin64]>;
+
+ let mayLoad = 1 in
+ def TAILJMPmW64 : I<0xFF, MRM4m, (outs), (ins i64mem_TCW64:$dst, variable_ops),
+ "jmp{q}\t{*}$dst # TAILCALL", []>,
+ Requires<[IsWin64]>;
+}
diff --git a/lib/Target/X86/X86InstrInfo.cpp b/lib/Target/X86/X86InstrInfo.cpp
index 63dcd14..c3c3b8c 100644
--- a/lib/Target/X86/X86InstrInfo.cpp
+++ b/lib/Target/X86/X86InstrInfo.cpp
@@ -321,6 +321,7 @@ X86InstrInfo::X86InstrInfo(X86TargetMachine &tm)
{ X86::SETSr, X86::SETSm, 0, 0 },
{ X86::TAILJMPr, X86::TAILJMPm, 1, 0 },
{ X86::TAILJMPr64, X86::TAILJMPm64, 1, 0 },
+ { X86::TAILJMPrW64, X86::TAILJMPmW64, 1, 0 },
{ X86::TEST16ri, X86::TEST16mi, 1, 0 },
{ X86::TEST32ri, X86::TEST32mi, 1, 0 },
{ X86::TEST64ri32, X86::TEST64mi32, 1, 0 },
@@ -2025,6 +2026,7 @@ static unsigned getLoadStoreRegOpcode(unsigned Reg,
case X86::GR64_NOREX_NOSPRegClassID:
case X86::GR64_NOSPRegClassID:
case X86::GR64_TCRegClassID:
+ case X86::GR64_TCW64RegClassID:
return load ? X86::MOV64rm : X86::MOV64mr;
case X86::GR32RegClassID:
case X86::GR32_ABCDRegClassID:
diff --git a/lib/Target/X86/X86InstrInfo.td b/lib/Target/X86/X86InstrInfo.td
index 4748f13..78847a8 100644
--- a/lib/Target/X86/X86InstrInfo.td
+++ b/lib/Target/X86/X86InstrInfo.td
@@ -291,6 +291,12 @@ def i64mem_TC : Operand<i64> {
let ParserMatchClass = X86MemAsmOperand;
}
+def i64mem_TCW64 : Operand<i64> {
+ let PrintMethod = "printi64mem";
+ let MIOperandInfo = (ops GR64_TCW64, i8imm, GR64_TCW64, i32imm, i8imm);
+ let ParserMatchClass = X86MemAsmOperand;
+}
+
let ParserMatchClass = X86AbsMemAsmOperand,
PrintMethod = "print_pcrel_imm" in {
def i32imm_pcrel : Operand<i32>;
diff --git a/lib/Target/X86/X86MCInstLower.cpp b/lib/Target/X86/X86MCInstLower.cpp
index 4159af1..0b9679e 100644
--- a/lib/Target/X86/X86MCInstLower.cpp
+++ b/lib/Target/X86/X86MCInstLower.cpp
@@ -399,6 +399,7 @@ ReSimplify:
// register inputs modeled as normal uses instead of implicit uses. As such,
// truncate off all but the first operand (the callee). FIXME: Change isel.
case X86::TAILJMPr64:
+ case X86::TAILJMPrW64:
case X86::CALL64r:
case X86::CALL64pcrel32:
case X86::WINCALL64r:
@@ -421,12 +422,14 @@ ReSimplify:
// TAILJMPd, TAILJMPd64 - Lower to the correct jump instructions.
case X86::TAILJMPr:
case X86::TAILJMPd:
+ case X86::TAILJMPdW64:
case X86::TAILJMPd64: {
unsigned Opcode;
switch (OutMI.getOpcode()) {
default: assert(0 && "Invalid opcode");
case X86::TAILJMPr: Opcode = X86::JMP32r; break;
case X86::TAILJMPd:
+ case X86::TAILJMPdW64:
case X86::TAILJMPd64: Opcode = X86::JMP_1; break;
}
@@ -618,6 +621,7 @@ void X86AsmPrinter::EmitInstruction(const MachineInstr *MI) {
case X86::TAILJMPr:
case X86::TAILJMPd:
case X86::TAILJMPd64:
+ case X86::TAILJMPdW64:
// Lower these as normal, but add some comments.
OutStreamer.AddComment("TAILCALL");
break;
diff --git a/lib/Target/X86/X86RegisterInfo.cpp b/lib/Target/X86/X86RegisterInfo.cpp
index 9260f1d..e1eaa4b 100644
--- a/lib/Target/X86/X86RegisterInfo.cpp
+++ b/lib/Target/X86/X86RegisterInfo.cpp
@@ -577,6 +577,7 @@ X86RegisterInfo::eliminateFrameIndex(MachineBasicBlock::iterator II,
unsigned Opc = MI.getOpcode();
bool AfterFPPop = (Opc == X86::TAILJMPm64 ||
+ Opc == X86::TAILJMPmW64 ||
Opc == X86::TAILJMPm);
if (needsStackRealignment(MF))
BasePtr = (FrameIndex < 0 ? FramePtr : StackPtr);
diff --git a/lib/Target/X86/X86RegisterInfo.td b/lib/Target/X86/X86RegisterInfo.td
index 45bb989..dd35d62 100644
--- a/lib/Target/X86/X86RegisterInfo.td
+++ b/lib/Target/X86/X86RegisterInfo.td
@@ -496,6 +496,9 @@ def GR64_TC : RegisterClass<"X86", [i64], 64, [RAX, RCX, RDX, RSI, RDI,
(GR32_TC sub_32bit)];
}
+def GR64_TCW64 : RegisterClass<"X86", [i64], 64, [RAX, R11, R10,
+ R9, R8, RDX, RCX]>;
+
// GR8_NOREX - GR8 registers which do not require a REX prefix.
def GR8_NOREX : RegisterClass<"X86", [i8], 8,
[AL, CL, DL, AH, CH, DH, BL, BH]> {
diff --git a/test/CodeGen/X86/tailcallstack64.ll b/test/CodeGen/X86/tailcallstack64.ll
index 52b074d..0c732d5 100644
--- a/test/CodeGen/X86/tailcallstack64.ll
+++ b/test/CodeGen/X86/tailcallstack64.ll
@@ -1,16 +1,20 @@
-; RUN: llc < %s -tailcallopt -march=x86-64 -post-RA-scheduler=true | FileCheck %s
+; RUN: llc < %s -tailcallopt -mtriple=x86_64-linux -post-RA-scheduler=true | FileCheck %s
+; RUN: llc < %s -tailcallopt -mtriple=x86_64-win32 -post-RA-scheduler=true | FileCheck %s
+
+; FIXME: Redundant unused stack allocation could be eliminated.
+; CHECK: subq ${{24|88}}, %rsp
; Check that lowered arguments on the stack do not overwrite each other.
; Add %in1 %p1 to a different temporary register (%eax).
-; CHECK: movl 32(%rsp), %eax
+; CHECK: movl [[A1:32|144]](%rsp), %eax
; Move param %in1 to temp register (%r10d).
-; CHECK: movl 40(%rsp), %r10d
+; CHECK: movl [[A2:40|152]](%rsp), %r10d
; Add %in1 %p1 to a different temporary register (%eax).
-; CHECK: addl %edi, %eax
+; CHECK: addl {{%edi|%ecx}}, %eax
; Move param %in2 to stack.
-; CHECK: movl %r10d, 32(%rsp)
+; CHECK: movl %r10d, [[A1]](%rsp)
; Move result of addition to stack.
-; CHECK: movl %eax, 40(%rsp)
+; CHECK: movl %eax, [[A2]](%rsp)
; Eventually, do a TAILCALL
; CHECK: TAILCALL
--
1.7.1.GIT
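
Why a separate GR64_TCW64 class: RSI and RDI are callee-saved under the Win64 ABI, so the epilogue restores them before the tail jump and they cannot hold the call target. A sketch that just diffs the two classes (GR64_TCW64 is verbatim from the patch; GR64_TC is assumed to match trunk at the time):

#include <cstdio>
#include <set>
#include <string>

int main() {
  // 64-bit tail-call scratch registers on SysV vs. Win64.
  std::set<std::string> TC    = { "RAX", "RCX", "RDX", "RSI", "RDI",
                                  "R8", "R9", "R11" };
  std::set<std::string> TCW64 = { "RAX", "R11", "R10", "R9", "R8",
                                  "RDX", "RCX" };
  for (const auto &R : TC)
    if (!TCW64.count(R))
      std::printf("not usable on Win64: %s\n", R.c_str()); // RDI, RSI
  for (const auto &R : TCW64)
    if (!TC.count(R))
      std::printf("Win64-only addition: %s\n", R.c_str()); // R10
  return 0;
}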