[llvm-commits] [llvm] r46307 - in /llvm/trunk: lib/Target/X86/X86ISelDAGToDAG.cpp lib/Target/X86/X86ISelLowering.cpp lib/Target/X86/X86InstrSSE.td test/CodeGen/X86/fp-stack-direct-ret.ll test/CodeGen/X86/fp-stack-ret-conv.ll test/CodeGen/X86/pr1505b.ll
Evan Cheng
evan.cheng at apple.com
Thu Jan 24 16:54:48 PST 2008
Is there a bugzilla on the scheduling deficiency?
Thx,
Evan
On Jan 24, 2008, at 12:07 AM, Chris Lattner <sabre at nondot.org> wrote:
> Author: lattner
> Date: Thu Jan 24 02:07:48 2008
> New Revision: 46307
>
> URL: http://llvm.org/viewvc/llvm-project?rev=46307&view=rev
> Log:
> Significantly simplify and improve handling of FP function results on
> x86-32.  This case returns the value in ST(0) and then has to convert
> it to an SSE register.  This causes significant codegen ugliness in
> some cases.  For example, in the trivial fp-stack-direct-ret.ll
> testcase we used to generate:
>
> _bar:
> subl $28, %esp
> call L_foo$stub
> fstpl 16(%esp)
> movsd 16(%esp), %xmm0
> movsd %xmm0, 8(%esp)
> fldl 8(%esp)
> addl $28, %esp
> ret
>
> because we move the result of foo() into an XMM register, then have to
> move it back for the return of bar.
>
> Instead of hacking ever-more special cases into the call result
> lowering code, we take a much simpler approach: on x86-32, an fp
> return is modeled as always returning into an f80 register, which is
> then truncated to f32 or f64 as needed.  Similarly, when a function
> returns an fp value, we model it as an extension to f80 + return.
>
> This exposes the truncates and extensions to the dag combiner,
> allowing target-independent code to hack on them, eliminating them in
> this case.  This gives us this code for the example above:
>
> _bar:
> subl $12, %esp
> call L_foo$stub
> addl $12, %esp
> ret
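>
> (To make the new model concrete: bar behaves as if it were written as
> the following C sketch, reading long double as the 80-bit x87 type on
> x86-32.  This is an analogy for exposition, not the lowering code
> itself.)
>
>   double foo(void);
>
>   double bar(void) {
>     long double t = foo();  /* call result modeled as f80 in ST(0) */
>     return (double)t;       /* rounded to f64; bar's own return is
>                                modeled as a re-extension to f80, so
>                                the round/extend pair cancels in the
>                                dag combiner and the value never
>                                leaves the fp stack */
>   }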
>
> The nasty aspect of this is that these conversions are not legal, but
> we want the second pass of the dag combiner (post-legalize) to be able
> to hack on them.  To handle this, we lie to legalize and say they are
> legal, then custom expand them on entry to the isel pass
> (PreprocessForFPConvert).  This is gross, but less gross than the code
> it is replacing :)
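>
> (Concretely, the expansion turns each such conversion into a possibly
> truncating store to a stack temporary followed by a possibly extending
> reload.  Here is a C sketch of the f64 -> f80 case; the function name
> is illustrative only, and the real code builds truncstore/extload dag
> nodes, see PreprocessForFPConvert in the patch below:)
>
>   long double extend_f64_to_f80(double x) {
>     double tmp = x;  /* store to a stack slot of the memory VT (f64) */
>     return tmp;      /* reload as f80: an extending load (fldl)      */
>   }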
>
> This also allows us to generate better code in several other cases.
> For example, on fp-stack-ret-conv.ll, we now generate:
>
> _test:
> subl $12, %esp
> call L_foo$stub
> fstps 8(%esp)
> movl 16(%esp), %eax
> cvtss2sd 8(%esp), %xmm0
> movsd %xmm0, (%eax)
> addl $12, %esp
> ret
>
> where before we produced (incidentally, the old bad code is identical
> to what gcc produces):
>
> _test:
> subl $12, %esp
> call L_foo$stub
> fstpl (%esp)
> cvtsd2ss (%esp), %xmm0
> cvtss2sd %xmm0, %xmm0
> movl 16(%esp), %eax
> movsd %xmm0, (%eax)
> addl $12, %esp
> ret
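>
> (For reference, the C source for this test is roughly the following
> sketch, reconstructed from the fp-stack-ret-conv.ll test added below;
> foo is external, and the .ll test additionally marks the store
> volatile so it is not optimized away:)
>
>   double foo(void);
>
>   void test(double *b) {
>     *b = (float)foo();  /* fptrunc to float, then an implicit fpext
>                            back to double for the store */
>   }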
>
> Note that we generate slightly worse code on pr1505b.ll due to a
> scheduling deficiency that is unrelated to this patch.
>
>
> Added:
>     llvm/trunk/test/CodeGen/X86/fp-stack-direct-ret.ll
>     llvm/trunk/test/CodeGen/X86/fp-stack-ret-conv.ll
> Modified:
>     llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp
>     llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>     llvm/trunk/lib/Target/X86/X86InstrSSE.td
>     llvm/trunk/test/CodeGen/X86/pr1505b.ll
>
> Modified: llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp?rev=46307&r1=46306&r2=46307&view=diff
>
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp (original)
> +++ llvm/trunk/lib/Target/X86/X86ISelDAGToDAG.cpp Thu Jan 24 02:07:48 2008
> @@ -156,7 +156,8 @@
> bool TryFoldLoad(SDOperand P, SDOperand N,
> SDOperand &Base, SDOperand &Scale,
> SDOperand &Index, SDOperand &Disp);
> - void InstructionSelectPreprocess(SelectionDAG &DAG);
> + void PreprocessForRMW(SelectionDAG &DAG);
> + void PreprocessForFPConvert(SelectionDAG &DAG);
>
> /// SelectInlineAsmMemoryOperand - Implement addressing mode selection for
> /// inline asm expressions.
> @@ -350,9 +351,10 @@
> Store.getOperand(2), Store.getOperand(3));
> }
>
> -/// InstructionSelectPreprocess - Preprocess the DAG to allow the instruction
> -/// selector to pick more load-modify-store instructions.  This is a common
> -/// case:
> +/// PreprocessForRMW - Preprocess the DAG to make instruction selection better.
> +/// This is only run if not in -fast mode (aka -O0).
> +/// This allows the instruction selector to pick more read-modify-write
> +/// instructions.  This is a common case:
> ///
> /// [Load chain]
> /// ^
> @@ -389,7 +391,7 @@
> /// \ /
> /// \ /
> /// [Store]
> -void X86DAGToDAGISel::InstructionSelectPreprocess(SelectionDAG &DAG) {
> +void X86DAGToDAGISel::PreprocessForRMW(SelectionDAG &DAG) {
> for (SelectionDAG::allnodes_iterator I = DAG.allnodes_begin(),
> E = DAG.allnodes_end(); I != E; ++I) {
> if (!ISD::isNON_TRUNCStore(I))
> @@ -459,6 +461,66 @@
> }
> }
>
> +
> +/// PreprocessForFPConvert - Walk over the dag lowering fpround and fpextend
> +/// nodes that target the FP stack to be store and load to the stack.  This
> +/// is a gross hack.  We would like to simply mark these as being illegal,
> +/// but when we do that, legalize produces these when it expands calls, then
> +/// expands these in the same legalize pass.  We would like dag combine to be
> +/// able to hack on these between the call expansion and the node
> +/// legalization.  As such this pass basically does "really late"
> +/// legalization of these inline with the X86 isel pass.
> +void X86DAGToDAGISel::PreprocessForFPConvert(SelectionDAG &DAG) {
> + for (SelectionDAG::allnodes_iterator I = DAG.allnodes_begin(),
> + E = DAG.allnodes_end(); I != E; ) {
> + SDNode *N = I++; // Preincrement iterator to avoid invalidation issues.
> + if (N->getOpcode() != ISD::FP_ROUND && N->getOpcode() != ISD::FP_EXTEND)
> + continue;
> +
> + // If the source and destination are SSE registers, then this is a legal
> + // conversion that should not be lowered.
> + MVT::ValueType SrcVT = N->getOperand(0).getValueType();
> + MVT::ValueType DstVT = N->getValueType(0);
> + bool SrcIsSSE = X86Lowering.isScalarFPTypeInSSEReg(SrcVT);
> + bool DstIsSSE = X86Lowering.isScalarFPTypeInSSEReg(DstVT);
> + if (SrcIsSSE && DstIsSSE)
> + continue;
> +
> + // If this is an FPStack extension (but not a truncation), it is a noop.
> + if (!SrcIsSSE && !DstIsSSE && N->getOpcode() == ISD::FP_EXTEND)
> + continue;
> +
> + // Here we could have an FP stack truncation or an FPStack <-> SSE convert.
> + // FPStack has extload and truncstore.  SSE can fold direct loads into other
> + // operations.  Based on this, decide what we want to do.
> + MVT::ValueType MemVT;
> + if (N->getOpcode() == ISD::FP_ROUND)
> + MemVT = DstVT; // FP_ROUND must use DstVT, we can't do a 'trunc load'.
> + else
> + MemVT = SrcIsSSE ? SrcVT : DstVT;
> +
> + SDOperand MemTmp = DAG.CreateStackTemporary(MemVT);
> +
> + // FIXME: optimize the case where the src/dest is a load or store?
> + SDOperand Store = DAG.getTruncStore(DAG.getEntryNode(), N->getOperand(0),
> + MemTmp, NULL, 0, MemVT);
> + SDOperand Result = DAG.getExtLoad(ISD::EXTLOAD, DstVT, Store, MemTmp,
> + NULL, 0, MemVT);
> +
> + // We're about to replace all uses of the FP_ROUND/FP_EXTEND with the
> + // extload we created.  This will cause general havoc on the dag because
> + // anything below the conversion could be folded into other existing nodes.
> + // To avoid invalidating 'I', back it up to the convert node.
> + --I;
> + DAG.ReplaceAllUsesOfValueWith(SDOperand(N, 0), Result);
> +
> + // Now that we did that, the node is dead.  Increment the iterator to the
> + // next node to process, then delete N.
> + ++I;
> + DAG.DeleteNode(N);
> + }
> +}
> +
> /// InstructionSelectBasicBlock - This callback is invoked by SelectionDAGISel
> /// when it has created a SelectionDAG for us to codegen.
> void X86DAGToDAGISel::InstructionSelectBasicBlock(SelectionDAG &DAG) {
> @@ -466,7 +528,10 @@
> MachineFunction::iterator FirstMBB = BB;
>
> if (!FastISel)
> - InstructionSelectPreprocess(DAG);
> + PreprocessForRMW(DAG);
> +
> + // FIXME: This should only happen when not -fast.
> + PreprocessForFPConvert(DAG);
>
> // Codegen the basic block.
> #ifndef NDEBUG
>
> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=46307&r1=46306&r2=46307&view=diff
>
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Thu Jan 24 02:07:48 2008
> @@ -47,6 +47,7 @@
> X86ScalarSSEf32 = Subtarget->hasSSE1();
> X86StackPtr = Subtarget->is64Bit() ? X86::RSP : X86::ESP;
>
> + bool Fast = false;
>
> RegInfo = TM.getRegisterInfo();
>
> @@ -355,13 +356,15 @@
> addLegalFPImmediate(APFloat(+0.0)); // xorpd
> addLegalFPImmediate(APFloat(+0.0f)); // xorps
>
> - // Conversions to long double (in X87) go through memory.
> - setConvertAction(MVT::f32, MVT::f80, Expand);
> - setConvertAction(MVT::f64, MVT::f80, Expand);
> -
> - // Conversions from long double (in X87) go through memory.
> - setConvertAction(MVT::f80, MVT::f32, Expand);
> - setConvertAction(MVT::f80, MVT::f64, Expand);
> + // Floating truncations from f80 and extensions to f80 go through memory.
> + // If optimizing, we lie about this though and handle it in
> + // InstructionSelectPreprocess so that dagcombine2 can hack on these.
> + if (Fast) {
> + setConvertAction(MVT::f32, MVT::f80, Expand);
> + setConvertAction(MVT::f64, MVT::f80, Expand);
> + setConvertAction(MVT::f80, MVT::f32, Expand);
> + setConvertAction(MVT::f80, MVT::f64, Expand);
> + }
> } else if (X86ScalarSSEf32) {
> // Use SSE for f32, x87 for f64.
> // Set up the FP register classes.
> @@ -395,15 +398,17 @@
> addLegalFPImmediate(APFloat(-0.0)); // FLD0/FCHS
> addLegalFPImmediate(APFloat(-1.0)); // FLD1/FCHS
>
> - // SSE->x87 conversions go through memory.
> - setConvertAction(MVT::f32, MVT::f64, Expand);
> - setConvertAction(MVT::f32, MVT::f80, Expand);
> -
> - // x87->SSE truncations need to go through memory.
> - setConvertAction(MVT::f80, MVT::f32, Expand);
> - setConvertAction(MVT::f64, MVT::f32, Expand);
> - // And x87->x87 truncations also.
> - setConvertAction(MVT::f80, MVT::f64, Expand);
> + // SSE <-> X87 conversions go through memory.  If optimizing, we lie about
> + // this though and handle it in InstructionSelectPreprocess so that
> + // dagcombine2 can hack on these.
> + if (Fast) {
> + setConvertAction(MVT::f32, MVT::f64, Expand);
> + setConvertAction(MVT::f32, MVT::f80, Expand);
> + setConvertAction(MVT::f80, MVT::f32, Expand);
> + setConvertAction(MVT::f64, MVT::f32, Expand);
> + // And x87->x87 truncations also.
> + setConvertAction(MVT::f80, MVT::f64, Expand);
> + }
>
> if (!UnsafeFPMath) {
> setOperationAction(ISD::FSIN , MVT::f64 , Expand);
> @@ -420,10 +425,14 @@
> setOperationAction(ISD::FCOPYSIGN, MVT::f64, Expand);
> setOperationAction(ISD::FCOPYSIGN, MVT::f32, Expand);
>
> - // Floating truncations need to go through memory.
> - setConvertAction(MVT::f80, MVT::f32, Expand);
> - setConvertAction(MVT::f64, MVT::f32, Expand);
> - setConvertAction(MVT::f80, MVT::f64, Expand);
> + // Floating truncations go through memory.  If optimizing, we lie about
> + // this though and handle it in InstructionSelectPreprocess so that
> + // dagcombine2 can hack on these.
> + if (Fast) {
> + setConvertAction(MVT::f80, MVT::f32, Expand);
> + setConvertAction(MVT::f64, MVT::f32, Expand);
> + setConvertAction(MVT::f80, MVT::f64, Expand);
> + }
>
> if (!UnsafeFPMath) {
> setOperationAction(ISD::FSIN , MVT::f64 , Expand);
> @@ -647,7 +656,7 @@
> }
>
> setTruncStoreAction(MVT::f64, MVT::f32, Expand);
> -
> +
> // Custom lower v2i64 and v2f64 selects.
> setOperationAction(ISD::LOAD, MVT::v2f64, Legal);
> setOperationAction(ISD::LOAD, MVT::v2i64, Legal);
> @@ -808,30 +817,10 @@
> // a register.
> SDOperand Value = Op.getOperand(1);
>
> - // If this is an FP return with ScalarSSE, we need to move the value from
> - // an XMM register onto the fp-stack.
> - if (isScalarFPTypeInSSEReg(RVLocs[0].getValVT())) {
> - SDOperand MemLoc;
> -
> - // If this is a load into a scalarsse value, don't store the loaded value
> - // back to the stack, only to reload it: just replace the scalar-sse load.
> - if (ISD::isNON_EXTLoad(Value.Val) &&
> - Chain.reachesChainWithoutSideEffects(Value.getOperand(0))) {
> - Chain = Value.getOperand(0);
> - MemLoc = Value.getOperand(1);
> - } else {
> - // Spill the value to memory and reload it into top of stack.
> - unsigned Size = MVT::getSizeInBits(RVLocs[0].getValVT())/8;
> - MachineFunction &MF = DAG.getMachineFunction();
> - int SSFI = MF.getFrameInfo()->CreateStackObject(Size, Size);
> - MemLoc = DAG.getFrameIndex(SSFI, getPointerTy());
> - Chain = DAG.getStore(Op.getOperand(0), Value, MemLoc, NULL, 0);
> - }
> - SDVTList Tys = DAG.getVTList(RVLocs[0].getValVT(), MVT::Other);
> - SDOperand Ops[] = {Chain, MemLoc, DAG.getValueType(RVLocs[0].getValVT())};
> - Value = DAG.getNode(X86ISD::FLD, Tys, Ops, 3);
> - Chain = Value.getValue(1);
> - }
> + // an XMM register onto the fp-stack.  Do this with an FP_EXTEND to f80.
> + // This will get legalized into a load/store if it can't get optimized away.
> + if (isScalarFPTypeInSSEReg(RVLocs[0].getValVT()))
> + Value = DAG.getNode(ISD::FP_EXTEND, MVT::f80, Value);
>
> SDVTList Tys = DAG.getVTList(MVT::Other, MVT::Flag);
> SDOperand Ops[] = { Chain, Value };
> @@ -876,87 +865,26 @@
> // Copies from the FP stack are special, as ST0 isn't a valid register
> // before the fp stackifier runs.
>
> - // Copy ST0 into an RFP register with FP_GET_RESULT.
> - SDVTList Tys = DAG.getVTList(RVLocs[0].getValVT(), MVT::Other, MVT::Flag);
> + // Copy ST0 into an RFP register with FP_GET_RESULT.  If this will end up
> + // in an SSE register, copy it out as F80 and do a truncate, otherwise use
> + // the specified value type.
> + MVT::ValueType GetResultTy = RVLocs[0].getValVT();
> + if (isScalarFPTypeInSSEReg(GetResultTy))
> + GetResultTy = MVT::f80;
> + SDVTList Tys = DAG.getVTList(GetResultTy, MVT::Other, MVT::Flag);
> +
> SDOperand GROps[] = { Chain, InFlag };
> SDOperand RetVal = DAG.getNode(X86ISD::FP_GET_RESULT, Tys, GROps, 2);
> Chain = RetVal.getValue(1);
> InFlag = RetVal.getValue(2);
> -
> - // If we are using ScalarSSE, store ST(0) to the stack and reload it into
> - // an XMM register.
> - if (isScalarFPTypeInSSEReg(RVLocs[0].getValVT())) {
> - SDOperand StoreLoc;
> - const Value *SrcVal = 0;
> - int SrcValOffset = 0;
> - MVT::ValueType RetStoreVT = RVLocs[0].getValVT();
> -
> - // Determine where to store the value.  If the call result is directly
> - // used by a store, see if we can store directly into the location.  In
> - // this case, we'll end up producing a fst + movss[load] + movss[store] to
> - // the same location, and the two movss's will be nuked as dead.  This
> - // optimizes common things like "*D = atof(..)" to not need an
> - // intermediate stack slot.
> - if (SDOperand(TheCall, 0).hasOneUse() &&
> - SDOperand(TheCall, 1).hasOneUse()) {
> - // In addition to direct uses, we also support a FP_ROUND that uses the
> - // value, if it is directly stored somewhere.
> - SDNode *User = *TheCall->use_begin();
> - if (User->getOpcode() == ISD::FP_ROUND && User->hasOneUse())
> - User = *User->use_begin();
> -
> - // Ok, we have one use of the value and one use of the chain.  See if
> - // they are the same node: a store.
> - if (StoreSDNode *N = dyn_cast<StoreSDNode>(User)) {
> - // Verify that the value being stored is either the call or a
> - // truncation of the call.
> - SDNode *StoreVal = N->getValue().Val;
> - if (StoreVal == TheCall)
> - ; // ok.
> - else if (StoreVal->getOpcode() == ISD::FP_ROUND &&
> - StoreVal->hasOneUse() &&
> - StoreVal->getOperand(0).Val == TheCall)
> - ; // ok.
> - else
> - N = 0; // not ok.
> -
> - if (N && N->getChain().Val == TheCall &&
> - !N->isVolatile() && !N->isTruncatingStore() &&
> - N->getAddressingMode() == ISD::UNINDEXED) {
> - StoreLoc = N->getBasePtr();
> - SrcVal = N->getSrcValue();
> - SrcValOffset = N->getSrcValueOffset();
> - RetStoreVT = N->getValue().getValueType();
> - }
> - }
> - }
>
> - // If we weren't able to optimize the result, just create a temporary
> - // stack slot.
> - if (StoreLoc.Val == 0) {
> - MachineFunction &MF = DAG.getMachineFunction();
> - int SSFI = MF.getFrameInfo()->CreateStackObject(8, 8);
> - StoreLoc = DAG.getFrameIndex(SSFI, getPointerTy());
> - }
> -
> - // FIXME: Currently the FST is flagged to the FP_GET_RESULT.  This
> - // shouldn't be necessary except that RFP cannot be live across
> - // multiple blocks (which could happen if a select gets lowered into
> - // multiple blocks and scheduled in between them).  When stackifier is
> - // fixed, they can be uncoupled.
> - SDOperand Ops[] = {
> - Chain, RetVal, StoreLoc, DAG.getValueType(RetStoreVT), InFlag
> - };
> - Chain = DAG.getNode(X86ISD::FST, MVT::Other, Ops, 5);
> - RetVal = DAG.getLoad(RetStoreVT, Chain,
> - StoreLoc, SrcVal, SrcValOffset);
> - Chain = RetVal.getValue(1);
> -
> - // If we optimized a truncate, then extend the result back to its desired
> - // type.
> - if (RVLocs[0].getValVT() != RetStoreVT)
> - RetVal = DAG.getNode(ISD::FP_EXTEND, RVLocs[0].getValVT(), RetVal);
> - }
> + // If we want the result in an SSE register, use an FP_TRUNCATE to get it
> + // there.
> + if (GetResultTy != RVLocs[0].getValVT())
> + RetVal = DAG.getNode(ISD::FP_ROUND, RVLocs[0].getValVT(), RetVal,
> + // This truncation won't change the value.
> + DAG.getIntPtrConstant(1));
> +
> ResultVals.push_back(RetVal);
> }
>
>
> Modified: llvm/trunk/lib/Target/X86/X86InstrSSE.td
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86InstrSSE.td?rev=46307&r1=46306&r2=46307&view=diff
>
> ==============================================================================
> --- llvm/trunk/lib/Target/X86/X86InstrSSE.td (original)
> +++ llvm/trunk/lib/Target/X86/X86InstrSSE.td Thu Jan 24 02:07:48 2008
> @@ -2734,6 +2734,14 @@
> def : Pat<(v4i32 (undef)), (IMPLICIT_DEF_VR128)>, Requires<[HasSSE2]>;
> def : Pat<(v2i64 (undef)), (IMPLICIT_DEF_VR128)>, Requires<[HasSSE2]>;
>
> +// extload f32 -> f64.  This matches load+fextend because we have a hack in
> +// the isel (PreprocessForFPConvert) that can introduce loads after dag
> +// combine.  Since these loads aren't folded into the fextend, we have to
> +// match it explicitly here.
> +let Predicates = [HasSSE2] in
> + def : Pat<(fextend (loadf32 addr:$src)),
> + (CVTSS2SDrm addr:$src)>;
> +
> // Scalar to v8i16 / v16i8.  The source may be a GR32, but only the lower 8 or
> // 16-bits matter.
> def : Pat<(v8i16 (X86s2vec GR32:$src)), (MOVDI2PDIrr GR32:$src)>,
>
> Added: llvm/trunk/test/CodeGen/X86/fp-stack-direct-ret.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/fp-stack-direct-ret.ll?rev=46307&view=auto
>
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/fp-stack-direct-ret.ll (added)
> +++ llvm/trunk/test/CodeGen/X86/fp-stack-direct-ret.ll Thu Jan 24 02:07:48 2008
> @@ -0,0 +1,11 @@
> +; RUN: llvm-as < %s | llc -march=x86 | not grep fstp
> +; RUN: llvm-as < %s | llc -march=x86 -mcpu=yonah | not grep movsd
> +
> +declare double @foo()
> +
> +define double @bar() {
> +entry:
> + %tmp5 = tail call double @foo()
> + ret double %tmp5
> +}
> +
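>
> (A C equivalent of this test, as a sketch; foo is external:)
>
>   double foo(void);
>   double bar(void) { return foo(); }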
>
> Added: llvm/trunk/test/CodeGen/X86/fp-stack-ret-conv.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/fp-stack-ret-conv.ll?rev=46307&view=auto
>
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/fp-stack-ret-conv.ll (added)
> +++ llvm/trunk/test/CodeGen/X86/fp-stack-ret-conv.ll Thu Jan 24 02:07:48 2008
> @@ -0,0 +1,17 @@
> +; RUN: llvm-as < %s | llc -mcpu=yonah | grep cvtss2sd
> +; RUN: llvm-as < %s | llc -mcpu=yonah | grep fstps
> +; RUN: llvm-as < %s | llc -mcpu=yonah | not grep cvtsd2ss
> +
> +target datalayout = "e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:32:64-f32:32:32-f64:32:64-v64:64:64-v128:128:128-a0:0:64"
> +target triple = "i686-apple-darwin8"
> +
> +define void @test(double *%b) {
> +entry:
> + %tmp13 = tail call double @foo()
> + %tmp1314 = fptrunc double %tmp13 to float ; <float> [#uses=1]
> + %tmp3940 = fpext float %tmp1314 to double ; <double> [#uses=1]
> + volatile store double %tmp3940, double* %b
> + ret void
> +}
> +
> +declare double @foo()
>
> Modified: llvm/trunk/test/CodeGen/X86/pr1505b.ll
> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/pr1505b.ll?rev=46307&r1=46306&r2=46307&view=diff
>
> ==============================================================================
> --- llvm/trunk/test/CodeGen/X86/pr1505b.ll (original)
> +++ llvm/trunk/test/CodeGen/X86/pr1505b.ll Thu Jan 24 02:07:48 2008
> @@ -1,4 +1,4 @@
> -; RUN: llvm-as < %s | llc -mcpu=i486 | grep fstpl | count 3
> +; RUN: llvm-as < %s | llc -mcpu=i486 | grep fstpl | count 4
> ; RUN: llvm-as < %s | llc -mcpu=i486 | grep fstps | count 3
> ; PR1505
>
>
>