[llvm] r275981 - [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using generic IR

Andrea Di Biagio via llvm-commits llvm-commits at lists.llvm.org
Mon Jul 25 06:06:09 PDT 2016


For the record: the clang counterpart (revision 275981) introduced a
regression.

After revision 275981 the compiler fails to build this test:

///////
target triple = "x86_64-unknown-unknown"

define <4 x float> @test(<4 x float> %a, <2 x double>* nocapture readonly %b) {
entry:
  %0 = load <2 x double>, <2 x double>* %b, align 16
  %1 = tail call <4 x float> @llvm.x86.sse2.cvtsd2ss(<4 x float> %a, <2 x double> %0)
  ret <4 x float> %1
}

declare <4 x float> @llvm.x86.sse2.cvtsd2ss(<4 x float>, <2 x double>)
////////

> llc test.ll -o - -mattr=+avx

in X86MCCodeEmitter::encodeInstruction
Cannot encode all operands of: <MCInst 1073 <MCOperand Reg:126> <MCOperand Reg:126> <MCOperand Reg:39> <MCOperand Imm:1> <MCOperand Reg:0> <MCOperand Imm:0> <MCOperand Reg:0>>

I have already commented on the clang thread for revision 275981.

-Andrea
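
As background for the commit log quoted below: it says INF/NAN/out of range inputs to the truncating conversions are guaranteed to produce 0x80000000. Here is a minimal C++ sketch of that per-lane behaviour for a float to i32 conversion. The helper name emulate_cvtt_lane is hypothetical (it is not an LLVM or intrinsic API), and the sketch assumes the default masked-exception environment.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper, not part of LLVM: models one lane of a truncating
// float -> int32 conversion as described in the quoted commit log.
std::int32_t emulate_cvtt_lane(float x) {
    // 0x80000000 reinterpreted as a signed 32-bit value.
    const std::int32_t indefinite = std::numeric_limits<std::int32_t>::min();

    // NaN has no integer value, so the fixed result is returned.
    if (std::isnan(x))
        return indefinite;

    // 2^31 is exactly representable as a float. Representable inputs lie in
    // [-2^31, 2^31); everything else, including +/-infinity, is out of range.
    if (x >= 2147483648.0f || x < -2147483648.0f)
        return indefinite;

    // In range: a C++ float-to-integer cast truncates toward zero.
    return static_cast<std::int32_t>(x);
}
```

A constant folder that treats such inputs as zero or undef would disagree with this model, which is the mismatch the commit log describes.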

On Sat, Jul 23, 2016 at 5:15 PM, Nadav Rotem via llvm-commits <
llvm-commits at lists.llvm.org> wrote:

> LGTM.
>
>
> > On Jul 22, 2016, at 6:14 AM, Hans Wennborg <hans at chromium.org> wrote:
> >
> > Nadav: you're the X86 owner. What do you think?
> >
> >> On Thu, Jul 21, 2016 at 5:41 PM, Eli Friedman <eli.friedman at gmail.com> wrote:
> >> Nominating for backport to 3.9, so the intrinsics in question remain
> >> available.
> >>
> >> -Eli
> >>
> >>
> >> On Tue, Jul 19, 2016 at 8:07 AM, Simon Pilgrim via llvm-commits
> >> <llvm-commits at lists.llvm.org> wrote:
> >>>
> >>> Author: rksimon
> >>> Date: Tue Jul 19 10:07:43 2016
> >>> New Revision: 275981
> >>>
> >>> URL: http://llvm.org/viewvc/llvm-project?rev=275981&view=rev
> >>> Log:
> >>> [X86][SSE] Reimplement SSE fp2si conversion intrinsics instead of using
> >>> generic IR
> >>>
> >>> D20859 and D20860 attempted to replace the SSE (V)CVTTPS2DQ and VCVTTPD2DQ
> >>> truncating conversions with generic IR instead.
> >>>
> >>> It turns out that the behaviour of these intrinsics is different enough
> >>> from generic IR that this will cause problems, INF/NAN/out of range values
> >>> are guaranteed to result in a 0x80000000 value - which plays havoc with
> >>> constant folding which converts them to either zero or UNDEF. This is also
> >>> an issue with the scalar implementations (which were already generic IR and
> >>> what I was trying to match).
> >>>
> >>> This patch changes both scalar and packed versions back to using
> >>> x86-specific builtins.
> >>>
> >>> It also deals with the other scalar conversion cases that are runtime
> >>> rounding mode dependent and can have similar issues with constant folding.
> >>>
> >>> A companion clang patch is at D22105
> >>>
> >>> Differential Revision: https://reviews.llvm.org/D22106
> >>>
> >>> Modified:
> >>>    llvm/trunk/include/llvm/IR/IntrinsicsX86.td
> >>>    llvm/trunk/lib/Analysis/ConstantFolding.cpp
> >>>    llvm/trunk/lib/IR/AutoUpgrade.cpp
> >>>    llvm/trunk/lib/Target/X86/X86InstrSSE.td
> >>>    llvm/trunk/test/CodeGen/X86/avx-intrinsics-fast-isel.ll
> >>>    llvm/trunk/test/CodeGen/X86/avx-intrinsics-x86-upgrade.ll
> >>>    llvm/trunk/test/CodeGen/X86/avx-intrinsics-x86.ll
> >>>    llvm/trunk/test/CodeGen/X86/sse-intrinsics-fast-isel-x86_64.ll
> >>>    llvm/trunk/test/CodeGen/X86/sse-intrinsics-fast-isel.ll
> >>>    llvm/trunk/test/CodeGen/X86/sse2-intrinsics-fast-isel-x86_64.ll
> >>>    llvm/trunk/test/CodeGen/X86/sse2-intrinsics-fast-isel.ll
> >>>    llvm/trunk/test/CodeGen/X86/sse2-intrinsics-x86-upgrade.ll
> >>>    llvm/trunk/test/CodeGen/X86/sse2-intrinsics-x86.ll
> >>>    llvm/trunk/test/Transforms/ConstProp/calls.ll
> >>>
> >>> Modified: llvm/trunk/include/llvm/IR/IntrinsicsX86.td
> >>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/IR/IntrinsicsX86.td?rev=275981&r1=275980&r2=275981&view=diff
> >>> ==============================================================================
> >>> --- llvm/trunk/include/llvm/IR/IntrinsicsX86.td (original)
> >>> +++ llvm/trunk/include/llvm/IR/IntrinsicsX86.td Tue Jul 19 10:07:43 2016
> >>> @@ -479,6 +479,8 @@ let TargetPrefix = "x86" in {  // All in
> >>>               Intrinsic<[llvm_v4f32_ty], [llvm_v2f64_ty], [IntrNoMem]>;
> >>>   def int_x86_sse2_cvtps2dq : GCCBuiltin<"__builtin_ia32_cvtps2dq">,
> >>>               Intrinsic<[llvm_v4i32_ty], [llvm_v4f32_ty], [IntrNoMem]>;
> >>> +  def int_x86_sse2_cvttps2dq : GCCBuiltin<"__builtin_ia32_cvttps2dq">,
> >>> +              Intrinsic<[llvm_v4i32_ty], [llvm_v4f32_ty], [IntrNoMem]>;
> >>>   def int_x86_sse2_cvtsd2si : GCCBuiltin<"__builtin_ia32_cvtsd2si">,
> >>>               Intrinsic<[llvm_i32_ty], [llvm_v2f64_ty], [IntrNoMem]>;
> >>>   def int_x86_sse2_cvtsd2si64 : GCCBuiltin<"__builtin_ia32_cvtsd2si64">,
> >>> @@ -1512,8 +1514,12 @@ let TargetPrefix = "x86" in {  // All in
> >>>         Intrinsic<[llvm_v4f32_ty], [llvm_v4f64_ty], [IntrNoMem]>;
> >>>   def int_x86_avx_cvt_ps2dq_256 : GCCBuiltin<"__builtin_ia32_cvtps2dq256">,
> >>>         Intrinsic<[llvm_v8i32_ty], [llvm_v8f32_ty], [IntrNoMem]>;
> >>> +  def int_x86_avx_cvtt_pd2dq_256 : GCCBuiltin<"__builtin_ia32_cvttpd2dq256">,
> >>> +        Intrinsic<[llvm_v4i32_ty], [llvm_v4f64_ty], [IntrNoMem]>;
> >>>   def int_x86_avx_cvt_pd2dq_256 : GCCBuiltin<"__builtin_ia32_cvtpd2dq256">,
> >>>         Intrinsic<[llvm_v4i32_ty], [llvm_v4f64_ty], [IntrNoMem]>;
> >>> +  def int_x86_avx_cvtt_ps2dq_256 : GCCBuiltin<"__builtin_ia32_cvttps2dq256">,
> >>> +        Intrinsic<[llvm_v8i32_ty], [llvm_v8f32_ty], [IntrNoMem]>;
> >>> }
> >>>
> >>> // Vector bit test
> >>>
> >>> Modified: llvm/trunk/lib/Analysis/ConstantFolding.cpp
> >>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/ConstantFolding.cpp?rev=275981&r1=275980&r2=275981&view=diff
> >>> ==============================================================================
> >>> --- llvm/trunk/lib/Analysis/ConstantFolding.cpp (original)
> >>> +++ llvm/trunk/lib/Analysis/ConstantFolding.cpp Tue Jul 19 10:07:43 2016
> >>> @@ -1424,8 +1424,8 @@ Constant *ConstantFoldBinaryFP(double (*
> >>> /// integer type Ty is used to select how many bits are available for the
> >>> /// result. Returns null if the conversion cannot be performed, otherwise
> >>> /// returns the Constant value resulting from the conversion.
> >>> -Constant *ConstantFoldConvertToInt(const APFloat &Val, bool roundTowardZero,
> >>> -                                   Type *Ty) {
> >>> +Constant *ConstantFoldSSEConvertToInt(const APFloat &Val, bool roundTowardZero,
> >>> +                                      Type *Ty) {
> >>>   // All of these conversion intrinsics form an integer of at most 64bits.
> >>>   unsigned ResultWidth = Ty->getIntegerBitWidth();
> >>>   assert(ResultWidth <= 64 &&
> >>> @@ -1438,7 +1438,8 @@ Constant *ConstantFoldConvertToInt(const
> >>>   APFloat::opStatus status = Val.convertToInteger(&UIntVal, ResultWidth,
> >>>                                                   /*isSigned=*/true, mode,
> >>>                                                   &isExact);
> >>> -  if (status != APFloat::opOK && status != APFloat::opInexact)
> >>> +  if (status != APFloat::opOK &&
> >>> +      (!roundTowardZero || status != APFloat::opInexact))
> >>>     return nullptr;
> >>>   return ConstantInt::get(Ty, UIntVal, /*isSigned=*/true);
> >>> }
> >>> @@ -1676,17 +1677,17 @@ Constant *ConstantFoldScalarCall(StringR
> >>>       case Intrinsic::x86_sse2_cvtsd2si:
> >>>       case Intrinsic::x86_sse2_cvtsd2si64:
> >>>         if (ConstantFP *FPOp =
> >>> -
> dyn_cast_or_null<ConstantFP>(Op->getAggregateElement(0U)))
> >>> -          return ConstantFoldConvertToInt(FPOp->getValueAPF(),
> >>> -                                          /*roundTowardZero=*/false,
> Ty);
> >>> +
> >>> dyn_cast_or_null<ConstantFP>(Op->getAggregateElement(0U)))
> >>> +          return ConstantFoldSSEConvertToInt(FPOp->getValueAPF(),
> >>> +
>  /*roundTowardZero=*/false,
> >>> Ty);
> >>>       case Intrinsic::x86_sse_cvttss2si:
> >>>       case Intrinsic::x86_sse_cvttss2si64:
> >>>       case Intrinsic::x86_sse2_cvttsd2si:
> >>>       case Intrinsic::x86_sse2_cvttsd2si64:
> >>>         if (ConstantFP *FPOp =
> >>> -
> dyn_cast_or_null<ConstantFP>(Op->getAggregateElement(0U)))
> >>> -          return ConstantFoldConvertToInt(FPOp->getValueAPF(),
> >>> -                                          /*roundTowardZero=*/true,
> Ty);
> >>> +
> >>> dyn_cast_or_null<ConstantFP>(Op->getAggregateElement(0U)))
> >>> +          return ConstantFoldSSEConvertToInt(FPOp->getValueAPF(),
> >>> +                                             /*roundTowardZero=*/true,
> >>> Ty);
> >>>       }
> >>>     }
> >>>
> >>>
> >>> Modified: llvm/trunk/lib/IR/AutoUpgrade.cpp
> >>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/IR/AutoUpgrade.cpp?rev=275981&r1=275980&r2=275981&view=diff
> >>> ==============================================================================
> >>> --- llvm/trunk/lib/IR/AutoUpgrade.cpp (original)
> >>> +++ llvm/trunk/lib/IR/AutoUpgrade.cpp Tue Jul 19 10:07:43 2016
> >>> @@ -251,8 +251,6 @@ static bool UpgradeIntrinsicFunction1(Fu
> >>>          Name == "sse2.cvtps2pd" ||
> >>>          Name == "avx.cvtdq2.pd.256" ||
> >>>          Name == "avx.cvt.ps2.pd.256" ||
> >>> -         Name == "sse2.cvttps2dq" ||
> >>> -         Name.startswith("avx.cvtt.") ||
> >>>          Name.startswith("avx.vinsertf128.") ||
> >>>          Name == "avx2.vinserti128" ||
> >>>          Name.startswith("avx.vextractf128.") ||
> >>> @@ -712,12 +710,6 @@ void llvm::UpgradeIntrinsicCall(CallInst
> >>>         Rep = Builder.CreateSIToFP(Rep, DstTy, "cvtdq2pd");
> >>>       else
> >>>         Rep = Builder.CreateFPExt(Rep, DstTy, "cvtps2pd");
> >>> -    } else if (IsX86 && (Name == "sse2.cvttps2dq" ||
> >>> -                         Name.startswith("avx.cvtt."))) {
> >>> -      // Truncation (round to zero) float/double to i32 vector conversion.
> >>> -      Value *Src = CI->getArgOperand(0);
> >>> -      VectorType *DstTy = cast<VectorType>(CI->getType());
> >>> -      Rep = Builder.CreateFPToSI(Src, DstTy, "cvtt");
> >>>     } else if (IsX86 && Name.startswith("sse4a.movnt.")) {
> >>>       Module *M = F->getParent();
> >>>       SmallVector<Metadata *, 1> Elts;
> >>>
> >>> Modified: llvm/trunk/lib/Target/X86/X86InstrSSE.td
> >>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86InstrSSE.td?rev=275981&r1=275980&r2=275981&view=diff
> >>> ==============================================================================
> >>> --- llvm/trunk/lib/Target/X86/X86InstrSSE.td (original)
> >>> +++ llvm/trunk/lib/Target/X86/X86InstrSSE.td Tue Jul 19 10:07:43 2016
> >>> @@ -2009,24 +2009,35 @@ def CVTPD2DQrr  : SDI<0xE6, MRMSrcReg, (
> >>> // SSE2 packed instructions with XS prefix
> >>> def VCVTTPS2DQrr : VS2SI<0x5B, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src),
> >>>                          "cvttps2dq\t{$src, $dst|$dst, $src}",
> >>> -                         [], IIC_SSE_CVT_PS_RR>, VEX, Sched<[WriteCvtF2I]>;
> >>> +                         [(set VR128:$dst,
> >>> +                           (int_x86_sse2_cvttps2dq VR128:$src))],
> >>> +                         IIC_SSE_CVT_PS_RR>, VEX, Sched<[WriteCvtF2I]>;
> >>> def VCVTTPS2DQrm : VS2SI<0x5B, MRMSrcMem, (outs VR128:$dst), (ins f128mem:$src),
> >>>                          "cvttps2dq\t{$src, $dst|$dst, $src}",
> >>> -                         [], IIC_SSE_CVT_PS_RM>, VEX, Sched<[WriteCvtF2ILd]>;
> >>> +                         [(set VR128:$dst, (int_x86_sse2_cvttps2dq
> >>> +                                            (loadv4f32 addr:$src)))],
> >>> +                         IIC_SSE_CVT_PS_RM>, VEX, Sched<[WriteCvtF2ILd]>;
> >>> def VCVTTPS2DQYrr : VS2SI<0x5B, MRMSrcReg, (outs VR256:$dst), (ins VR256:$src),
> >>>                           "cvttps2dq\t{$src, $dst|$dst, $src}",
> >>> -                          [], IIC_SSE_CVT_PS_RR>, VEX, VEX_L, Sched<[WriteCvtF2I]>;
> >>> +                          [(set VR256:$dst,
> >>> +                            (int_x86_avx_cvtt_ps2dq_256 VR256:$src))],
> >>> +                          IIC_SSE_CVT_PS_RR>, VEX, VEX_L, Sched<[WriteCvtF2I]>;
> >>> def VCVTTPS2DQYrm : VS2SI<0x5B, MRMSrcMem, (outs VR256:$dst), (ins f256mem:$src),
> >>>                           "cvttps2dq\t{$src, $dst|$dst, $src}",
> >>> -                          [], IIC_SSE_CVT_PS_RM>, VEX, VEX_L,
> >>> +                          [(set VR256:$dst, (int_x86_avx_cvtt_ps2dq_256
> >>> +                                             (loadv8f32 addr:$src)))],
> >>> +                          IIC_SSE_CVT_PS_RM>, VEX, VEX_L,
> >>>                           Sched<[WriteCvtF2ILd]>;
> >>>
> >>> def CVTTPS2DQrr : S2SI<0x5B, MRMSrcReg, (outs VR128:$dst), (ins VR128:$src),
> >>>                        "cvttps2dq\t{$src, $dst|$dst, $src}",
> >>> -                       [], IIC_SSE_CVT_PS_RR>, Sched<[WriteCvtF2I]>;
> >>> +                       [(set VR128:$dst, (int_x86_sse2_cvttps2dq VR128:$src))],
> >>> +                       IIC_SSE_CVT_PS_RR>, Sched<[WriteCvtF2I]>;
> >>> def CVTTPS2DQrm : S2SI<0x5B, MRMSrcMem, (outs VR128:$dst), (ins f128mem:$src),
> >>>                        "cvttps2dq\t{$src, $dst|$dst, $src}",
> >>> -                       [], IIC_SSE_CVT_PS_RM>, Sched<[WriteCvtF2ILd]>;
> >>> +                       [(set VR128:$dst,
> >>> +                         (int_x86_sse2_cvttps2dq (memopv4f32 addr:$src)))],
> >>> +                       IIC_SSE_CVT_PS_RM>, Sched<[WriteCvtF2ILd]>;
> >>>
> >>> let Predicates = [HasAVX] in {
> >>>   def : Pat<(int_x86_sse2_cvtdq2ps VR128:$src),
> >>> @@ -2096,10 +2107,14 @@ def VCVTTPD2DQXrm : VPDI<0xE6, MRMSrcMem
> >>> // YMM only
> >>> def VCVTTPD2DQYrr : VPDI<0xE6, MRMSrcReg, (outs VR128:$dst), (ins VR256:$src),
> >>>                          "cvttpd2dq{y}\t{$src, $dst|$dst, $src}",
> >>> -                         [], IIC_SSE_CVT_PD_RR>, VEX, VEX_L, Sched<[WriteCvtF2I]>;
> >>> +                         [(set VR128:$dst,
> >>> +                           (int_x86_avx_cvtt_pd2dq_256 VR256:$src))],
> >>> +                         IIC_SSE_CVT_PD_RR>, VEX, VEX_L, Sched<[WriteCvtF2I]>;
> >>> def VCVTTPD2DQYrm : VPDI<0xE6, MRMSrcMem, (outs VR128:$dst), (ins f256mem:$src),
> >>>                          "cvttpd2dq{y}\t{$src, $dst|$dst, $src}",
> >>> -                         [], IIC_SSE_CVT_PD_RM>, VEX, VEX_L, Sched<[WriteCvtF2ILd]>;
> >>> +                         [(set VR128:$dst,
> >>> +                          (int_x86_avx_cvtt_pd2dq_256 (loadv4f64 addr:$src)))],
> >>> +                         IIC_SSE_CVT_PD_RM>, VEX, VEX_L, Sched<[WriteCvtF2ILd]>;
> >>> def : InstAlias<"vcvttpd2dq\t{$src, $dst|$dst, $src}",
> >>>                 (VCVTTPD2DQYrr VR128:$dst, VR256:$src), 0>;
> >>>
> >>>
> >>> Modified: llvm/trunk/test/CodeGen/X86/avx-intrinsics-fast-isel.ll
> >>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-intrinsics-fast-isel.ll?rev=275981&r1=275980&r2=275981&view=diff
> >>> ==============================================================================
> >>> --- llvm/trunk/test/CodeGen/X86/avx-intrinsics-fast-isel.ll (original)
> >>> +++ llvm/trunk/test/CodeGen/X86/avx-intrinsics-fast-isel.ll Tue Jul 19 10:07:43 2016
> >>> @@ -681,10 +681,11 @@ define <2 x i64> @test_mm256_cvttpd_epi3
> >>> ; X64-NEXT:    vcvttpd2dqy %ymm0, %xmm0
> >>> ; X64-NEXT:    vzeroupper
> >>> ; X64-NEXT:    retq
> >>> -  %cvt = fptosi <4 x double> %a0 to <4 x i32>
> >>> +  %cvt = call <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double> %a0)
> >>>   %res = bitcast <4 x i32> %cvt to <2 x i64>
> >>>   ret <2 x i64> %res
> >>> }
> >>> +declare <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double>) nounwind readnone
> >>>
> >>> define <4 x i64> @test_mm256_cvttps_epi32(<8 x float> %a0) nounwind {
> >>> ; X32-LABEL: test_mm256_cvttps_epi32:
> >>> @@ -696,10 +697,11 @@ define <4 x i64> @test_mm256_cvttps_epi3
> >>> ; X64:       # BB#0:
> >>> ; X64-NEXT:    vcvttps2dq %ymm0, %ymm0
> >>> ; X64-NEXT:    retq
> >>> -  %cvt = fptosi <8 x float> %a0 to <8 x i32>
> >>> +  %cvt = call <8 x i32> @llvm.x86.avx.cvtt.ps2dq.256(<8 x float> %a0)
> >>>   %res = bitcast <8 x i32> %cvt to <4 x i64>
> >>>   ret <4 x i64> %res
> >>> }
> >>> +declare <8 x i32> @llvm.x86.avx.cvtt.ps2dq.256(<8 x float>) nounwind readnone
> >>>
> >>> define <4 x double> @test_mm256_div_pd(<4 x double> %a0, <4 x double> %a1) nounwind {
> >>> ; X32-LABEL: test_mm256_div_pd:
> >>>
> >>> Modified: llvm/trunk/test/CodeGen/X86/avx-intrinsics-x86-upgrade.ll
> >>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-intrinsics-x86-upgrade.ll?rev=275981&r1=275980&r2=275981&view=diff
> >>> ==============================================================================
> >>> --- llvm/trunk/test/CodeGen/X86/avx-intrinsics-x86-upgrade.ll (original)
> >>> +++ llvm/trunk/test/CodeGen/X86/avx-intrinsics-x86-upgrade.ll Tue Jul 19 10:07:43 2016
> >>> @@ -359,35 +359,12 @@ define <4 x double> @test_x86_avx_cvt_ps
> >>> declare <4 x double> @llvm.x86.avx.cvt.ps2.pd.256(<4 x float>) nounwind readnone
> >>>
> >>>
> >>> -define <4 x i32> @test_x86_avx_cvtt_pd2dq_256(<4 x double> %a0) {
> >>> -; CHECK-LABEL: test_x86_avx_cvtt_pd2dq_256:
> >>> -; CHECK:       ## BB#0:
> >>> -; CHECK-NEXT:    vcvttpd2dqy %ymm0, %xmm0
> >>> -; CHECK-NEXT:    vzeroupper
> >>> -; CHECK-NEXT:    retl
> >>> -  %res = call <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double> %a0) ; <<4 x i32>> [#uses=1]
> >>> -  ret <4 x i32> %res
> >>> -}
> >>> -declare <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double>) nounwind readnone
> >>> -
> >>> -
> >>> -define <8 x i32> @test_x86_avx_cvtt_ps2dq_256(<8 x float> %a0) {
> >>> -; CHECK-LABEL: test_x86_avx_cvtt_ps2dq_256:
> >>> -; CHECK:       ## BB#0:
> >>> -; CHECK-NEXT:    vcvttps2dq %ymm0, %ymm0
> >>> -; CHECK-NEXT:    retl
> >>> -  %res = call <8 x i32> @llvm.x86.avx.cvtt.ps2dq.256(<8 x float> %a0) ; <<8 x i32>> [#uses=1]
> >>> -  ret <8 x i32> %res
> >>> -}
> >>> -declare <8 x i32> @llvm.x86.avx.cvtt.ps2dq.256(<8 x float>) nounwind readnone
> >>> -
> >>> -
> >>> define void @test_x86_sse2_storeu_dq(i8* %a0, <16 x i8> %a1) {
> >>>   ; add operation forces the execution domain.
> >>> ; CHECK-LABEL: test_x86_sse2_storeu_dq:
> >>> ; CHECK:       ## BB#0:
> >>> ; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %eax
> >>> -; CHECK-NEXT:    vpaddb LCPI34_0, %xmm0, %xmm0
> >>> +; CHECK-NEXT:    vpaddb LCPI32_0, %xmm0, %xmm0
> >>> ; CHECK-NEXT:    vmovdqu %xmm0, (%eax)
> >>> ; CHECK-NEXT:    retl
> >>>   %a2 = add <16 x i8> %a1, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
> >>>
> >>> Modified: llvm/trunk/test/CodeGen/X86/avx-intrinsics-x86.ll
> >>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/avx-intrinsics-x86.ll?rev=275981&r1=275980&r2=275981&view=diff
> >>> ==============================================================================
> >>> --- llvm/trunk/test/CodeGen/X86/avx-intrinsics-x86.ll (original)
> >>> +++ llvm/trunk/test/CodeGen/X86/avx-intrinsics-x86.ll Tue Jul 19 10:07:43 2016
> >>> @@ -3431,6 +3431,39 @@ define <8 x float> @test_x86_avx_cvtdq2_
> >>> declare <8 x float> @llvm.x86.avx.cvtdq2.ps.256(<8 x i32>) nounwind readnone
> >>>
> >>>
> >>> +define <4 x i32> @test_x86_avx_cvtt_pd2dq_256(<4 x double> %a0) {
> >>> +; AVX-LABEL: test_x86_avx_cvtt_pd2dq_256:
> >>> +; AVX:       ## BB#0:
> >>> +; AVX-NEXT:    vcvttpd2dqy %ymm0, %xmm0
> >>> +; AVX-NEXT:    vzeroupper
> >>> +; AVX-NEXT:    retl
> >>> +;
> >>> +; AVX512VL-LABEL: test_x86_avx_cvtt_pd2dq_256:
> >>> +; AVX512VL:       ## BB#0:
> >>> +; AVX512VL-NEXT:    vcvttpd2dqy %ymm0, %xmm0
> >>> +; AVX512VL-NEXT:    retl
> >>> +  %res = call <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double> %a0) ; <<4 x i32>> [#uses=1]
> >>> +  ret <4 x i32> %res
> >>> +}
> >>> +declare <4 x i32> @llvm.x86.avx.cvtt.pd2dq.256(<4 x double>) nounwind readnone
> >>> +
> >>> +
> >>> +define <8 x i32> @test_x86_avx_cvtt_ps2dq_256(<8 x float> %a0) {
> >>> +; AVX-LABEL: test_x86_avx_cvtt_ps2dq_256:
> >>> +; AVX:       ## BB#0:
> >>> +; AVX-NEXT:    vcvttps2dq %ymm0, %ymm0
> >>> +; AVX-NEXT:    retl
> >>> +;
> >>> +; AVX512VL-LABEL: test_x86_avx_cvtt_ps2dq_256:
> >>> +; AVX512VL:       ## BB#0:
> >>> +; AVX512VL-NEXT:    vcvttps2dq %ymm0, %ymm0
> >>> +; AVX512VL-NEXT:    retl
> >>> +  %res = call <8 x i32> @llvm.x86.avx.cvtt.ps2dq.256(<8 x float> %a0) ; <<8 x i32>> [#uses=1]
> >>> +  ret <8 x i32> %res
> >>> +}
> >>> +declare <8 x i32> @llvm.x86.avx.cvtt.ps2dq.256(<8 x float>) nounwind readnone
> >>> +
> >>> +
> >>> define <8 x float> @test_x86_avx_dp_ps_256(<8 x float> %a0, <8 x float> %a1) {
> >>> ; AVX-LABEL: test_x86_avx_dp_ps_256:
> >>> ; AVX:       ## BB#0:
> >>> @@ -4552,7 +4585,7 @@ define void @movnt_dq(i8* %p, <2 x i64>
> >>> ; AVX-LABEL: movnt_dq:
> >>> ; AVX:       ## BB#0:
> >>> ; AVX-NEXT:    movl {{[0-9]+}}(%esp), %eax
> >>> -; AVX-NEXT:    vpaddq LCPI254_0, %xmm0, %xmm0
> >>> +; AVX-NEXT:    vpaddq LCPI256_0, %xmm0, %xmm0
> >>> ; AVX-NEXT:    vmovntdq %ymm0, (%eax)
> >>> ; AVX-NEXT:    vzeroupper
> >>> ; AVX-NEXT:    retl
> >>> @@ -4560,7 +4593,7 @@ define void @movnt_dq(i8* %p, <2 x i64>
> >>> ; AVX512VL-LABEL: movnt_dq:
> >>> ; AVX512VL:       ## BB#0:
> >>> ; AVX512VL-NEXT:    movl {{[0-9]+}}(%esp), %eax
> >>> -; AVX512VL-NEXT:    vpaddq LCPI254_0, %xmm0, %xmm0
> >>> +; AVX512VL-NEXT:    vpaddq LCPI256_0, %xmm0, %xmm0
> >>> ; AVX512VL-NEXT:    vmovntdq %ymm0, (%eax)
> >>> ; AVX512VL-NEXT:    retl
> >>>   %a2 = add <2 x i64> %a1, <i64 1, i64 1>
> >>>
> >>> Modified: llvm/trunk/test/CodeGen/X86/sse-intrinsics-fast-isel-x86_64.ll
> >>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/sse-intrinsics-fast-isel-x86_64.ll?rev=275981&r1=275980&r2=275981&view=diff
> >>> ==============================================================================
> >>> --- llvm/trunk/test/CodeGen/X86/sse-intrinsics-fast-isel-x86_64.ll (original)
> >>> +++ llvm/trunk/test/CodeGen/X86/sse-intrinsics-fast-isel-x86_64.ll Tue Jul 19 10:07:43 2016
> >>> @@ -6,13 +6,12 @@
> >>> define <4 x float> @test_mm_cvtsi64_ss(<4 x float> %a0, i64 %a1) nounwind {
> >>> ; X64-LABEL: test_mm_cvtsi64_ss:
> >>> ; X64:       # BB#0:
> >>> -; X64-NEXT:    cvtsi2ssq %rdi, %xmm1
> >>> -; X64-NEXT:    movss {{.*#+}} xmm0 = xmm1[0],xmm0[1,2,3]
> >>> +; X64-NEXT:    cvtsi2ssq %rdi, %xmm0
> >>> ; X64-NEXT:    retq
> >>> -  %cvt = sitofp i64 %a1 to float
> >>> -  %res = insertelement <4 x float> %a0, float %cvt, i32 0
> >>> +  %res = call <4 x float> @llvm.x86.sse.cvtsi642ss(<4 x float> %a0, i64 %a1)
> >>>   ret <4 x float> %res
> >>> }
> >>> +declare <4 x float> @llvm.x86.sse.cvtsi642ss(<4 x float>, i64) nounwind readnone
> >>>
> >>> define i64 @test_mm_cvtss_si64(<4 x float> %a0) nounwind {
> >>> ; X64-LABEL: test_mm_cvtss_si64:
> >>> @@ -29,7 +28,7 @@ define i64 @test_mm_cvttss_si64(<4 x flo
> >>> ; X64:       # BB#0:
> >>> ; X64-NEXT:    cvttss2si %xmm0, %rax
> >>> ; X64-NEXT:    retq
> >>> -  %cvt = extractelement <4 x float> %a0, i32 0
> >>> -  %res = fptosi float %cvt to i64
> >>> +  %res = call i64 @llvm.x86.sse.cvttss2si64(<4 x float> %a0)
> >>>   ret i64 %res
> >>> }
> >>> +declare i64 @llvm.x86.sse.cvttss2si64(<4 x float>) nounwind readnone
> >>>
> >>> Modified: llvm/trunk/test/CodeGen/X86/sse-intrinsics-fast-isel.ll
> >>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/sse-intrinsics-fast-isel.ll?rev=275981&r1=275980&r2=275981&view=diff
> >>> ==============================================================================
> >>> --- llvm/trunk/test/CodeGen/X86/sse-intrinsics-fast-isel.ll (original)
> >>> +++ llvm/trunk/test/CodeGen/X86/sse-intrinsics-fast-isel.ll Tue Jul 19 10:07:43 2016
> >>> @@ -707,20 +707,17 @@ declare i32 @llvm.x86.sse.cvtss2si(<4 x
> >>> define <4 x float> @test_mm_cvtsi32_ss(<4 x float> %a0, i32 %a1) nounwind {
> >>> ; X32-LABEL: test_mm_cvtsi32_ss:
> >>> ; X32:       # BB#0:
> >>> -; X32-NEXT:    movl {{[0-9]+}}(%esp), %eax
> >>> -; X32-NEXT:    cvtsi2ssl %eax, %xmm1
> >>> -; X32-NEXT:    movss {{.*#+}} xmm0 = xmm1[0],xmm0[1,2,3]
> >>> +; X32-NEXT:    cvtsi2ssl {{[0-9]+}}(%esp), %xmm0
> >>> ; X32-NEXT:    retl
> >>> ;
> >>> ; X64-LABEL: test_mm_cvtsi32_ss:
> >>> ; X64:       # BB#0:
> >>> -; X64-NEXT:    cvtsi2ssl %edi, %xmm1
> >>> -; X64-NEXT:    movss {{.*#+}} xmm0 = xmm1[0],xmm0[1,2,3]
> >>> +; X64-NEXT:    cvtsi2ssl %edi, %xmm0
> >>> ; X64-NEXT:    retq
> >>> -  %cvt = sitofp i32 %a1 to float
> >>> -  %res = insertelement <4 x float> %a0, float %cvt, i32 0
> >>> +  %res = call <4 x float> @llvm.x86.sse.cvtsi2ss(<4 x float> %a0, i32 %a1)
> >>>   ret <4 x float> %res
> >>> }
> >>> +declare <4 x float> @llvm.x86.sse.cvtsi2ss(<4 x float>, i32) nounwind readnone
> >>>
> >>> define float @test_mm_cvtss_f32(<4 x float> %a0) nounwind {
> >>> ; X32-LABEL: test_mm_cvtss_f32:
> >>> @@ -762,10 +759,10 @@ define i32 @test_mm_cvttss_si(<4 x float
> >>> ; X64:       # BB#0:
> >>> ; X64-NEXT:    cvttss2si %xmm0, %eax
> >>> ; X64-NEXT:    retq
> >>> -  %cvt = extractelement <4 x float> %a0, i32 0
> >>> -  %res = fptosi float %cvt to i32
> >>> +  %res = call i32 @llvm.x86.sse.cvttss2si(<4 x float> %a0)
> >>>   ret i32 %res
> >>> }
> >>> +declare i32 @llvm.x86.sse.cvttss2si(<4 x float>) nounwind readnone
> >>>
> >>> define i32 @test_mm_cvttss_si32(<4 x float> %a0) nounwind {
> >>> ; X32-LABEL: test_mm_cvttss_si32:
> >>> @@ -777,8 +774,7 @@ define i32 @test_mm_cvttss_si32(<4 x flo
> >>> ; X64:       # BB#0:
> >>> ; X64-NEXT:    cvttss2si %xmm0, %eax
> >>> ; X64-NEXT:    retq
> >>> -  %cvt = extractelement <4 x float> %a0, i32 0
> >>> -  %res = fptosi float %cvt to i32
> >>> +  %res = call i32 @llvm.x86.sse.cvttss2si(<4 x float> %a0)
> >>>   ret i32 %res
> >>> }
> >>>
> >>>
> >>> Modified: llvm/trunk/test/CodeGen/X86/sse2-intrinsics-fast-isel-x86_64.ll
> >>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/sse2-intrinsics-fast-isel-x86_64.ll?rev=275981&r1=275980&r2=275981&view=diff
> >>> ==============================================================================
> >>> --- llvm/trunk/test/CodeGen/X86/sse2-intrinsics-fast-isel-x86_64.ll (original)
> >>> +++ llvm/trunk/test/CodeGen/X86/sse2-intrinsics-fast-isel-x86_64.ll Tue Jul 19 10:07:43 2016
> >>> @@ -25,13 +25,12 @@ define i64 @test_mm_cvtsi128_si64(<2 x i
> >>> define <2 x double> @test_mm_cvtsi64_sd(<2 x double> %a0, i64 %a1) nounwind {
> >>> ; X64-LABEL: test_mm_cvtsi64_sd:
> >>> ; X64:       # BB#0:
> >>> -; X64-NEXT:    cvtsi2sdq %rdi, %xmm1
> >>> -; X64-NEXT:    movsd {{.*#+}} xmm0 = xmm1[0],xmm0[1]
> >>> +; X64-NEXT:    cvtsi2sdq %rdi, %xmm0
> >>> ; X64-NEXT:    retq
> >>> -  %cvt = sitofp i64 %a1 to double
> >>> -  %res = insertelement <2 x double> %a0, double %cvt, i32 0
> >>> +  %res = call <2 x double> @llvm.x86.sse2.cvtsi642sd(<2 x double> %a0, i64 %a1)
> >>>   ret <2 x double> %res
> >>> }
> >>> +declare <2 x double> @llvm.x86.sse2.cvtsi642sd(<2 x double>, i64) nounwind readnone
> >>>
> >>> define <2 x i64> @test_mm_cvtsi64_si128(i64 %a0) nounwind {
> >>> ; X64-LABEL: test_mm_cvtsi64_si128:
> >>> @@ -48,10 +47,10 @@ define i64 @test_mm_cvttsd_si64(<2 x dou
> >>> ; X64:       # BB#0:
> >>> ; X64-NEXT:    cvttsd2si %xmm0, %rax
> >>> ; X64-NEXT:    retq
> >>> -  %ext = extractelement <2 x double> %a0, i32 0
> >>> -  %res = fptosi double %ext to i64
> >>> +  %res = call i64 @llvm.x86.sse2.cvttsd2si64(<2 x double> %a0)
> >>>   ret i64 %res
> >>> }
> >>> +declare i64 @llvm.x86.sse2.cvttsd2si64(<2 x double>) nounwind readnone
> >>>
> >>> define <2 x i64> @test_mm_loadu_si64(i64* %a0) nounwind {
> >>> ; X64-LABEL: test_mm_loadu_si64:
> >>>
> >>> Modified: llvm/trunk/test/CodeGen/X86/sse2-intrinsics-fast-isel.ll
> >>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/sse2-intrinsics-fast-isel.ll?rev=275981&r1=275980&r2=275981&view=diff
> >>> ==============================================================================
> >>> --- llvm/trunk/test/CodeGen/X86/sse2-intrinsics-fast-isel.ll (original)
> >>> +++ llvm/trunk/test/CodeGen/X86/sse2-intrinsics-fast-isel.ll Tue Jul 19 10:07:43 2016
> >>> @@ -1208,6 +1208,21 @@ define i32 @test_mm_cvtsd_si32(<2 x doub
> >>> }
> >>> declare i32 @llvm.x86.sse2.cvtsd2si(<2 x double>) nounwind readnone
> >>>
> >>> +define <4 x float> @test_mm_cvtsd_ss(<4 x float> %a0, <2 x double> %a1) {
> >>> +; X32-LABEL: test_mm_cvtsd_ss:
> >>> +; X32:       # BB#0:
> >>> +; X32-NEXT:    cvtsd2ss %xmm1, %xmm0
> >>> +; X32-NEXT:    retl
> >>> +;
> >>> +; X64-LABEL: test_mm_cvtsd_ss:
> >>> +; X64:       # BB#0:
> >>> +; X64-NEXT:    cvtsd2ss %xmm1, %xmm0
> >>> +; X64-NEXT:    retq
> >>> +  %res = call <4 x float> @llvm.x86.sse2.cvtsd2ss(<4 x float> %a0, <2 x double> %a1)
> >>> +  ret <4 x float> %res
> >>> +}
> >>> +declare <4 x float> @llvm.x86.sse2.cvtsd2ss(<4 x float>, <2 x double>) nounwind readnone
> >>> +
> >>> define i32 @test_mm_cvtsi128_si32(<2 x i64> %a0) nounwind {
> >>> ; X32-LABEL: test_mm_cvtsi128_si32:
> >>> ; X32:       # BB#0:
> >>> @@ -1303,10 +1318,11 @@ define <2 x i64> @test_mm_cvttps_epi32(<
> >>> ; X64:       # BB#0:
> >>> ; X64-NEXT:    cvttps2dq %xmm0, %xmm0
> >>> ; X64-NEXT:    retq
> >>> -  %res = fptosi <4 x float> %a0 to <4 x i32>
> >>> +  %res = call <4 x i32> @llvm.x86.sse2.cvttps2dq(<4 x float> %a0)
> >>>   %bc = bitcast <4 x i32> %res to <2 x i64>
> >>>   ret <2 x i64> %bc
> >>> }
> >>> +declare <4 x i32> @llvm.x86.sse2.cvttps2dq(<4 x float>) nounwind readnone
> >>>
> >>> define i32 @test_mm_cvttsd_si32(<2 x double> %a0) nounwind {
> >>> ; X32-LABEL: test_mm_cvttsd_si32:
> >>> @@ -1318,10 +1334,10 @@ define i32 @test_mm_cvttsd_si32(<2 x dou
> >>> ; X64:       # BB#0:
> >>> ; X64-NEXT:    cvttsd2si %xmm0, %eax
> >>> ; X64-NEXT:    retq
> >>> -  %ext = extractelement <2 x double> %a0, i32 0
> >>> -  %res = fptosi double %ext to i32
> >>> +  %res = call i32 @llvm.x86.sse2.cvttsd2si(<2 x double> %a0)
> >>>   ret i32 %res
> >>> }
> >>> +declare i32 @llvm.x86.sse2.cvttsd2si(<2 x double>) nounwind readnone
> >>>
> >>> define <2 x double> @test_mm_div_pd(<2 x double> %a0, <2 x double> %a1) nounwind {
> >>> ; X32-LABEL: test_mm_div_pd:
> >>>
> >>> Modified: llvm/trunk/test/CodeGen/X86/sse2-intrinsics-x86-upgrade.ll
> >>> URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/sse2-intrinsics-x86-upgrade.ll?rev=275981&r1=275980&r2=275981&view=diff
> >>> ==============================================================================
> >>> --- llvm/trunk/test/CodeGen/X86/sse2-intrinsics-x86-upgrade.ll (original)
> >>> +++ llvm/trunk/test/CodeGen/X86/sse2-intrinsics-x86-upgrade.ll Tue Jul 19 10:07:43 2016
> >>> @@ -66,17 +66,6 @@ define <2 x double> @test_x86_sse2_cvtps
> >>> declare <2 x double> @llvm.x86.sse2.cvtps2pd(<4 x float>) nounwind readnone
> >>>
> >>>
> >>> -define <4 x i32> @test_x86_sse2_cvttps2dq(<4 x float> %a0) {
> >>> -; CHECK-LABEL: test_x86_sse2_cvttps2dq:
> >>> -; CHECK:       ## BB#0:
> >>> -; CHECK-NEXT:    cvttps2dq %xmm0, %xmm0
> >>> -; CHECK-NEXT:    retl
> >>> -  %res = call <4 x i32> @llvm.x86.sse2.cvttps2dq(<4 x float> %a0) ; <<4 x i32>> [#uses=1]
> >>> -  ret <4 x i32> %res
> >>> -}
> >>> -declare <4 x i32> @llvm.x86.sse2.cvttps2dq(<4 x float>) nounwind readnone
> >>> -
> >>> -
> >>> define void @test_x86_sse2_storel_dq(i8* %a0, <4 x i32> %a1) {
> >>> ; CHECK-LABEL: test_x86_sse2_storel_dq:
> >>> ; CHECK:       ## BB#0:
> >>> @@ -94,7 +83,7 @@ define void @test_x86_sse2_storeu_dq(i8*
> >>> ; CHECK-LABEL: test_x86_sse2_storeu_dq:
> >>> ; CHECK:       ## BB#0:
> >>> ; CHECK-NEXT:    movl {{[0-9]+}}(%esp), %eax
> >>> -; CHECK-NEXT:    paddb LCPI8_0, %xmm0
> >>> +; CHECK-NEXT:    paddb LCPI7_0, %xmm0
> >>> ; CHECK-NEXT:    movdqu %xmm0, (%eax)
> >>> ; CHECK-NEXT:    retl
> >>>   %a2 = add <16 x i8> %a1, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1,
> i8
> >>> 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
> >>>
> >>> Modified: llvm/trunk/test/CodeGen/X86/sse2-intrinsics-x86.ll
> >>> URL:
> >>>
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/sse2-intrinsics-x86.ll?rev=275981&r1=275980&r2=275981&view=diff
> >>>
> >>>
> ==============================================================================
> >>> --- llvm/trunk/test/CodeGen/X86/sse2-intrinsics-x86.ll (original)
> >>> +++ llvm/trunk/test/CodeGen/X86/sse2-intrinsics-x86.ll Tue Jul 19
> 10:07:43
> >>> 2016
> >>> @@ -1,4 +1,4 @@
> >>> -; NOTE: Assertions have been autogenerated by
> update_llc_test_checks.py
> >>> +; NOTE: Assertions have been autogenerated by
> >>> utils/update_llc_test_checks.py
> >>> ; RUN: llc < %s -mtriple=i386-apple-darwin -mattr=-avx,+sse2 |
> FileCheck
> >>> %s --check-prefix=SSE
> >>> ; RUN: llc < %s -mtriple=i386-apple-darwin -mcpu=knl | FileCheck %s
> >>> --check-prefix=KNL
> >>>
> >>> @@ -322,6 +322,22 @@ define <4 x i32> @test_x86_sse2_cvttpd2d
> >>> declare <4 x i32> @llvm.x86.sse2.cvttpd2dq(<2 x double>) nounwind
> >>> readnone
> >>>
> >>>
> >>> +define <4 x i32> @test_x86_sse2_cvttps2dq(<4 x float> %a0) {
> >>> +; SSE-LABEL: test_x86_sse2_cvttps2dq:
> >>> +; SSE:       ## BB#0:
> >>> +; SSE-NEXT:    cvttps2dq %xmm0, %xmm0
> >>> +; SSE-NEXT:    retl
> >>> +;
> >>> +; KNL-LABEL: test_x86_sse2_cvttps2dq:
> >>> +; KNL:       ## BB#0:
> >>> +; KNL-NEXT:    vcvttps2dq %xmm0, %xmm0
> >>> +; KNL-NEXT:    retl
> >>> +  %res = call <4 x i32> @llvm.x86.sse2.cvttps2dq(<4 x float> %a0) ;
> <<4 x
> >>> i32>> [#uses=1]
> >>> +  ret <4 x i32> %res
> >>> +}
> >>> +declare <4 x i32> @llvm.x86.sse2.cvttps2dq(<4 x float>) nounwind
> readnone
> >>> +
> >>> +
> >>> define i32 @test_x86_sse2_cvttsd2si(<2 x double> %a0) {
> >>> ; SSE-LABEL: test_x86_sse2_cvttsd2si:
> >>> ; SSE:       ## BB#0:
> >>>
> >>> Modified: llvm/trunk/test/Transforms/ConstProp/calls.ll
> >>> URL:
> >>>
> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/Transforms/ConstProp/calls.ll?rev=275981&r1=275980&r2=275981&view=diff
> >>>
> >>>
> ==============================================================================
> >>> --- llvm/trunk/test/Transforms/ConstProp/calls.ll (original)
> >>> +++ llvm/trunk/test/Transforms/ConstProp/calls.ll Tue Jul 19 10:07:43
> 2016
> >>> @@ -193,11 +193,13 @@ entry:
> >>>   ret i1 %b
> >>> }
> >>>
> >>> -; TODO: Inexact values should not fold as they are dependent on
> rounding
> >>> mode
> >>> +; Inexact values should not fold as they are dependent on rounding
> mode
> >>> define i1 @test_sse_cvts_inexact() nounwind readnone {
> >>> ; CHECK-LABEL: @test_sse_cvts_inexact(
> >>> -; CHECK-NOT: call
> >>> -; CHECK: ret i1 true
> >>> +; CHECK: call
> >>> +; CHECK: call
> >>> +; CHECK: call
> >>> +; CHECK: call
> >>> entry:
> >>>   %i0 = tail call i32 @llvm.x86.sse.cvtss2si(<4 x float> <float 1.75,
> >>> float undef, float undef, float undef>) nounwind
> >>>   %i1 = tail call i64 @llvm.x86.sse.cvtss2si64(<4 x float> <float 1.75,
> >>> float undef, float undef, float undef>) nounwind
> >>>
> >>>
> >>> _______________________________________________
> >>> llvm-commits mailing list
> >>> llvm-commits at lists.llvm.org
> >>> http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
> >>
> >>