[llvm] r355741 - [x86] scalarize extract element 0 of FP cmp

Sanjay Patel via llvm-commits llvm-commits at lists.llvm.org
Tue Mar 19 13:58:29 PDT 2019


This example exposes a bunch of interesting bugs/opportunities.
Unfortunately, I think this is a case of going from lucky to unlucky. I.e.,
we made the right local choice in the DAG, but it ended up being worse
overall.
I've tried to demonstrate part of the backend problem here:
https://bugs.llvm.org/show_bug.cgi?id=41145
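For anyone following along, here is a minimal sketch (in C with vector
extensions; the function name and values are mine, not from the benchmark)
of the source pattern this patch targets: a full vector FP compare whose
result is only consumed at element 0, so it can be scalarized to a single
ucomiss/seta instead of a vector cmp + pextrb:

```c
/* Hypothetical reduction of the pattern affected by r355741. */
typedef float v4sf __attribute__((vector_size(16)));
typedef int   v4si __attribute__((vector_size(16)));

int cmp_lane0_gt(v4sf x, v4sf y) {
    v4si m = x > y;    /* vector fcmp: each lane is 0 or -1 */
    return m[0] != 0;  /* only lane 0 of the mask is extracted */
}
```

Since only element 0 of the i1 vector is live, the backend can now do
the compare on the scalar elements directly.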

But it's also worth noting that -- for this benchmark -- I think we could
avoid those heuristic-based problems by improving the IR optimizer:
https://bugs.llvm.org/show_bug.cgi?id=41101
https://bugs.llvm.org/show_bug.cgi?id=41113

Finally -- I'm not trying to divert from the bugs (I'm planning to work
on them) -- but the AVX vectorized IR (masked store) and subsequent codegen
for this benchmark are drastically different and perform much better in my
local testing, so if the perf of this code is important and you can compile
for an AVX2 target, you should do that.
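As a rough illustration (this is not the actual LCALS kernel; the function
and data are made up), the shape of code where that matters is a loop with
a lane-wise condition guarding a store. With -mavx2, clang can vectorize
the guarded store into a masked store (vmaskmovps) instead of a
per-element compare+branch or blend:

```c
/* Hypothetical conditional-store loop; under AVX2 the guarded store
   can become a masked vector store rather than scalar branches. */
void clamp_nonneg(float *restrict dst, const float *restrict src, int n) {
    for (int i = 0; i < n; i++) {
        if (src[i] < 0.0f)
            dst[i] = 0.0f;  /* store happens only in masked lanes */
    }
}
```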

On Fri, Mar 15, 2019 at 5:17 PM Sanjay Patel <spatel at rotateright.com> wrote:

> Ok, so both of my guesses were wrong. SSE4.2 is an interesting choice...
> But that's a relief...I was chasing a phantom AVX diff.
> SSE4.2 looks almost the same as SSE2: there's an extract+cmp+branch
> without this patch, and then and/andn/or after this patch (no blendv).
>
> On Fri, Mar 15, 2019 at 4:52 PM Eric Christopher <echristo at gmail.com>
> wrote:
>
>> My bad, thanks Chandler :)
>>
>> -eric
>>
>> On Fri, Mar 15, 2019 at 3:48 PM Chandler Carruth <chandlerc at google.com>
>> wrote:
>> >
>> > On Fri, Mar 15, 2019 at 3:36 PM Eric Christopher via llvm-commits <
>> llvm-commits at lists.llvm.org> wrote:
>> >>
>> >> -march=haswell -mprefer-vector-width=128 (and a few other things)
>> typically.
>> >
>> >
>> > We actually saw this regression with -march=x86-64 -msse4.2, and
>> running on a haswell (xeon) machine.
>> >
>> >>
>> >>
>> >> -eric
>> >>
>> >> On Fri, Mar 15, 2019 at 3:32 PM Sanjay Patel via llvm-commits
>> >> <llvm-commits at lists.llvm.org> wrote:
>> >> >
>> >> > Need to confirm something: when you say "targeting Haswell", does
>> that mean you're compiling with -mavx / -mavx2? Or you're compiling for
>> default SSE2 while running on Haswell?
>> >> > I don't see any asm diffs when building with AVX/AVX2 and running on
>> Haswell, but I do see the expected suspect 'branch-became-a-select' when
>> building with SSE2.
>> >> >
>> >> > On Fri, Mar 15, 2019 at 7:35 AM Sanjay Patel <spatel at rotateright.com>
>> wrote:
>> >> >>
>> >> >> From the test diffs in the patch, I'm guessing we converted a
>> predictable compare+branch into a blendv, so that was a loser.
>> >> >> If you have a reduced test already, please let me know. Otherwise,
>> I'll start investigating using the test-suite sources.
>> >> >>
>> >> >> This patch was intended to be an intermediate step towards the
>> problems seen in PR39665 and related bugs (use 'movmsk' more liberally), so
>> it's possible that we could reverse the order of those and avoid any
>> regressions. Let me know if I should revert while looking at this.
>> >> >>
>> >> >> On Thu, Mar 14, 2019 at 6:46 PM David Jones <dlj at google.com> wrote:
>> >> >>>
>> >> >>> Hey Sanjay,
>> >> >>>
>> >> >>> It looks like this revision caused a pretty substantial
>> performance regression on lcalsALambda in the test-suite. Specifically, all
>> the BM_PRESSURE_CALC_LAMBDA variants (171, 5001, and 44217) showed 5-7%
>> degradation when targeting Haswell. Chandler said he might have some ideas
>> of the issue.
>> >> >>>
>> >> >>> --dlj
>> >> >>>
>> >> >>> On Fri, Mar 8, 2019 at 1:53 PM Sanjay Patel via llvm-commits <
>> llvm-commits at lists.llvm.org> wrote:
>> >> >>>>
>> >> >>>> Author: spatel
>> >> >>>> Date: Fri Mar  8 13:54:41 2019
>> >> >>>> New Revision: 355741
>> >> >>>>
>> >> >>>> URL: http://llvm.org/viewvc/llvm-project?rev=355741&view=rev
>> >> >>>> Log:
>> >> >>>> [x86] scalarize extract element 0 of FP cmp
>> >> >>>>
>> >> >>>> An extension of D58282 noted in PR39665:
>> >> >>>> https://bugs.llvm.org/show_bug.cgi?id=39665
>> >> >>>>
>> >> >>>> This doesn't answer the request to use movmsk, but that's an
>> >> >>>> independent problem. We need this and probably still need
>> >> >>>> scalarization of FP selects because we can't do that as a
>> >> >>>> target-independent transform (although it seems likely that
>> >> >>>> targets besides x86 should have this transform).
>> >> >>>>
>> >> >>>> Modified:
>> >> >>>>     llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>> >> >>>>     llvm/trunk/test/CodeGen/X86/extractelement-fp.ll
>> >> >>>>     llvm/trunk/test/CodeGen/X86/vec_floor.ll
>> >> >>>>
>> >> >>>> Modified: llvm/trunk/lib/Target/X86/X86ISelLowering.cpp
>> >> >>>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Target/X86/X86ISelLowering.cpp?rev=355741&r1=355740&r2=355741&view=diff
>> >> >>>>
>> ==============================================================================
>> >> >>>> --- llvm/trunk/lib/Target/X86/X86ISelLowering.cpp (original)
>> >> >>>> +++ llvm/trunk/lib/Target/X86/X86ISelLowering.cpp Fri Mar  8
>> 13:54:41 2019
>> >> >>>> @@ -34298,6 +34298,22 @@ static SDValue scalarizeExtEltFP(SDNode
>> >> >>>>    if (!Vec.hasOneUse() || !isNullConstant(Index) ||
>> VecVT.getScalarType() != VT)
>> >> >>>>      return SDValue();
>> >> >>>>
>> >> >>>> +  // Vector FP compares don't fit the pattern of FP math ops
>> (propagate, not
>> >> >>>> +  // extract, the condition code), so deal with those as a
>> special-case.
>> >> >>>> +  if (Vec.getOpcode() == ISD::SETCC) {
>> >> >>>> +    EVT OpVT = Vec.getOperand(0).getValueType().getScalarType();
>> >> >>>> +    if (OpVT != MVT::f32 && OpVT != MVT::f64)
>> >> >>>> +      return SDValue();
>> >> >>>> +
>> >> >>>> +    // extract (setcc X, Y, CC), 0 --> setcc (extract X, 0),
>> (extract Y, 0), CC
>> >> >>>> +    SDLoc DL(ExtElt);
>> >> >>>> +    SDValue Ext0 = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, OpVT,
>> >> >>>> +                               Vec.getOperand(0), Index);
>> >> >>>> +    SDValue Ext1 = DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, OpVT,
>> >> >>>> +                               Vec.getOperand(1), Index);
>> >> >>>> +    return DAG.getNode(Vec.getOpcode(), DL, VT, Ext0, Ext1,
>> Vec.getOperand(2));
>> >> >>>> +  }
>> >> >>>> +
>> >> >>>>    if (VT != MVT::f32 && VT != MVT::f64)
>> >> >>>>      return SDValue();
>> >> >>>>
>> >> >>>>
>> >> >>>> Modified: llvm/trunk/test/CodeGen/X86/extractelement-fp.ll
>> >> >>>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/extractelement-fp.ll?rev=355741&r1=355740&r2=355741&view=diff
>> >> >>>>
>> ==============================================================================
>> >> >>>> --- llvm/trunk/test/CodeGen/X86/extractelement-fp.ll (original)
>> >> >>>> +++ llvm/trunk/test/CodeGen/X86/extractelement-fp.ll Fri Mar  8
>> 13:54:41 2019
>> >> >>>> @@ -132,9 +132,8 @@ define double @frem_v4f64(<4 x double> %
>> >> >>>>  define i1 @fcmp_v4f32(<4 x float> %x, <4 x float> %y) nounwind {
>> >> >>>>  ; CHECK-LABEL: fcmp_v4f32:
>> >> >>>>  ; CHECK:       # %bb.0:
>> >> >>>> -; CHECK-NEXT:    vcmpltps %xmm0, %xmm1, %xmm0
>> >> >>>> -; CHECK-NEXT:    vpextrb $0, %xmm0, %eax
>> >> >>>> -; CHECK-NEXT:    # kill: def $al killed $al killed $eax
>> >> >>>> +; CHECK-NEXT:    vucomiss %xmm1, %xmm0
>> >> >>>> +; CHECK-NEXT:    seta %al
>> >> >>>>  ; CHECK-NEXT:    retq
>> >> >>>>    %v = fcmp ogt <4 x float> %x, %y
>> >> >>>>    %r = extractelement <4 x i1> %v, i32 0
>> >> >>>> @@ -144,9 +143,8 @@ define i1 @fcmp_v4f32(<4 x float> %x, <4
>> >> >>>>  define i1 @fcmp_v4f64(<4 x double> %x, <4 x double> %y) nounwind
>> {
>> >> >>>>  ; CHECK-LABEL: fcmp_v4f64:
>> >> >>>>  ; CHECK:       # %bb.0:
>> >> >>>> -; CHECK-NEXT:    vcmpnlepd %ymm1, %ymm0, %ymm0
>> >> >>>> -; CHECK-NEXT:    vpextrb $0, %xmm0, %eax
>> >> >>>> -; CHECK-NEXT:    # kill: def $al killed $al killed $eax
>> >> >>>> +; CHECK-NEXT:    vucomisd %xmm0, %xmm1
>> >> >>>> +; CHECK-NEXT:    setb %al
>> >> >>>>  ; CHECK-NEXT:    vzeroupper
>> >> >>>>  ; CHECK-NEXT:    retq
>> >> >>>>    %v = fcmp ugt <4 x double> %x, %y
>> >> >>>>
>> >> >>>> Modified: llvm/trunk/test/CodeGen/X86/vec_floor.ll
>> >> >>>> URL:
>> http://llvm.org/viewvc/llvm-project/llvm/trunk/test/CodeGen/X86/vec_floor.ll?rev=355741&r1=355740&r2=355741&view=diff
>> >> >>>>
>> ==============================================================================
>> >> >>>> --- llvm/trunk/test/CodeGen/X86/vec_floor.ll (original)
>> >> >>>> +++ llvm/trunk/test/CodeGen/X86/vec_floor.ll Fri Mar  8 13:54:41
>> 2019
>> >> >>>> @@ -1665,47 +1665,28 @@ define <2 x double> @floor_maskz_sd_trun
>> >> >>>>  define <4 x float> @floor_mask_ss_mask8(<4 x float> %x, <4 x
>> float> %y, <4 x float> %w) nounwind {
>> >> >>>>  ; SSE41-LABEL: floor_mask_ss_mask8:
>> >> >>>>  ; SSE41:       ## %bb.0:
>> >> >>>> -; SSE41-NEXT:    movaps %xmm0, %xmm3
>> >> >>>> -; SSE41-NEXT:    cmpeqps %xmm1, %xmm3
>> >> >>>> -; SSE41-NEXT:    pextrb $0, %xmm3, %eax
>> >> >>>> -; SSE41-NEXT:    testb $1, %al
>> >> >>>> -; SSE41-NEXT:    je LBB60_2
>> >> >>>> -; SSE41-NEXT:  ## %bb.1:
>> >> >>>> -; SSE41-NEXT:    xorps %xmm2, %xmm2
>> >> >>>> -; SSE41-NEXT:    roundss $9, %xmm0, %xmm2
>> >> >>>> -; SSE41-NEXT:  LBB60_2:
>> >> >>>> -; SSE41-NEXT:    blendps {{.*#+}} xmm1 = xmm2[0],xmm1[1,2,3]
>> >> >>>> -; SSE41-NEXT:    movaps %xmm1, %xmm0
>> >> >>>> +; SSE41-NEXT:    roundss $9, %xmm0, %xmm3
>> >> >>>> +; SSE41-NEXT:    cmpeqss %xmm1, %xmm0
>> >> >>>> +; SSE41-NEXT:    andps %xmm0, %xmm3
>> >> >>>> +; SSE41-NEXT:    andnps %xmm2, %xmm0
>> >> >>>> +; SSE41-NEXT:    orps %xmm3, %xmm0
>> >> >>>> +; SSE41-NEXT:    blendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
>> >> >>>>  ; SSE41-NEXT:    retq
>> >> >>>>  ;
>> >> >>>>  ; AVX-LABEL: floor_mask_ss_mask8:
>> >> >>>>  ; AVX:       ## %bb.0:
>> >> >>>> -; AVX-NEXT:    vcmpeqps %xmm1, %xmm0, %xmm3
>> >> >>>> -; AVX-NEXT:    vpextrb $0, %xmm3, %eax
>> >> >>>> -; AVX-NEXT:    testb $1, %al
>> >> >>>> -; AVX-NEXT:    je LBB60_2
>> >> >>>> -; AVX-NEXT:  ## %bb.1:
>> >> >>>> -; AVX-NEXT:    vroundss $9, %xmm0, %xmm0, %xmm2
>> >> >>>> -; AVX-NEXT:  LBB60_2:
>> >> >>>> -; AVX-NEXT:    vblendps {{.*#+}} xmm0 = xmm2[0],xmm1[1,2,3]
>> >> >>>> +; AVX-NEXT:    vroundss $9, %xmm0, %xmm0, %xmm3
>> >> >>>> +; AVX-NEXT:    vcmpeqss %xmm1, %xmm0, %xmm0
>> >> >>>> +; AVX-NEXT:    vblendvps %xmm0, %xmm3, %xmm2, %xmm0
>> >> >>>> +; AVX-NEXT:    vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
>> >> >>>>  ; AVX-NEXT:    retq
>> >> >>>>  ;
>> >> >>>> -; AVX512F-LABEL: floor_mask_ss_mask8:
>> >> >>>> -; AVX512F:       ## %bb.0:
>> >> >>>> -; AVX512F-NEXT:    ## kill: def $xmm1 killed $xmm1 def $zmm1
>> >> >>>> -; AVX512F-NEXT:    ## kill: def $xmm0 killed $xmm0 def $zmm0
>> >> >>>> -; AVX512F-NEXT:    vcmpeqps %zmm1, %zmm0, %k1
>> >> >>>> -; AVX512F-NEXT:    vrndscaless $1, %xmm0, %xmm1, %xmm2 {%k1}
>> >> >>>> -; AVX512F-NEXT:    vmovaps %xmm2, %xmm0
>> >> >>>> -; AVX512F-NEXT:    vzeroupper
>> >> >>>> -; AVX512F-NEXT:    retq
>> >> >>>> -;
>> >> >>>> -; AVX512VL-LABEL: floor_mask_ss_mask8:
>> >> >>>> -; AVX512VL:       ## %bb.0:
>> >> >>>> -; AVX512VL-NEXT:    vcmpeqps %xmm1, %xmm0, %k1
>> >> >>>> -; AVX512VL-NEXT:    vrndscaless $1, %xmm0, %xmm1, %xmm2 {%k1}
>> >> >>>> -; AVX512VL-NEXT:    vmovaps %xmm2, %xmm0
>> >> >>>> -; AVX512VL-NEXT:    retq
>> >> >>>> +; AVX512-LABEL: floor_mask_ss_mask8:
>> >> >>>> +; AVX512:       ## %bb.0:
>> >> >>>> +; AVX512-NEXT:    vcmpeqss %xmm1, %xmm0, %k1
>> >> >>>> +; AVX512-NEXT:    vrndscaless $1, %xmm0, %xmm1, %xmm2 {%k1}
>> >> >>>> +; AVX512-NEXT:    vmovaps %xmm2, %xmm0
>> >> >>>> +; AVX512-NEXT:    retq
>> >> >>>>    %mask1 = fcmp oeq <4 x float> %x, %y
>> >> >>>>    %mask = extractelement <4 x i1> %mask1, i64 0
>> >> >>>>    %s = extractelement <4 x float> %x, i64 0
>> >> >>>> @@ -1719,50 +1700,25 @@ define <4 x float> @floor_mask_ss_mask8(
>> >> >>>>  define <4 x float> @floor_maskz_ss_mask8(<4 x float> %x, <4 x
>> float> %y) nounwind {
>> >> >>>>  ; SSE41-LABEL: floor_maskz_ss_mask8:
>> >> >>>>  ; SSE41:       ## %bb.0:
>> >> >>>> -; SSE41-NEXT:    movaps %xmm0, %xmm2
>> >> >>>> -; SSE41-NEXT:    cmpeqps %xmm1, %xmm2
>> >> >>>> -; SSE41-NEXT:    pextrb $0, %xmm2, %eax
>> >> >>>> -; SSE41-NEXT:    testb $1, %al
>> >> >>>> -; SSE41-NEXT:    jne LBB61_1
>> >> >>>> -; SSE41-NEXT:  ## %bb.2:
>> >> >>>> -; SSE41-NEXT:    xorps %xmm0, %xmm0
>> >> >>>> -; SSE41-NEXT:    jmp LBB61_3
>> >> >>>> -; SSE41-NEXT:  LBB61_1:
>> >> >>>> -; SSE41-NEXT:    roundss $9, %xmm0, %xmm0
>> >> >>>> -; SSE41-NEXT:  LBB61_3:
>> >> >>>> -; SSE41-NEXT:    blendps {{.*#+}} xmm1 = xmm0[0],xmm1[1,2,3]
>> >> >>>> -; SSE41-NEXT:    movaps %xmm1, %xmm0
>> >> >>>> +; SSE41-NEXT:    roundss $9, %xmm0, %xmm2
>> >> >>>> +; SSE41-NEXT:    cmpeqss %xmm1, %xmm0
>> >> >>>> +; SSE41-NEXT:    andps %xmm2, %xmm0
>> >> >>>> +; SSE41-NEXT:    blendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
>> >> >>>>  ; SSE41-NEXT:    retq
>> >> >>>>  ;
>> >> >>>>  ; AVX-LABEL: floor_maskz_ss_mask8:
>> >> >>>>  ; AVX:       ## %bb.0:
>> >> >>>> -; AVX-NEXT:    vcmpeqps %xmm1, %xmm0, %xmm2
>> >> >>>> -; AVX-NEXT:    vpextrb $0, %xmm2, %eax
>> >> >>>> -; AVX-NEXT:    testb $1, %al
>> >> >>>> -; AVX-NEXT:    jne LBB61_1
>> >> >>>> -; AVX-NEXT:  ## %bb.2:
>> >> >>>> -; AVX-NEXT:    vxorps %xmm0, %xmm0, %xmm0
>> >> >>>> -; AVX-NEXT:    vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
>> >> >>>> -; AVX-NEXT:    retq
>> >> >>>> -; AVX-NEXT:  LBB61_1:
>> >> >>>> -; AVX-NEXT:    vroundss $9, %xmm0, %xmm0, %xmm0
>> >> >>>> +; AVX-NEXT:    vroundss $9, %xmm0, %xmm0, %xmm2
>> >> >>>> +; AVX-NEXT:    vcmpeqss %xmm1, %xmm0, %xmm0
>> >> >>>> +; AVX-NEXT:    vandps %xmm2, %xmm0, %xmm0
>> >> >>>>  ; AVX-NEXT:    vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
>> >> >>>>  ; AVX-NEXT:    retq
>> >> >>>>  ;
>> >> >>>> -; AVX512F-LABEL: floor_maskz_ss_mask8:
>> >> >>>> -; AVX512F:       ## %bb.0:
>> >> >>>> -; AVX512F-NEXT:    ## kill: def $xmm1 killed $xmm1 def $zmm1
>> >> >>>> -; AVX512F-NEXT:    ## kill: def $xmm0 killed $xmm0 def $zmm0
>> >> >>>> -; AVX512F-NEXT:    vcmpeqps %zmm1, %zmm0, %k1
>> >> >>>> -; AVX512F-NEXT:    vrndscaless $1, %xmm0, %xmm1, %xmm0 {%k1} {z}
>> >> >>>> -; AVX512F-NEXT:    vzeroupper
>> >> >>>> -; AVX512F-NEXT:    retq
>> >> >>>> -;
>> >> >>>> -; AVX512VL-LABEL: floor_maskz_ss_mask8:
>> >> >>>> -; AVX512VL:       ## %bb.0:
>> >> >>>> -; AVX512VL-NEXT:    vcmpeqps %xmm1, %xmm0, %k1
>> >> >>>> -; AVX512VL-NEXT:    vrndscaless $1, %xmm0, %xmm1, %xmm0 {%k1} {z}
>> >> >>>> -; AVX512VL-NEXT:    retq
>> >> >>>> +; AVX512-LABEL: floor_maskz_ss_mask8:
>> >> >>>> +; AVX512:       ## %bb.0:
>> >> >>>> +; AVX512-NEXT:    vcmpeqss %xmm1, %xmm0, %k1
>> >> >>>> +; AVX512-NEXT:    vrndscaless $1, %xmm0, %xmm1, %xmm0 {%k1} {z}
>> >> >>>> +; AVX512-NEXT:    retq
>> >> >>>>    %mask1 = fcmp oeq <4 x float> %x, %y
>> >> >>>>    %mask = extractelement <4 x i1> %mask1, i64 0
>> >> >>>>    %s = extractelement <4 x float> %x, i64 0
>> >> >>>> @@ -1775,47 +1731,28 @@ define <4 x float> @floor_maskz_ss_mask8
>> >> >>>>  define <2 x double> @floor_mask_sd_mask8(<2 x double> %x, <2 x
>> double> %y, <2 x double> %w) nounwind {
>> >> >>>>  ; SSE41-LABEL: floor_mask_sd_mask8:
>> >> >>>>  ; SSE41:       ## %bb.0:
>> >> >>>> -; SSE41-NEXT:    movapd %xmm0, %xmm3
>> >> >>>> -; SSE41-NEXT:    cmpeqpd %xmm1, %xmm3
>> >> >>>> -; SSE41-NEXT:    pextrb $0, %xmm3, %eax
>> >> >>>> -; SSE41-NEXT:    testb $1, %al
>> >> >>>> -; SSE41-NEXT:    je LBB62_2
>> >> >>>> -; SSE41-NEXT:  ## %bb.1:
>> >> >>>> -; SSE41-NEXT:    xorps %xmm2, %xmm2
>> >> >>>> -; SSE41-NEXT:    roundsd $9, %xmm0, %xmm2
>> >> >>>> -; SSE41-NEXT:  LBB62_2:
>> >> >>>> -; SSE41-NEXT:    blendpd {{.*#+}} xmm1 = xmm2[0],xmm1[1]
>> >> >>>> -; SSE41-NEXT:    movapd %xmm1, %xmm0
>> >> >>>> +; SSE41-NEXT:    roundsd $9, %xmm0, %xmm3
>> >> >>>> +; SSE41-NEXT:    cmpeqsd %xmm1, %xmm0
>> >> >>>> +; SSE41-NEXT:    andpd %xmm0, %xmm3
>> >> >>>> +; SSE41-NEXT:    andnpd %xmm2, %xmm0
>> >> >>>> +; SSE41-NEXT:    orpd %xmm3, %xmm0
>> >> >>>> +; SSE41-NEXT:    blendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
>> >> >>>>  ; SSE41-NEXT:    retq
>> >> >>>>  ;
>> >> >>>>  ; AVX-LABEL: floor_mask_sd_mask8:
>> >> >>>>  ; AVX:       ## %bb.0:
>> >> >>>> -; AVX-NEXT:    vcmpeqpd %xmm1, %xmm0, %xmm3
>> >> >>>> -; AVX-NEXT:    vpextrb $0, %xmm3, %eax
>> >> >>>> -; AVX-NEXT:    testb $1, %al
>> >> >>>> -; AVX-NEXT:    je LBB62_2
>> >> >>>> -; AVX-NEXT:  ## %bb.1:
>> >> >>>> -; AVX-NEXT:    vroundsd $9, %xmm0, %xmm0, %xmm2
>> >> >>>> -; AVX-NEXT:  LBB62_2:
>> >> >>>> -; AVX-NEXT:    vblendpd {{.*#+}} xmm0 = xmm2[0],xmm1[1]
>> >> >>>> +; AVX-NEXT:    vroundsd $9, %xmm0, %xmm0, %xmm3
>> >> >>>> +; AVX-NEXT:    vcmpeqsd %xmm1, %xmm0, %xmm0
>> >> >>>> +; AVX-NEXT:    vblendvpd %xmm0, %xmm3, %xmm2, %xmm0
>> >> >>>> +; AVX-NEXT:    vblendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
>> >> >>>>  ; AVX-NEXT:    retq
>> >> >>>>  ;
>> >> >>>> -; AVX512F-LABEL: floor_mask_sd_mask8:
>> >> >>>> -; AVX512F:       ## %bb.0:
>> >> >>>> -; AVX512F-NEXT:    ## kill: def $xmm1 killed $xmm1 def $zmm1
>> >> >>>> -; AVX512F-NEXT:    ## kill: def $xmm0 killed $xmm0 def $zmm0
>> >> >>>> -; AVX512F-NEXT:    vcmpeqpd %zmm1, %zmm0, %k1
>> >> >>>> -; AVX512F-NEXT:    vrndscalesd $1, %xmm0, %xmm1, %xmm2 {%k1}
>> >> >>>> -; AVX512F-NEXT:    vmovapd %xmm2, %xmm0
>> >> >>>> -; AVX512F-NEXT:    vzeroupper
>> >> >>>> -; AVX512F-NEXT:    retq
>> >> >>>> -;
>> >> >>>> -; AVX512VL-LABEL: floor_mask_sd_mask8:
>> >> >>>> -; AVX512VL:       ## %bb.0:
>> >> >>>> -; AVX512VL-NEXT:    vcmpeqpd %xmm1, %xmm0, %k1
>> >> >>>> -; AVX512VL-NEXT:    vrndscalesd $1, %xmm0, %xmm1, %xmm2 {%k1}
>> >> >>>> -; AVX512VL-NEXT:    vmovapd %xmm2, %xmm0
>> >> >>>> -; AVX512VL-NEXT:    retq
>> >> >>>> +; AVX512-LABEL: floor_mask_sd_mask8:
>> >> >>>> +; AVX512:       ## %bb.0:
>> >> >>>> +; AVX512-NEXT:    vcmpeqsd %xmm1, %xmm0, %k1
>> >> >>>> +; AVX512-NEXT:    vrndscalesd $1, %xmm0, %xmm1, %xmm2 {%k1}
>> >> >>>> +; AVX512-NEXT:    vmovapd %xmm2, %xmm0
>> >> >>>> +; AVX512-NEXT:    retq
>> >> >>>>    %mask1 = fcmp oeq <2 x double> %x, %y
>> >> >>>>    %mask = extractelement <2 x i1> %mask1, i64 0
>> >> >>>>    %s = extractelement <2 x double> %x, i64 0
>> >> >>>> @@ -1829,50 +1766,25 @@ define <2 x double> @floor_mask_sd_mask8
>> >> >>>>  define <2 x double> @floor_maskz_sd_mask8(<2 x double> %x, <2 x
>> double> %y) nounwind {
>> >> >>>>  ; SSE41-LABEL: floor_maskz_sd_mask8:
>> >> >>>>  ; SSE41:       ## %bb.0:
>> >> >>>> -; SSE41-NEXT:    movapd %xmm0, %xmm2
>> >> >>>> -; SSE41-NEXT:    cmpeqpd %xmm1, %xmm2
>> >> >>>> -; SSE41-NEXT:    pextrb $0, %xmm2, %eax
>> >> >>>> -; SSE41-NEXT:    testb $1, %al
>> >> >>>> -; SSE41-NEXT:    jne LBB63_1
>> >> >>>> -; SSE41-NEXT:  ## %bb.2:
>> >> >>>> -; SSE41-NEXT:    xorpd %xmm0, %xmm0
>> >> >>>> -; SSE41-NEXT:    jmp LBB63_3
>> >> >>>> -; SSE41-NEXT:  LBB63_1:
>> >> >>>> -; SSE41-NEXT:    roundsd $9, %xmm0, %xmm0
>> >> >>>> -; SSE41-NEXT:  LBB63_3:
>> >> >>>> -; SSE41-NEXT:    blendpd {{.*#+}} xmm1 = xmm0[0],xmm1[1]
>> >> >>>> -; SSE41-NEXT:    movapd %xmm1, %xmm0
>> >> >>>> +; SSE41-NEXT:    roundsd $9, %xmm0, %xmm2
>> >> >>>> +; SSE41-NEXT:    cmpeqsd %xmm1, %xmm0
>> >> >>>> +; SSE41-NEXT:    andpd %xmm2, %xmm0
>> >> >>>> +; SSE41-NEXT:    blendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
>> >> >>>>  ; SSE41-NEXT:    retq
>> >> >>>>  ;
>> >> >>>>  ; AVX-LABEL: floor_maskz_sd_mask8:
>> >> >>>>  ; AVX:       ## %bb.0:
>> >> >>>> -; AVX-NEXT:    vcmpeqpd %xmm1, %xmm0, %xmm2
>> >> >>>> -; AVX-NEXT:    vpextrb $0, %xmm2, %eax
>> >> >>>> -; AVX-NEXT:    testb $1, %al
>> >> >>>> -; AVX-NEXT:    jne LBB63_1
>> >> >>>> -; AVX-NEXT:  ## %bb.2:
>> >> >>>> -; AVX-NEXT:    vxorpd %xmm0, %xmm0, %xmm0
>> >> >>>> -; AVX-NEXT:    vblendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
>> >> >>>> -; AVX-NEXT:    retq
>> >> >>>> -; AVX-NEXT:  LBB63_1:
>> >> >>>> -; AVX-NEXT:    vroundsd $9, %xmm0, %xmm0, %xmm0
>> >> >>>> +; AVX-NEXT:    vroundsd $9, %xmm0, %xmm0, %xmm2
>> >> >>>> +; AVX-NEXT:    vcmpeqsd %xmm1, %xmm0, %xmm0
>> >> >>>> +; AVX-NEXT:    vandpd %xmm2, %xmm0, %xmm0
>> >> >>>>  ; AVX-NEXT:    vblendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
>> >> >>>>  ; AVX-NEXT:    retq
>> >> >>>>  ;
>> >> >>>> -; AVX512F-LABEL: floor_maskz_sd_mask8:
>> >> >>>> -; AVX512F:       ## %bb.0:
>> >> >>>> -; AVX512F-NEXT:    ## kill: def $xmm1 killed $xmm1 def $zmm1
>> >> >>>> -; AVX512F-NEXT:    ## kill: def $xmm0 killed $xmm0 def $zmm0
>> >> >>>> -; AVX512F-NEXT:    vcmpeqpd %zmm1, %zmm0, %k1
>> >> >>>> -; AVX512F-NEXT:    vrndscalesd $1, %xmm0, %xmm1, %xmm0 {%k1} {z}
>> >> >>>> -; AVX512F-NEXT:    vzeroupper
>> >> >>>> -; AVX512F-NEXT:    retq
>> >> >>>> -;
>> >> >>>> -; AVX512VL-LABEL: floor_maskz_sd_mask8:
>> >> >>>> -; AVX512VL:       ## %bb.0:
>> >> >>>> -; AVX512VL-NEXT:    vcmpeqpd %xmm1, %xmm0, %k1
>> >> >>>> -; AVX512VL-NEXT:    vrndscalesd $1, %xmm0, %xmm1, %xmm0 {%k1} {z}
>> >> >>>> -; AVX512VL-NEXT:    retq
>> >> >>>> +; AVX512-LABEL: floor_maskz_sd_mask8:
>> >> >>>> +; AVX512:       ## %bb.0:
>> >> >>>> +; AVX512-NEXT:    vcmpeqsd %xmm1, %xmm0, %k1
>> >> >>>> +; AVX512-NEXT:    vrndscalesd $1, %xmm0, %xmm1, %xmm0 {%k1} {z}
>> >> >>>> +; AVX512-NEXT:    retq
>> >> >>>>    %mask1 = fcmp oeq <2 x double> %x, %y
>> >> >>>>    %mask = extractelement <2 x i1> %mask1, i64 0
>> >> >>>>    %s = extractelement <2 x double> %x, i64 0
>> >> >>>> @@ -2729,47 +2641,28 @@ define <2 x double> @ceil_maskz_sd_trunc
>> >> >>>>  define <4 x float> @ceil_mask_ss_mask8(<4 x float> %x, <4 x
>> float> %y, <4 x float> %w) nounwind {
>> >> >>>>  ; SSE41-LABEL: ceil_mask_ss_mask8:
>> >> >>>>  ; SSE41:       ## %bb.0:
>> >> >>>> -; SSE41-NEXT:    movaps %xmm0, %xmm3
>> >> >>>> -; SSE41-NEXT:    cmpeqps %xmm1, %xmm3
>> >> >>>> -; SSE41-NEXT:    pextrb $0, %xmm3, %eax
>> >> >>>> -; SSE41-NEXT:    testb $1, %al
>> >> >>>> -; SSE41-NEXT:    je LBB86_2
>> >> >>>> -; SSE41-NEXT:  ## %bb.1:
>> >> >>>> -; SSE41-NEXT:    xorps %xmm2, %xmm2
>> >> >>>> -; SSE41-NEXT:    roundss $10, %xmm0, %xmm2
>> >> >>>> -; SSE41-NEXT:  LBB86_2:
>> >> >>>> -; SSE41-NEXT:    blendps {{.*#+}} xmm1 = xmm2[0],xmm1[1,2,3]
>> >> >>>> -; SSE41-NEXT:    movaps %xmm1, %xmm0
>> >> >>>> +; SSE41-NEXT:    roundss $10, %xmm0, %xmm3
>> >> >>>> +; SSE41-NEXT:    cmpeqss %xmm1, %xmm0
>> >> >>>> +; SSE41-NEXT:    andps %xmm0, %xmm3
>> >> >>>> +; SSE41-NEXT:    andnps %xmm2, %xmm0
>> >> >>>> +; SSE41-NEXT:    orps %xmm3, %xmm0
>> >> >>>> +; SSE41-NEXT:    blendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
>> >> >>>>  ; SSE41-NEXT:    retq
>> >> >>>>  ;
>> >> >>>>  ; AVX-LABEL: ceil_mask_ss_mask8:
>> >> >>>>  ; AVX:       ## %bb.0:
>> >> >>>> -; AVX-NEXT:    vcmpeqps %xmm1, %xmm0, %xmm3
>> >> >>>> -; AVX-NEXT:    vpextrb $0, %xmm3, %eax
>> >> >>>> -; AVX-NEXT:    testb $1, %al
>> >> >>>> -; AVX-NEXT:    je LBB86_2
>> >> >>>> -; AVX-NEXT:  ## %bb.1:
>> >> >>>> -; AVX-NEXT:    vroundss $10, %xmm0, %xmm0, %xmm2
>> >> >>>> -; AVX-NEXT:  LBB86_2:
>> >> >>>> -; AVX-NEXT:    vblendps {{.*#+}} xmm0 = xmm2[0],xmm1[1,2,3]
>> >> >>>> +; AVX-NEXT:    vroundss $10, %xmm0, %xmm0, %xmm3
>> >> >>>> +; AVX-NEXT:    vcmpeqss %xmm1, %xmm0, %xmm0
>> >> >>>> +; AVX-NEXT:    vblendvps %xmm0, %xmm3, %xmm2, %xmm0
>> >> >>>> +; AVX-NEXT:    vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
>> >> >>>>  ; AVX-NEXT:    retq
>> >> >>>>  ;
>> >> >>>> -; AVX512F-LABEL: ceil_mask_ss_mask8:
>> >> >>>> -; AVX512F:       ## %bb.0:
>> >> >>>> -; AVX512F-NEXT:    ## kill: def $xmm1 killed $xmm1 def $zmm1
>> >> >>>> -; AVX512F-NEXT:    ## kill: def $xmm0 killed $xmm0 def $zmm0
>> >> >>>> -; AVX512F-NEXT:    vcmpeqps %zmm1, %zmm0, %k1
>> >> >>>> -; AVX512F-NEXT:    vrndscaless $2, %xmm0, %xmm1, %xmm2 {%k1}
>> >> >>>> -; AVX512F-NEXT:    vmovaps %xmm2, %xmm0
>> >> >>>> -; AVX512F-NEXT:    vzeroupper
>> >> >>>> -; AVX512F-NEXT:    retq
>> >> >>>> -;
>> >> >>>> -; AVX512VL-LABEL: ceil_mask_ss_mask8:
>> >> >>>> -; AVX512VL:       ## %bb.0:
>> >> >>>> -; AVX512VL-NEXT:    vcmpeqps %xmm1, %xmm0, %k1
>> >> >>>> -; AVX512VL-NEXT:    vrndscaless $2, %xmm0, %xmm1, %xmm2 {%k1}
>> >> >>>> -; AVX512VL-NEXT:    vmovaps %xmm2, %xmm0
>> >> >>>> -; AVX512VL-NEXT:    retq
>> >> >>>> +; AVX512-LABEL: ceil_mask_ss_mask8:
>> >> >>>> +; AVX512:       ## %bb.0:
>> >> >>>> +; AVX512-NEXT:    vcmpeqss %xmm1, %xmm0, %k1
>> >> >>>> +; AVX512-NEXT:    vrndscaless $2, %xmm0, %xmm1, %xmm2 {%k1}
>> >> >>>> +; AVX512-NEXT:    vmovaps %xmm2, %xmm0
>> >> >>>> +; AVX512-NEXT:    retq
>> >> >>>>    %mask1 = fcmp oeq <4 x float> %x, %y
>> >> >>>>    %mask = extractelement <4 x i1> %mask1, i64 0
>> >> >>>>    %s = extractelement <4 x float> %x, i64 0
>> >> >>>> @@ -2783,50 +2676,25 @@ define <4 x float> @ceil_mask_ss_mask8(<
>> >> >>>>  define <4 x float> @ceil_maskz_ss_mask8(<4 x float> %x, <4 x
>> float> %y) nounwind {
>> >> >>>>  ; SSE41-LABEL: ceil_maskz_ss_mask8:
>> >> >>>>  ; SSE41:       ## %bb.0:
>> >> >>>> -; SSE41-NEXT:    movaps %xmm0, %xmm2
>> >> >>>> -; SSE41-NEXT:    cmpeqps %xmm1, %xmm2
>> >> >>>> -; SSE41-NEXT:    pextrb $0, %xmm2, %eax
>> >> >>>> -; SSE41-NEXT:    testb $1, %al
>> >> >>>> -; SSE41-NEXT:    jne LBB87_1
>> >> >>>> -; SSE41-NEXT:  ## %bb.2:
>> >> >>>> -; SSE41-NEXT:    xorps %xmm0, %xmm0
>> >> >>>> -; SSE41-NEXT:    jmp LBB87_3
>> >> >>>> -; SSE41-NEXT:  LBB87_1:
>> >> >>>> -; SSE41-NEXT:    roundss $10, %xmm0, %xmm0
>> >> >>>> -; SSE41-NEXT:  LBB87_3:
>> >> >>>> -; SSE41-NEXT:    blendps {{.*#+}} xmm1 = xmm0[0],xmm1[1,2,3]
>> >> >>>> -; SSE41-NEXT:    movaps %xmm1, %xmm0
>> >> >>>> +; SSE41-NEXT:    roundss $10, %xmm0, %xmm2
>> >> >>>> +; SSE41-NEXT:    cmpeqss %xmm1, %xmm0
>> >> >>>> +; SSE41-NEXT:    andps %xmm2, %xmm0
>> >> >>>> +; SSE41-NEXT:    blendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
>> >> >>>>  ; SSE41-NEXT:    retq
>> >> >>>>  ;
>> >> >>>>  ; AVX-LABEL: ceil_maskz_ss_mask8:
>> >> >>>>  ; AVX:       ## %bb.0:
>> >> >>>> -; AVX-NEXT:    vcmpeqps %xmm1, %xmm0, %xmm2
>> >> >>>> -; AVX-NEXT:    vpextrb $0, %xmm2, %eax
>> >> >>>> -; AVX-NEXT:    testb $1, %al
>> >> >>>> -; AVX-NEXT:    jne LBB87_1
>> >> >>>> -; AVX-NEXT:  ## %bb.2:
>> >> >>>> -; AVX-NEXT:    vxorps %xmm0, %xmm0, %xmm0
>> >> >>>> -; AVX-NEXT:    vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
>> >> >>>> -; AVX-NEXT:    retq
>> >> >>>> -; AVX-NEXT:  LBB87_1:
>> >> >>>> -; AVX-NEXT:    vroundss $10, %xmm0, %xmm0, %xmm0
>> >> >>>> +; AVX-NEXT:    vroundss $10, %xmm0, %xmm0, %xmm2
>> >> >>>> +; AVX-NEXT:    vcmpeqss %xmm1, %xmm0, %xmm0
>> >> >>>> +; AVX-NEXT:    vandps %xmm2, %xmm0, %xmm0
>> >> >>>>  ; AVX-NEXT:    vblendps {{.*#+}} xmm0 = xmm0[0],xmm1[1,2,3]
>> >> >>>>  ; AVX-NEXT:    retq
>> >> >>>>  ;
>> >> >>>> -; AVX512F-LABEL: ceil_maskz_ss_mask8:
>> >> >>>> -; AVX512F:       ## %bb.0:
>> >> >>>> -; AVX512F-NEXT:    ## kill: def $xmm1 killed $xmm1 def $zmm1
>> >> >>>> -; AVX512F-NEXT:    ## kill: def $xmm0 killed $xmm0 def $zmm0
>> >> >>>> -; AVX512F-NEXT:    vcmpeqps %zmm1, %zmm0, %k1
>> >> >>>> -; AVX512F-NEXT:    vrndscaless $2, %xmm0, %xmm1, %xmm0 {%k1} {z}
>> >> >>>> -; AVX512F-NEXT:    vzeroupper
>> >> >>>> -; AVX512F-NEXT:    retq
>> >> >>>> -;
>> >> >>>> -; AVX512VL-LABEL: ceil_maskz_ss_mask8:
>> >> >>>> -; AVX512VL:       ## %bb.0:
>> >> >>>> -; AVX512VL-NEXT:    vcmpeqps %xmm1, %xmm0, %k1
>> >> >>>> -; AVX512VL-NEXT:    vrndscaless $2, %xmm0, %xmm1, %xmm0 {%k1} {z}
>> >> >>>> -; AVX512VL-NEXT:    retq
>> >> >>>> +; AVX512-LABEL: ceil_maskz_ss_mask8:
>> >> >>>> +; AVX512:       ## %bb.0:
>> >> >>>> +; AVX512-NEXT:    vcmpeqss %xmm1, %xmm0, %k1
>> >> >>>> +; AVX512-NEXT:    vrndscaless $2, %xmm0, %xmm1, %xmm0 {%k1} {z}
>> >> >>>> +; AVX512-NEXT:    retq
>> >> >>>>    %mask1 = fcmp oeq <4 x float> %x, %y
>> >> >>>>    %mask = extractelement <4 x i1> %mask1, i64 0
>> >> >>>>    %s = extractelement <4 x float> %x, i64 0
>> >> >>>> @@ -2839,47 +2707,28 @@ define <4 x float> @ceil_maskz_ss_mask8(
>> >> >>>>  define <2 x double> @ceil_mask_sd_mask8(<2 x double> %x, <2 x
>> double> %y, <2 x double> %w) nounwind {
>> >> >>>>  ; SSE41-LABEL: ceil_mask_sd_mask8:
>> >> >>>>  ; SSE41:       ## %bb.0:
>> >> >>>> -; SSE41-NEXT:    movapd %xmm0, %xmm3
>> >> >>>> -; SSE41-NEXT:    cmpeqpd %xmm1, %xmm3
>> >> >>>> -; SSE41-NEXT:    pextrb $0, %xmm3, %eax
>> >> >>>> -; SSE41-NEXT:    testb $1, %al
>> >> >>>> -; SSE41-NEXT:    je LBB88_2
>> >> >>>> -; SSE41-NEXT:  ## %bb.1:
>> >> >>>> -; SSE41-NEXT:    xorps %xmm2, %xmm2
>> >> >>>> -; SSE41-NEXT:    roundsd $10, %xmm0, %xmm2
>> >> >>>> -; SSE41-NEXT:  LBB88_2:
>> >> >>>> -; SSE41-NEXT:    blendpd {{.*#+}} xmm1 = xmm2[0],xmm1[1]
>> >> >>>> -; SSE41-NEXT:    movapd %xmm1, %xmm0
>> >> >>>> +; SSE41-NEXT:    roundsd $10, %xmm0, %xmm3
>> >> >>>> +; SSE41-NEXT:    cmpeqsd %xmm1, %xmm0
>> >> >>>> +; SSE41-NEXT:    andpd %xmm0, %xmm3
>> >> >>>> +; SSE41-NEXT:    andnpd %xmm2, %xmm0
>> >> >>>> +; SSE41-NEXT:    orpd %xmm3, %xmm0
>> >> >>>> +; SSE41-NEXT:    blendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
>> >> >>>>  ; SSE41-NEXT:    retq
>> >> >>>>  ;
>> >> >>>>  ; AVX-LABEL: ceil_mask_sd_mask8:
>> >> >>>>  ; AVX:       ## %bb.0:
>> >> >>>> -; AVX-NEXT:    vcmpeqpd %xmm1, %xmm0, %xmm3
>> >> >>>> -; AVX-NEXT:    vpextrb $0, %xmm3, %eax
>> >> >>>> -; AVX-NEXT:    testb $1, %al
>> >> >>>> -; AVX-NEXT:    je LBB88_2
>> >> >>>> -; AVX-NEXT:  ## %bb.1:
>> >> >>>> -; AVX-NEXT:    vroundsd $10, %xmm0, %xmm0, %xmm2
>> >> >>>> -; AVX-NEXT:  LBB88_2:
>> >> >>>> -; AVX-NEXT:    vblendpd {{.*#+}} xmm0 = xmm2[0],xmm1[1]
>> >> >>>> +; AVX-NEXT:    vroundsd $10, %xmm0, %xmm0, %xmm3
>> >> >>>> +; AVX-NEXT:    vcmpeqsd %xmm1, %xmm0, %xmm0
>> >> >>>> +; AVX-NEXT:    vblendvpd %xmm0, %xmm3, %xmm2, %xmm0
>> >> >>>> +; AVX-NEXT:    vblendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
>> >> >>>>  ; AVX-NEXT:    retq
>> >> >>>>  ;
>> >> >>>> -; AVX512F-LABEL: ceil_mask_sd_mask8:
>> >> >>>> -; AVX512F:       ## %bb.0:
>> >> >>>> -; AVX512F-NEXT:    ## kill: def $xmm1 killed $xmm1 def $zmm1
>> >> >>>> -; AVX512F-NEXT:    ## kill: def $xmm0 killed $xmm0 def $zmm0
>> >> >>>> -; AVX512F-NEXT:    vcmpeqpd %zmm1, %zmm0, %k1
>> >> >>>> -; AVX512F-NEXT:    vrndscalesd $2, %xmm0, %xmm1, %xmm2 {%k1}
>> >> >>>> -; AVX512F-NEXT:    vmovapd %xmm2, %xmm0
>> >> >>>> -; AVX512F-NEXT:    vzeroupper
>> >> >>>> -; AVX512F-NEXT:    retq
>> >> >>>> -;
>> >> >>>> -; AVX512VL-LABEL: ceil_mask_sd_mask8:
>> >> >>>> -; AVX512VL:       ## %bb.0:
>> >> >>>> -; AVX512VL-NEXT:    vcmpeqpd %xmm1, %xmm0, %k1
>> >> >>>> -; AVX512VL-NEXT:    vrndscalesd $2, %xmm0, %xmm1, %xmm2 {%k1}
>> >> >>>> -; AVX512VL-NEXT:    vmovapd %xmm2, %xmm0
>> >> >>>> -; AVX512VL-NEXT:    retq
>> >> >>>> +; AVX512-LABEL: ceil_mask_sd_mask8:
>> >> >>>> +; AVX512:       ## %bb.0:
>> >> >>>> +; AVX512-NEXT:    vcmpeqsd %xmm1, %xmm0, %k1
>> >> >>>> +; AVX512-NEXT:    vrndscalesd $2, %xmm0, %xmm1, %xmm2 {%k1}
>> >> >>>> +; AVX512-NEXT:    vmovapd %xmm2, %xmm0
>> >> >>>> +; AVX512-NEXT:    retq
>> >> >>>>    %mask1 = fcmp oeq <2 x double> %x, %y
>> >> >>>>    %mask = extractelement <2 x i1> %mask1, i64 0
>> >> >>>>    %s = extractelement <2 x double> %x, i64 0
>> >> >>>> @@ -2893,50 +2742,25 @@ define <2 x double> @ceil_mask_sd_mask8(
>> >> >>>>  define <2 x double> @ceil_maskz_sd_mask8(<2 x double> %x, <2 x
>> double> %y) nounwind {
>> >> >>>>  ; SSE41-LABEL: ceil_maskz_sd_mask8:
>> >> >>>>  ; SSE41:       ## %bb.0:
>> >> >>>> -; SSE41-NEXT:    movapd %xmm0, %xmm2
>> >> >>>> -; SSE41-NEXT:    cmpeqpd %xmm1, %xmm2
>> >> >>>> -; SSE41-NEXT:    pextrb $0, %xmm2, %eax
>> >> >>>> -; SSE41-NEXT:    testb $1, %al
>> >> >>>> -; SSE41-NEXT:    jne LBB89_1
>> >> >>>> -; SSE41-NEXT:  ## %bb.2:
>> >> >>>> -; SSE41-NEXT:    xorpd %xmm0, %xmm0
>> >> >>>> -; SSE41-NEXT:    jmp LBB89_3
>> >> >>>> -; SSE41-NEXT:  LBB89_1:
>> >> >>>> -; SSE41-NEXT:    roundsd $10, %xmm0, %xmm0
>> >> >>>> -; SSE41-NEXT:  LBB89_3:
>> >> >>>> -; SSE41-NEXT:    blendpd {{.*#+}} xmm1 = xmm0[0],xmm1[1]
>> >> >>>> -; SSE41-NEXT:    movapd %xmm1, %xmm0
>> >> >>>> +; SSE41-NEXT:    roundsd $10, %xmm0, %xmm2
>> >> >>>> +; SSE41-NEXT:    cmpeqsd %xmm1, %xmm0
>> >> >>>> +; SSE41-NEXT:    andpd %xmm2, %xmm0
>> >> >>>> +; SSE41-NEXT:    blendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
>> >> >>>>  ; SSE41-NEXT:    retq
>> >> >>>>  ;
>> >> >>>>  ; AVX-LABEL: ceil_maskz_sd_mask8:
>> >> >>>>  ; AVX:       ## %bb.0:
>> >> >>>> -; AVX-NEXT:    vcmpeqpd %xmm1, %xmm0, %xmm2
>> >> >>>> -; AVX-NEXT:    vpextrb $0, %xmm2, %eax
>> >> >>>> -; AVX-NEXT:    testb $1, %al
>> >> >>>> -; AVX-NEXT:    jne LBB89_1
>> >> >>>> -; AVX-NEXT:  ## %bb.2:
>> >> >>>> -; AVX-NEXT:    vxorpd %xmm0, %xmm0, %xmm0
>> >> >>>> -; AVX-NEXT:    vblendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
>> >> >>>> -; AVX-NEXT:    retq
>> >> >>>> -; AVX-NEXT:  LBB89_1:
>> >> >>>> -; AVX-NEXT:    vroundsd $10, %xmm0, %xmm0, %xmm0
>> >> >>>> +; AVX-NEXT:    vroundsd $10, %xmm0, %xmm0, %xmm2
>> >> >>>> +; AVX-NEXT:    vcmpeqsd %xmm1, %xmm0, %xmm0
>> >> >>>> +; AVX-NEXT:    vandpd %xmm2, %xmm0, %xmm0
>> >> >>>>  ; AVX-NEXT:    vblendpd {{.*#+}} xmm0 = xmm0[0],xmm1[1]
>> >> >>>>  ; AVX-NEXT:    retq
>> >> >>>>  ;
>> >> >>>> -; AVX512F-LABEL: ceil_maskz_sd_mask8:
>> >> >>>> -; AVX512F:       ## %bb.0:
>> >> >>>> -; AVX512F-NEXT:    ## kill: def $xmm1 killed $xmm1 def $zmm1
>> >> >>>> -; AVX512F-NEXT:    ## kill: def $xmm0 killed $xmm0 def $zmm0
>> >> >>>> -; AVX512F-NEXT:    vcmpeqpd %zmm1, %zmm0, %k1
>> >> >>>> -; AVX512F-NEXT:    vrndscalesd $2, %xmm0, %xmm1, %xmm0 {%k1} {z}
>> >> >>>> -; AVX512F-NEXT:    vzeroupper
>> >> >>>> -; AVX512F-NEXT:    retq
>> >> >>>> -;
>> >> >>>> -; AVX512VL-LABEL: ceil_maskz_sd_mask8:
>> >> >>>> -; AVX512VL:       ## %bb.0:
>> >> >>>> -; AVX512VL-NEXT:    vcmpeqpd %xmm1, %xmm0, %k1
>> >> >>>> -; AVX512VL-NEXT:    vrndscalesd $2, %xmm0, %xmm1, %xmm0 {%k1} {z}
>> >> >>>> -; AVX512VL-NEXT:    retq
>> >> >>>> +; AVX512-LABEL: ceil_maskz_sd_mask8:
>> >> >>>> +; AVX512:       ## %bb.0:
>> >> >>>> +; AVX512-NEXT:    vcmpeqsd %xmm1, %xmm0, %k1
>> >> >>>> +; AVX512-NEXT:    vrndscalesd $2, %xmm0, %xmm1, %xmm0 {%k1} {z}
>> >> >>>> +; AVX512-NEXT:    retq
>> >> >>>>    %mask1 = fcmp oeq <2 x double> %x, %y
>> >> >>>>    %mask = extractelement <2 x i1> %mask1, i64 0
>> >> >>>>    %s = extractelement <2 x double> %x, i64 0
>> >> >>>>
>> >> >>>>
>> >> >>>> _______________________________________________
>> >> >>>> llvm-commits mailing list
>> >> >>>> llvm-commits at lists.llvm.org
>> >> >>>> https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-commits
>>
>

